MARK6582 · AI & Marketing · Georgetown University

Day 3B Study Guide

Chatbot Infrastructure, Token Pricing, and API Keys
Part 1

What Is an API and How Do AI Systems Communicate

API · Request/Response · API Key · Authentication · Endpoint

Before understanding how a chatbot is priced or how tokens accumulate, it helps to understand what is actually happening when software components talk to each other. APIs are the mechanism. They are used everywhere in modern AI systems: between your application and the language model, between the language model and external tools, and between different services in a tech stack.

Concept 1

An API Is a Standardized Way for Software to Make Requests and Receive Responses

API stands for Application Programming Interface. The word "interface" is the key: an API is a defined contract that specifies how one piece of software can ask another piece of software to do something. The requester sends a structured message to a specific address (the endpoint) with specific parameters; the responder processes it and returns a structured result. Neither side needs to know how the other works internally. They only need to agree on the format of the request and the response.

This is not unique to AI. When a weather app displays the current temperature, it is making an API call to a weather service. When you pay for something online, the checkout page is making an API call to a payment processor. APIs are how modern software systems are composed from separate, specialized services rather than built as a single monolithic program.

In AI systems, APIs serve the same purpose. When a chatbot needs a language model to generate a response, it makes an API call to the model provider, sending the conversation as input and receiving the generated text as output. The chatbot does not contain the model; it accesses it remotely through the API.

What an API call to OpenAI looks like
Your application sends a request to https://api.openai.com/v1/chat/completions containing: which model to use, the system prompt, the conversation history, and the new user message. OpenAI processes it and returns a JSON object containing the model's response text, the number of input tokens used, and the number of output tokens generated. Your application reads the response text and displays it to the user.
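The request-response cycle above can be sketched in a few lines of Python using only the standard library. The model name, prompt text, and user message below are placeholders; the request body and the response fields (`choices`, `usage`) follow OpenAI's documented Chat Completions format.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(model, system_prompt, history, user_message):
    """Assemble the JSON body for one chat completion call: system prompt,
    prior turns, then the new user message."""
    messages = ([{"role": "system", "content": system_prompt}]
                + history
                + [{"role": "user", "content": user_message}])
    return {"model": model, "messages": messages}

def send(payload):
    """POST the request. Needs OPENAI_API_KEY in the environment; the JSON
    response carries the generated text and the billed token counts."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_request(
    model="gpt-4o-mini",
    system_prompt="You are a concise product assistant.",
    history=[],
    user_message="Does this drill come with a battery?",
)
# reply = send(payload)                       # requires a real key
# reply["choices"][0]["message"]["content"]   # the generated text
# reply["usage"]["prompt_tokens"]             # input tokens billed
# reply["usage"]["completion_tokens"]         # output tokens billed
```

Neither side needs to know how the other works internally: the caller only constructs this JSON contract, and the provider only returns one.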
Concept 2

AI Systems Are Composed of Multiple APIs Talking to Each Other

A deployed AI product is rarely a single system. It is a chain of API calls. The user interacts with an interface (a website, a survey, an app). That interface calls a backend server. The backend server calls the language model API. The language model may in turn call tool APIs to retrieve information or take actions. Each call is a separate request-response exchange, and each has its own latency, cost, and potential failure mode.

In the Day 1 framework, these API calls are part of the behavior layer: the pipeline is executing, each component is doing its job, and the results flow from one layer to the next. Understanding that each step is a discrete, billable call changes how you think about system design. A slow tool call adds latency the user feels. A verbose tool response adds tokens the operator pays for. A failed API call at any step can break the whole chain.

┌──────────────────────────────────────────────────────────────────────┐
│  HOW AI SYSTEMS COMMUNICATE: A CHAIN OF API CALLS                   │
└──────────────────────────────────────────────────────────────────────┘

  USER
    │  types a message
    ▼
  ┌─────────────────────────┐
  │  Interface              │   (Qualtrics survey, website, mobile app)
  │  (browser / frontend)   │
  └────────────┬────────────┘
               │  HTTP request → backend server
               ▼
  ┌─────────────────────────┐
  │  Backend server         │   Adds API key, assembles full context,
  │  (your application)     │   enforces constraints, logs usage
  └────────────┬────────────┘
               │  API call (with key + context)
               ▼
  ┌─────────────────────────┐
  │  Language model API     │   Reads context window, generates response
  │  (OpenAI, Anthropic...) │   Returns: text + token counts + metadata
  └────────────┬────────────┘
               │  Tool call (if needed)
               ▼
  ┌─────────────────────────┐
  │  External tool APIs     │   Product database, order system,
  │  (optional)             │   web search, calendar, CRM...
  └────────────┬────────────┘
               │  Tool result returned to model
               ▼
  ┌─────────────────────────┐
  │  Language model         │   Incorporates tool result,
  │  (continues)            │   generates final response
  └────────────┬────────────┘
               │  Response text returned to backend
               ▼
  ┌─────────────────────────┐
  │  Backend server         │   Logs response, appends to history,
  │                         │   sends to interface
  └────────────┬────────────┘
               │  Displays to user
               ▼
  USER sees response
    
Concept 3

An API Key Is a Credential That Links Requests to a Billing Account

Every API call to a commercial model provider must include an API key: a long string of characters that identifies the account making the request. The key serves two purposes simultaneously. First, authentication: the provider verifies that the request comes from an authorized account before processing it. Second, billing: every token processed under your key is charged to your account. The key is the link between computation and cost.

This means an exposed API key is not just a security problem; it is an open billing account. Anyone with your key can make API calls charged to you, up to your account's spending limit. The risk is proportional to scale: a key exposed in a public code repository can be found and exploited by automated scrapers within minutes.

This is why, in the LUCID setup for this course, the API key lives on the backend server and never appears in the Qualtrics survey code. When a participant sends a message, Qualtrics calls the backend, which adds the key and forwards the request to OpenAI. The participant never sees the key, and neither does anyone inspecting the survey source.
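A minimal sketch of that relay pattern in Python, using only the standard library. This is illustrative, not the course's actual backend code; the model choice is a placeholder, and the key is read from a server-side environment variable so it never reaches the browser.

```python
import json
import os
import urllib.request

def auth_headers():
    """Read the key from the server's environment; it never appears in
    the survey's browser-side code. Fail fast if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if key is None:
        raise RuntimeError("OPENAI_API_KEY is not set on the server")
    return {"Content-Type": "application/json",
            "Authorization": f"Bearer {key}"}

def relay(messages):
    """What the backend does on each turn: attach the key, forward the
    assembled context to OpenAI, and return only the response text."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps({"model": "gpt-4o-mini",
                         "messages": messages}).encode("utf-8"),
        headers=auth_headers(),
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

The survey page posts only the conversation; the key is injected here, so inspecting the page source reveals nothing billable.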

Security Rules
Do: Save the key immediately when generated. You will not be able to see it again after closing the creation page.
Do: Store it in a secure location such as a password manager or environment variable, never in plain text in a file.
Do not: Put it in any code that runs in the browser or is committed to a public repository. Client-side code is visible to anyone who views the page source.
Do not: Share it in email, Slack, or any messaging platform.
If compromised: Delete the key immediately from the OpenAI API Keys page and generate a new one. The old key stops working as soon as it is revoked.

Resources for This Section

Guide How to Code Data Using the OpenAI API: A Simple R Guide (questionableresearch.ai)

A practical walkthrough of making API calls from R, covering authentication, request structure, and handling responses. Useful context for understanding the request-response cycle in practice.

Article Where to Find Your OpenAI API Key (OpenAI Help)

Step-by-step instructions for creating and managing API keys in the OpenAI platform.

Part 2

How a Chatbot Conversation Actually Runs

System Prompt · Context Window · Input Tokens · Output Tokens · Conversation History
Concept 4

Every Turn Is a Fresh API Call with the Full History Re-Sent

A chatbot conversation is not a continuous session where the model remembers what was said. It is a series of discrete API calls. Each time a user sends a message, the entire conversation history is assembled and sent to the model from scratch, along with the system prompt. The model has no memory between calls. Everything it knows about the conversation is in the input it receives at that moment.

The input to each API call has three parts: the system prompt (instructions about how the model should behave), the conversation history (every prior user and assistant message), and the new user message. Together these form the context window for that turn. The model reads the full context, generates a response, and returns it. That response is then appended to the history and included in the next call.

This has a direct consequence for cost: input tokens accumulate with every turn. The system prompt is paid for on every call. Each assistant response becomes part of the history and is re-sent as input on all future turns. A conversation does not cost the same each turn; it gets progressively more expensive as history grows.

Example: Home Depot Chatbot
A customer opens the Home Depot website and asks the chatbot for drill recommendations. That is Turn 1. The model receives: system prompt (120 tokens) + scenario context (50 tokens) + user message (10 tokens) = 180 tokens in, and generates a response of 95 tokens.
On Turn 2 the customer asks a follow-up. The model now receives: system prompt (120) + Turn 1 user message + Turn 1 assistant response (124 tokens of history) + new user message (10) = 254 tokens in. The Turn 1 response has become part of the input.
By Turn 4 the input has grown to 476 tokens, even though the user's final message was only 4 tokens long. The system prompt and accumulated history account for the rest.
┌─────────────────────────────────────────────────────────────────────────┐
│  4-TURN HOME DEPOT CONVERSATION                                         │
└─────────────────────────────────────────────────────────────────────────┘

TURN 1
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ Scenario context      50 tk │
  │ User: "Does this come       │
  │        with a battery?" 10tk│                             ┌──────────────┐
  │                             │                             │  API call    │
  │ INPUT TOTAL:        180 tk  │  ──────────────────────────►│  OpenAI      │
  └─────────────────────────────┘                             │              │
                                                              │ OUTPUT:95 tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           │  Response appended to history
                                           ▼
TURN 2
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
  │ History (Turn 1)     124 tk │  ◄── Turn 1 output is now input
  │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
  │ User: "What batteries?" 10tk│
  │                             │                             ┌──────────────┐
  │ INPUT TOTAL:        254 tk  │  ──────────────────────────►│  API call    │
  └─────────────────────────────┘                             │ OUTPUT:75 tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           ▼
TURN 3
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ History (Turns 1-2)  213 tk │  ◄── Grows with every turn
  │ User message          14 tk │
  │ INPUT TOTAL:        347 tk  │  ──────────────────────────►┌──────────────┐
  └─────────────────────────────┘                             │ OUTPUT:120tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           ▼
TURN 4
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ History (Turns 1-3)  352 tk │
  │ User: "with the             │
  │        same drill"    4 tk  │  ◄── Only 4 new tokens from user
  │                             │
  │ INPUT TOTAL:        476 tk  │  ──────────────────────────►┌──────────────┐
  └─────────────────────────────┘                             │ OUTPUT:137tk │
                                                              └──────────────┘

  Total input across all turns:  1,257 tokens
  Total output across all turns:   427 tokens
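The accumulation shown in the diagram can be simulated in a few lines. The token estimator (roughly 4 characters per token, as described below) and the message texts are stand-ins; a real system would count tokens with the provider's tokenizer. The mechanics are the point: each turn's input contains everything that came before.

```python
def estimate_tokens(text):
    """Crude rule of thumb: roughly 4 characters per token."""
    return max(1, len(text) // 4)

SYSTEM_PROMPT = "You are a Home Depot product assistant."  # stand-in prompt

history = []                  # grows turn by turn
input_tokens_per_turn = []

def run_turn(user_message, fake_reply):
    """One turn: assemble the full input (system prompt + history + new
    message), record its size, then append both sides to the history."""
    full_input = SYSTEM_PROMPT + "".join(
        m["content"] for m in history) + user_message
    input_tokens_per_turn.append(estimate_tokens(full_input))
    history.append({"role": "user", "content": user_message})
    history.append({"role": "assistant", "content": fake_reply})

run_turn("Does this come with a battery?", "Yes, the kit includes ...")
run_turn("What batteries does it use?",    "It uses the 20V MAX line ...")
run_turn("Is there a kit with a charger?", "Kit details as follows ...")

# Input size grows every turn, even though each user message is short.
assert input_tokens_per_turn == sorted(input_tokens_per_turn)
```

Swapping `fake_reply` for a real API call changes nothing about the growth pattern: the response still gets appended and re-sent.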
    
Part 3

Token Math: Why Conversations Get More Expensive

Token · Input Tokens · Output Tokens · Cumulative Cost · Prompt Caching · Tool Call
Concept 5

Input Tokens Grow Every Turn. Output Tokens Become Input.

A token is roughly three to four characters of text, or about three-quarters of a word. Models do not process words or sentences; they process tokens. Every API call is priced on two quantities: input tokens (everything sent to the model) and output tokens (everything the model generates in response). Input and output are priced separately, and output is usually more expensive per token.

The key dynamic is that output tokens from one turn become input tokens on the next. The assistant's response is appended to the conversation history and re-sent in full on every subsequent call. This means a long assistant response in Turn 2 will be paid for again as part of the input in Turns 3, 4, 5, and so on. Prompt efficiency matters because the cost compounds across turns.

Turn   What is new                                                Input tk   Output tk   Cumulative input
 1     System prompt (120) + scenario (50) + user message (10)       180         95            180
 2     History from Turn 1 (124) + new user message (10)             254         75            434
 3     History from Turns 1-2 (213) + new user message (14)          347        120            781
 4     History from Turns 1-3 (352) + new user message (4)           476        137          1,257
       Total across all four turns                                 1,257        427

Note that the system prompt (120 tokens) alone accounts for 480 tokens of input across the four turns, simply because it is re-sent every time. A poorly written system prompt is not a one-time cost.
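The table's totals can be checked with simple arithmetic, using the per-turn figures above:

```python
# Per-turn figures from the table above
inputs  = [180, 254, 347, 476]   # input tokens billed on each call
outputs = [95, 75, 120, 137]     # output tokens generated on each call

total_input = sum(inputs)        # every call re-bills the full context
total_output = sum(outputs)

# The 120-token system prompt is part of the input on all four calls:
system_prompt_total = 120 * len(inputs)

assert total_input == 1257
assert total_output == 427
assert system_prompt_total == 480
```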

Why It Matters

At small scale, the difference between a 100-token and a 300-token system prompt is negligible. At one million conversations per month, it is the difference between 100 million and 300 million input tokens, which translates directly into dollars. System prompt efficiency, response length, and conversation depth are all cost levers, not just design choices.

Why not just send the new message instead of the full history?

Because the model has no memory between calls. Without the history, the model cannot connect "with the same drill" (Turn 4) to the drill discussion in Turn 1. The full context has to be re-sent for the model to produce coherent, contextually appropriate responses. Some systems implement summarization to compress older history and reduce token count, but this introduces the risk of losing details the model might need.
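A naive version of that compromise is to keep the system prompt plus only the most recent turns. This is a sketch, not a recommendation; the function name is made up, and real systems more often summarize older turns rather than drop them outright.

```python
def truncate_history(messages, max_turns=3):
    """Keep the system prompt plus only the last `max_turns` exchanges.
    Cheaper per call, but anything outside the window is gone for good;
    the model could no longer connect "the same drill" to Turn 1."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-2 * max_turns:]   # user + assistant = 2 per turn

# Five turns of history; only the last three survive truncation.
convo = [{"role": "system", "content": "You are a store assistant."}]
for i in range(5):
    convo.append({"role": "user", "content": f"question {i}"})
    convo.append({"role": "assistant", "content": f"answer {i}"})

trimmed = truncate_history(convo, max_turns=3)
```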

Concept 6

Prompt Caching: Paying Once for What Gets Re-Sent Every Time

Because the system prompt is re-sent on every turn, API providers have introduced prompt caching as a cost optimization. When caching is enabled, the provider stores the processed representation of a long, stable prefix (typically the system prompt and any fixed context) after the first call. On subsequent calls in the same conversation, if that prefix has not changed, the provider reads from the cache instead of reprocessing it. Cached tokens are charged at a significantly reduced rate.

OpenAI charges cached input tokens at 50% of the standard input price. Anthropic's prompt caching is similar in structure, with cache write tokens slightly more expensive than standard input and cache read tokens substantially cheaper. The economics favor any system where the same large prompt is sent repeatedly across many turns or many conversations.

In the Home Depot example, the 120-token system prompt is re-sent on all four turns. With caching, after the first call that prefix is stored and the three subsequent reads are cheaper. (One caveat: providers set a minimum prefix length for caching, on the order of 1,024 tokens for OpenAI, so a prompt this short would not actually qualify; the arithmetic works the same way for the longer prompts that do.) At scale across millions of conversations, caching the system prompt can meaningfully reduce the total input token bill without changing the model's behavior at all.
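The saving can be computed directly from the 50% cached rate quoted above and GPT-4o's input price. This is a toy calculation for the prefix alone, and it ignores the minimum prefix length providers require before caching applies.

```python
PREFIX_TOKENS = 120    # system prompt, re-sent on every call
TURNS = 4
INPUT_PRICE = 2.50     # GPT-4o input, dollars per 1M tokens
CACHED_PRICE = 1.25    # 50% of the standard input price

# tokens x (dollars per 1M tokens) = dollars per 1M conversations
without_cache = PREFIX_TOKENS * TURNS * INPUT_PRICE
with_cache = (PREFIX_TOKENS * INPUT_PRICE                    # call 1: full price
              + PREFIX_TOKENS * (TURNS - 1) * CACHED_PRICE)  # calls 2-4: cached

assert without_cache == 1200.0   # $1,200 per million conversations
assert with_cache == 750.0       # $750 per million conversations
```

Even on this tiny prefix the cached rate cuts the prefix bill by more than a third; with a realistic multi-thousand-token prompt the absolute saving scales proportionally.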

Why It Matters

Caching changes the cost calculus for system prompt design. Without caching, a shorter system prompt is always cheaper. With caching, a longer and more detailed system prompt that is mostly static may cost almost the same as a short one once caching kicks in. This removes pressure to write terse system prompts and allows more thorough instructions without proportionally higher cost.

Concept 7

Tool Calls: When the Model Reaches Outside the Context Window

A language model by itself can only work with what is in the context window. It cannot look up a product's current inventory, check an order status, or retrieve a customer's purchase history unless that information is explicitly included in the input. Tool calls extend what the model can do by allowing it to request external data or trigger external actions mid-conversation.

When a tool call is made, the exchange looks like this: the model generates a structured request specifying which tool to call and with what parameters; the application runs the tool (a database query, an API call, a function) and returns the result; that result is then added to the conversation context as a new message and the model continues generating its response. In the Day 1 framework, tool calls are part of the behavior layer, and their outputs become part of the representation on the next step.

The token cost implication is the same as for any other content: the tool call request and the tool result both consume tokens and are included in the history going forward. A tool that returns a verbose response — a full product catalog, a long JSON object — can add significantly to the input token count for every subsequent turn. Designing tool outputs to be concise is a direct cost lever.

Example: Home Depot Chatbot with Inventory Lookup
In Turn 3, the customer asks for a kit that includes a drill, battery, and charger. Rather than generating an answer from training data alone, the chatbot calls a product search tool with parameters like {"query": "drill kit with battery charger", "category": "power-tools"}. The tool returns a list of matching products from the live catalog. That result is injected into the context, and the model uses it to generate its response. The customer sees an up-to-date answer; the model never had to memorize inventory.
The tool result now lives in the conversation history and will be re-sent as part of the input on Turn 4. If the product list returned 800 tokens of JSON, those 800 tokens are paid for again on every subsequent turn.
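The application's half of that exchange can be sketched as follows. The tool name, its result data, and the message format are illustrative stand-ins (real providers define a specific schema for tool calls and tool results); the point is that the tool's output becomes an ordinary message in the history, billed like any other.

```python
import json

def product_search(query, category):
    """Stand-in for a live inventory API; the product data is made up."""
    return [{"sku": "PCK-647",
             "name": "20V drill kit with battery and charger",
             "price": 169.00}]

TOOLS = {"product_search": product_search}

def handle_tool_call(messages, tool_call):
    """Run the tool the model requested and append its result to the
    context as a new message. The model reads it on the next generation
    step, and it is re-sent as input on every later turn."""
    fn = TOOLS[tool_call["name"]]
    result = fn(**tool_call["arguments"])
    messages.append({"role": "tool",
                     "name": tool_call["name"],
                     "content": json.dumps(result)})
    return messages

# Simulated: in Turn 3 the model asked for a product search.
messages = [{"role": "user",
             "content": "Any kit with drill, battery, and charger?"}]
handle_tool_call(messages, {
    "name": "product_search",
    "arguments": {"query": "drill kit with battery charger",
                  "category": "power-tools"},
})
```

Keeping `product_search`'s return value short is exactly the "concise tool output" cost lever described above: whatever it returns is paid for again on every subsequent turn.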
Part 4

Model Pricing: The Same Conversation, Very Different Costs

Price per Million Tokens · Input Price · Output Price · Self-Hosted
Concept 8

Choosing a Model Is a Cost-Quality Tradeoff, Not Just a Quality Decision

The conversation structure described above works identically regardless of which language model you use. The same system prompt, history, and user message are sent; the same token counts apply. What changes is the price per token, and that difference can be substantial.

OpenAI prices models by input and output tokens per million. Output tokens are more expensive than input tokens because generating text requires more compute than reading it. Prices as of the time of this lecture:

Model                                 Input ($ per 1M tk)   Output ($ per 1M tk)   Cost at 1M conversations
GPT-4o                                $2.50                 $10.00                 $4,799
GPT-4o mini                           $0.15                 $0.60                  $632
Open-source (self-hosted, A100 GPU)   ~$0 (no API fee)      ~$0 (no API fee)       $21,900

The open-source row illustrates an important point: API cost is not the same as total cost. Running your own model requires cloud GPU infrastructure, which at the scale of this example costs more than using GPT-4o via the API. Self-hosting only becomes cost-effective at very high token volumes, where the per-unit infrastructure cost drops below the per-token API price.

For the Home Depot 4-turn conversation (1,257 input tokens, 427 output tokens), the cost on GPT-4o mini is approximately $0.00044 per conversation, well under a tenth of a cent. At one million conversations that becomes roughly $445. On GPT-4o the same conversation costs about $0.0074 each, or roughly $7.40 per thousand conversations.
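The per-conversation arithmetic can be made explicit, using the token totals from the 4-turn example and the per-million prices in the table:

```python
def conversation_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one conversation in dollars; prices are $ per 1M tokens."""
    return (input_tokens * input_price
            + output_tokens * output_price) / 1_000_000

IN_TOKENS, OUT_TOKENS = 1257, 427   # the 4-turn Home Depot conversation

mini = conversation_cost(IN_TOKENS, OUT_TOKENS, 0.15, 0.60)    # GPT-4o mini
full = conversation_cost(IN_TOKENS, OUT_TOKENS, 2.50, 10.00)   # GPT-4o

# At these prices GPT-4o is roughly 16-17x more expensive per conversation.
ratio = full / mini
```

The same function works for any model: swap in its two prices and the conversation's token counts.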

Why It Matters

Model selection is often framed as a quality decision. It is also a cost decision, and the two are frequently in tension. A model that performs 15% better on your task but costs eight times more may not be the right choice at scale. For many commercial applications, a smaller, cheaper model is sufficient, and the cost savings at volume are significant enough to change the business case for deploying the chatbot at all.

Resources for This Section

Guide Understanding OpenAI Pricing: How Much Does It Cost to Use the API? (questionableresearch.ai)

A worked example of calculating API costs for a real task, with input/output token breakdowns and comparisons across models.

Interactive OpenAI Tokenizer

Paste any text and see exactly how it is broken into tokens. Useful for estimating prompt costs before sending a single API call.

Reference OpenAI API Pricing

Current per-token prices for all OpenAI models. Prices change; always check this page rather than relying on numbers from a specific date.

All Resources at a Glance

Topic Resource Format
API basics and coding data How to Code Data Using the OpenAI API: A Simple R Guide Guide
API Pricing Understanding OpenAI Pricing (questionableresearch.ai) Guide
Tokenization OpenAI Tokenizer Interactive
Model Pricing OpenAI API Pricing Reference
API Key Setup Where to Find Your OpenAI API Key (OpenAI Help) Article
Part 1

How a Chatbot Conversation Actually Runs

System Prompt Context Window Input Tokens Output Tokens Conversation History
Concept 1

Every Turn Is a Fresh API Call with the Full History Re-Sent

A chatbot conversation is not a continuous session where the model remembers what was said. It is a series of discrete API calls. Each time a user sends a message, the entire conversation history is assembled and sent to the model from scratch, along with the system prompt. The model has no memory between calls. Everything it knows about the conversation is in the input it receives at that moment.

The input to each API call has three parts: the system prompt (instructions about how the model should behave), the conversation history (every prior user and assistant message), and the new user message. Together these form the context window for that turn. The model reads the full context, generates a response, and returns it. That response is then appended to the history and included in the next call.

This has a direct consequence for cost: input tokens accumulate with every turn. The system prompt is paid for on every call. Each assistant response becomes part of the history and is re-sent as input on all future turns. A conversation does not cost the same each turn; it gets progressively more expensive as history grows.

Example: Home Depot Chatbot
A customer opens the Home Depot website and asks the chatbot for drill recommendations. That is Turn 1. The model receives: system prompt (120 tokens) + user message (10 tokens) = 180 tokens in, and generates a response of 95 tokens.
On Turn 2 the customer asks a follow-up. The model now receives: system prompt (120) + Turn 1 user message + Turn 1 assistant response (124 tokens of history) + new user message (10) = 254 tokens in. The Turn 1 response has become part of the input.
By Turn 4 the input has grown to 476 tokens, even though the user's final message was only 4 tokens long. The system prompt and accumulated history account for the rest.
┌─────────────────────────────────────────────────────────────────────────┐
│  4-TURN HOME DEPOT CONVERSATION                                         │
└─────────────────────────────────────────────────────────────────────────┘

TURN 1
  ┌─────────────────────────────┐
  │ System prompt        120 tk │  ─────────────────────────────────┐
  │ Scenario context      50 tk ││ User: "Does this come        ││        with a battery?" 10tk│                             ┌──────────────┐
  │                             │                             │  API call    │
  │ INPUT TOTAL:        180 tk  │  ──────────────────────────►│  OpenAI      │
  └─────────────────────────────┘                             │              │
                                                              │ OUTPUT:95 tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           │  Response appended to history
                                           ▼
TURN 2
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
  │ History (Turn 1)     124 tk │  ◄── Turn 1 output is now input
  │ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
  │ User: "What batteries?"  10 ││                             │                             ┌──────────────┐
  │ INPUT TOTAL:        254 tk  │  ──────────────────────────►│  API call    │
  └─────────────────────────────┘                             │ OUTPUT:75 tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           ▼
TURN 3
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ History (Turns 1–2)  213 tk │  ◄── Grows with every turn
  │ User message          14 tk ││ INPUT TOTAL:        347 tk  │  ──────────────────────────►┌──────────────┐
  └─────────────────────────────┘                             │ OUTPUT:120tk │
                                                              └──────┬───────┘
                                                                     │
                                           ┌─────────────────────────┘
                                           ▼
TURN 4
  ┌─────────────────────────────┐
  │ System prompt        120 tk │
  │ History (Turns 1–3)  352 tk │
  │ User: "with the             │
  │        same drill"    4 tk  │  ◄── Only 4 new tokens from user
  │                             ││ INPUT TOTAL:        476 tk  │  ──────────────────────────►┌──────────────┐
  └─────────────────────────────┘                             │ OUTPUT:137tk │
                                                              └──────────────┘

  Total input across all turns:  1,257 tokens
  Total output across all turns:   427 tokens
    
Part 2

Token Math: Why Conversations Get More Expensive

Token Input Tokens Output Tokens Cumulative Cost Prompt Caching Tool Call
Concept 2

Input Tokens Grow Every Turn. Output Tokens Become Input.

A token is roughly three to four characters of text, or about three-quarters of a word. Models do not process words or sentences; they process tokens. Every API call is priced on two quantities: input tokens (everything sent to the model) and output tokens (everything the model generates in response). Input and output are priced separately, and output is usually more expensive per token.

The key dynamic is that output tokens from one turn become input tokens on the next. The assistant's response is appended to the conversation history and re-sent in full on every subsequent call. This means a long assistant response in Turn 2 will be paid for again as part of the input in Turns 3, 4, 5, and so on. Prompt efficiency matters because the cost compounds across turns.

Turn What is new Input tokens Output tokens Cumulative input
1 System prompt (120) + user message (10) 180 95 180
2 History from Turn 1 (124) + new user message (10) 254 75 434
3 History from Turns 1–2 (213) + new user message (14) 347 120 781
4 History from Turns 1–3 (352) + new user message (4) 476 137 1,257
Total across all four turns 1,257 427

Note that the system prompt (120 tokens) alone accounts for 480 tokens of input across the four turns, simply because it is re-sent every time. A poorly written system prompt is not a one-time cost.

Why It Matters

At small scale, the difference between a 100-token and a 300-token system prompt is negligible. At one million conversations per month, it is the difference between 100 million and 300 million input tokens, which translates directly into dollars. System prompt efficiency, response length, and conversation depth are all cost levers, not just design choices.

Why not just send the new message instead of the full history?

Because the model has no memory between calls. Without the history, the model cannot connect "with the same drill" (Turn 4) to the drill discussion in Turn 1. The full context has to be re-sent for the model to produce coherent, contextually appropriate responses. Some systems implement summarization to compress older history and reduce token count, but this introduces the risk of losing details the model might need.

Concept 3

Prompt Caching: Paying Once for What Gets Re-Sent Every Time

Because the system prompt is re-sent on every turn, API providers have introduced prompt caching as a cost optimization. When caching is enabled, the provider stores the processed representation of a long, stable prefix (typically the system prompt and any fixed context) after the first call. On subsequent calls in the same conversation, if that prefix has not changed, the provider reads from the cache instead of reprocessing it. Cached tokens are charged at a significantly reduced rate.

OpenAI charges cached input tokens at 50% of the standard input price. Anthropic's prompt caching is similar in structure, with cache write tokens slightly more expensive than standard input and cache read tokens substantially cheaper. The economics are favorable for any system where the same large prompt is sent repeatedly across many turns or many conversations.

In the Home Depot example, the 120-token system prompt is re-sent on all four turns. With caching, after the first call that prefix is stored and the three subsequent reads are cheaper. At scale across millions of conversations, caching the system prompt can meaningfully reduce the total input token bill without changing the model's behavior at all.

Why It Matters

Caching changes the cost calculus for system prompt design. Without caching, a shorter system prompt is always cheaper. With caching, a longer and more detailed system prompt that is mostly static may cost almost the same as a short one once caching kicks in. This removes some of the pressure to write terse system prompts and allows more thorough instructions without proportionally higher cost.

Concept 4

Tool Calls: When the Model Reaches Outside the Context Window

A language model by itself can only work with what is in the context window. It cannot look up a product's current inventory, check an order status, or retrieve a customer's purchase history unless that information is explicitly included in the input. Tool calls extend what the model can do by allowing it to request external data or trigger external actions mid-conversation.

When a tool call is made, the exchange looks like this: the model generates a structured request specifying which tool to call and with what parameters; the application runs the tool (a database query, an API call, a function) and returns the result; that result is then added to the conversation context as a new message and the model continues generating its response. In the Day 1 framework, tool calls are part of the behavior layer, and their outputs become part of the representation on the next step.

The token cost implication is the same as for any other content: the tool call request and the tool result both consume tokens and are included in the history going forward. A tool that returns a verbose response (a full product catalog, a long JSON object) can add significantly to the input token count for every subsequent turn. Designing tool outputs to be concise is a direct cost lever.

Example: Home Depot Chatbot with Inventory Lookup
In Turn 3, the customer asks for a kit that includes a drill, battery, and charger. Rather than generating an answer from training data alone, the chatbot calls a product search tool with parameters like {"query": "drill kit with battery charger", "category": "power-tools"}. The tool returns a list of matching products from the live catalog. That result is injected into the context, and the model uses it to generate its response. The customer sees an up-to-date answer; the model never had to memorize inventory.
The tool result now lives in the conversation history and will be re-sent as part of the input on Turn 4. If the product list returned 800 tokens of JSON, those 800 tokens are paid for again on every subsequent turn.

Part 3

Model Pricing: The Same Conversation, Very Different Costs

Price per Million Tokens Input Price Output Price Self-Hosted
Concept 5

Choosing a Model Is a Cost-Quality Tradeoff, Not Just a Quality Decision

The conversation structure described above works identically regardless of which language model you use. The same system prompt, history, and user message are sent; the same token counts apply. What changes is the price per token, and that difference can be substantial.

OpenAI prices models by input and output tokens per million. Output tokens are more expensive than input tokens because generating text requires more compute than reading it. Prices as of the time of this lecture:

Model | Input (per 1M tokens) | Output (per 1M tokens) | Cost at 1M conversations
GPT-4o | $2.50 | $10.00 | $4,799
GPT-4o mini | $0.15 | $0.60 | $632
Open-source (self-hosted, A100 GPU) | ~$0 (no per-token API fee) | ~$0 (no per-token API fee) | $21,900

The open-source row illustrates an important point: API cost is not the same as total cost. Running your own model requires cloud GPU infrastructure, which at the scale of this example costs more than using GPT-4o via the API. Self-hosting only becomes cost-effective at very high token volumes, where the per-unit infrastructure cost drops below the per-token API price.
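The break-even point can be estimated with simple arithmetic: divide the fixed infrastructure bill by the per-token API price it would replace. The blended API price below is a hypothetical figure for illustration, not a quote.

```python
# Illustrative break-even sketch: self-hosting pays off only once the
# fixed GPU bill drops below what the same tokens would cost via the API.
gpu_cost_per_year = 21_900       # annual self-hosting cost from the table above
blended_api_price_per_m = 5.00   # hypothetical blended $/1M tokens (input + output mix)

breakeven_tokens = gpu_cost_per_year / (blended_api_price_per_m / 1e6)
print(f"break-even at ~{breakeven_tokens / 1e9:.1f} billion tokens per year")
```

Below that volume, every token processed on your own GPU effectively costs more than buying it through the API.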

For the Home Depot 4-turn conversation (1,257 input tokens, 427 output tokens), the cost on GPT-4o mini is approximately $0.00044 per conversation, well under a tenth of a cent. At one million conversations that becomes roughly $445. On GPT-4o the same conversation costs about $0.0074, or roughly $7.40 per thousand conversations.
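The per-conversation figures follow directly from the pricing formula: tokens times price per million, summed for input and output. A minimal sketch, using the Home Depot token counts and the table prices:

```python
def conversation_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Dollar cost of one conversation at the given per-million-token prices."""
    return (input_tokens * in_price_per_m / 1e6
            + output_tokens * out_price_per_m / 1e6)

# Home Depot 4-turn conversation: 1,257 input tokens, 427 output tokens.
mini = conversation_cost(1_257, 427, 0.15, 0.60)    # GPT-4o mini prices
gpt4o = conversation_cost(1_257, 427, 2.50, 10.00)  # GPT-4o prices

print(f"GPT-4o mini: ${mini:.6f}/conv  -> ${mini * 1_000_000:,.0f} at 1M conversations")
print(f"GPT-4o:      ${gpt4o:.6f}/conv -> ${gpt4o * 1_000:,.2f} per 1,000 conversations")
```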

Why It Matters

Model selection is often framed as a quality decision. It is also a cost decision, and the two are frequently in tension. A model that performs 15% better on your task but costs eight times more may not be the right choice at scale. For many commercial applications, a smaller, cheaper model is sufficient, and the cost savings at volume are significant enough to change the business case for deploying the chatbot at all.

Part 4

API Keys: Authentication, Billing, and Security

API Key Authentication Usage Tracking
Concept 6

An API Key Is a Credential, Not Just a Password

To make API calls to OpenAI, every request must include an API key: a long string of characters that identifies your account. The key serves two purposes. First, authentication: OpenAI verifies that the request is coming from an authorized account before processing it. Second, billing: every token processed under your key is charged to your account. The key is the link between the computation and the invoice.

This means an exposed API key is not just a security breach; it is an open billing account. Anyone who obtains your key can make API calls that are charged to you, with no limit other than your account's spending cap. At scale, an exposed key can generate substantial charges before you notice.

Security Rules
Do: Save the key immediately when generated. You will not be able to see it again after closing the creation page.
Do: Store it in a secure location (a password manager or environment variable), never in plain text.
Do not: Put it in any code that runs in the browser or is committed to a public repository. Client-side code is visible to anyone who views the page source.
Do not: Share it in email, Slack, or any other messaging platform.
If compromised: Delete the key immediately from the OpenAI API Keys page and generate a new one. The old key stops working as soon as it is revoked.
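The environment-variable rule above looks like this in practice. A minimal sketch: the fake dictionary at the bottom stands in for the real environment so the example is self-contained; in real use you would set the variable in your shell (`export OPENAI_API_KEY="sk-..."`) and call the function with no argument.

```python
import os

def load_api_key(env=None):
    """Fetch the API key from the environment; fail loudly if it is missing.

    The key never appears in the source file, so committing this code to a
    repository exposes nothing.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key is None:
        raise RuntimeError("Set the OPENAI_API_KEY environment variable first.")
    return key

# Self-contained demo with a fake environment and a clearly fake key.
key = load_api_key({"OPENAI_API_KEY": "sk-example-not-a-real-key"})
print("key loaded, ends with:", key[-4:])
```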
Why the key lives on the server, not in the browser
In the LUCID setup for this course, the API key is stored on a backend server, not in the Qualtrics survey. When a participant sends a message, Qualtrics sends the text to the backend server, which adds the API key to the request and forwards it to OpenAI. The participant never sees the key. If the key were embedded in the Qualtrics survey code instead, anyone who opened their browser's developer tools could read it directly from the page source.

Resources for This Section

Interactive OpenAI Tokenizer

Paste any text and see exactly how it is broken into tokens. Useful for estimating prompt costs before sending a single API call.

Reference OpenAI API Pricing

Current per-token prices for all OpenAI models. Prices change; always check this page rather than relying on numbers from a specific date.

Article Where to Find Your OpenAI API Key (OpenAI Help)

Step-by-step guide to creating and managing API keys in the OpenAI platform.

All Resources at a Glance

Topic | Resource | Format
Tokenization | OpenAI Tokenizer | Interactive
Model Pricing | OpenAI API Pricing | Reference
API Key Setup | Where to Find Your OpenAI API Key (OpenAI Help) | Article