AI & Marketing
To understand how GenAI is being implemented, we need to look at how companies embed chatbots inside their existing products (e.g., their websites).
To illustrate that, we’ll pick a fictional example: a Home Depot chatbot helping a customer pick out a drill.
The scenario
Customer opens Home Depot’s website and starts a chat.
They ask: “I need help picking a drill for home projects. What do you recommend?”
The chatbot responds with suggestions based on their needs.
Simple exchange. But something is running behind it.
That something is a model running in a loop.
Every time the customer types, the loop runs once.
Let's step through each turn of a realistic four-turn conversation and watch how the input context grows and the costs accumulate.
Input tokens (re-sent on every turn)
Total input across all turns: 1,257 tokens
Output tokens (generated once per turn)
Total output across all turns: 427 tokens
System instructions — The rules that tell the model how to behave. Example: “You are a Home Depot assistant. Help customers pick tools. Do not make up product details.”
Context window — The total number of tokens the model can see at once. It includes system instructions + conversation history + the current message.
User response — What the customer types. Example: “I need a drill for occasional home projects.”
AI response — What the model generates. Example: “For occasional use, I’d recommend the DeWalt 20V. It’s affordable and reliable.”
Input tokens — All the tokens sent to the model (system + history + user message). This is what you pay for.
Output tokens — All the tokens the model generates in its response. This is also what you pay for, usually at a higher per-token rate than input.
How they relate: Input tokens grow with each turn because you re-send everything. Output tokens become input tokens on the next turn. That’s why conversations get more expensive as they get longer.
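This growth pattern can be sketched in a few lines of Python. The token counts below are illustrative placeholders, not real tokenizer output:

```python
# Illustrative sketch: input tokens grow each turn because the full
# history is re-sent; each turn's output becomes input on the next turn.

def simulate_conversation(system_tokens, turns):
    """turns: list of (user_tokens, assistant_tokens), one pair per turn."""
    history = 0                       # accumulated conversation tokens
    total_input = total_output = 0
    for user_toks, assistant_toks in turns:
        input_this_turn = system_tokens + history + user_toks
        total_input += input_this_turn
        total_output += assistant_toks
        # the user message and the model's reply join the history
        history += user_toks + assistant_toks
    return total_input, total_output

# Hypothetical 4-turn chat: (user tokens, assistant tokens) per turn
turns = [(20, 80), (15, 90), (25, 100), (10, 60)]
total_in, total_out = simulate_conversation(system_tokens=120, turns=turns)
print(total_in, total_out)  # → 1185 330
```

Notice that total input (1,185) far exceeds total output (330) even though the model only generated 330 tokens: the re-sent history is doing most of the damage.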
Different language models have very different costs.
The same conversation structure works with any model, but the price per token varies widely.
GPT-4o (high quality, expensive)
GPT-4o mini (cheaper, often good enough)
Open-source (self-hosted)
Cost comparison at 1M visitors
| Model | Cost |
|---|---|
| GPT-4o | $4,799 |
| GPT-4o mini | $632 |
| Open-source (A100) | $21,900 |
Running your own LLM can be expensive. It’s only cost-effective if you’re processing billions of tokens.
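A per-conversation cost estimate can be sketched like this. The per-million-token prices below are assumptions for illustration; always check the provider's current pricing page:

```python
# Illustrative cost estimate per conversation. The per-token prices
# below are placeholders (assumed), not guaranteed current pricing.
PRICE_PER_1M = {                      # USD per 1M tokens (assumed)
    "gpt-4o":      {"input": 2.50, "output": 10.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def conversation_cost(model, input_tokens, output_tokens):
    p = PRICE_PER_1M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The 4-turn example: 1,257 input tokens, 427 output tokens
cost_big = conversation_cost("gpt-4o", 1257, 427)
cost_small = conversation_cost("gpt-4o-mini", 1257, 427)
print(f"gpt-4o: ${cost_big:.5f}  gpt-4o-mini: ${cost_small:.5f}")
```

Fractions of a cent per conversation sound negligible, but multiply by a million visitors and the gap between models becomes the table above.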
A tech stack is the set of systems that work together to deliver a digital experience. For Home Depot, the website and chatbot run on the same core platform.
The 5 Core Elements (Business View)
1. Customer Experience Layer - Website, mobile app, chatbot interface - What customers see and interact with - Drives engagement and conversion
2. Application Layer (Business Logic) - Handles key actions: search products, check inventory, place orders - Ensures everything works correctly behind the scenes
3. Data Layer - Stores and processes: product data, customer data, transaction history - Enables personalization and decision-making
4. Infrastructure Layer - Cloud systems and content delivery - Ensures: speed, reliability, scalability during peak demand
5. Chatbot (Integrated Capability) - A feature within the experience, not a separate system - Helps customers: find products, answer questions, complete tasks faster - Pulls information from the same systems as the website
Key insight: The chatbot is not a standalone tool. It is another interface into the same business platform that powers the entire customer experience.
An API key is a unique identifier that allows OpenAI to verify your identity and track your usage when you interact with its AI models.
Two main functions:
Authentication — Your API key tells OpenAI who is making the request, ensuring only authorized users can access the service.
Usage Tracking & Billing — OpenAI charges based on the amount of text (tokens) you process. The API key links your usage to your account, so OpenAI knows how much you’re using and bills you accordingly.
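Concretely, the key travels with every request in the `Authorization` header. This sketch only builds the request (no network call is made), so the structure is visible without spending tokens:

```python
# Sketch: the API key rides along in the Authorization header of every
# request. That header is how OpenAI authenticates you and meters usage.
import json

API_KEY = "sk-..."  # placeholder; never hardcode a real key

def build_request(messages, model="gpt-4o-mini"):
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {API_KEY}",   # who is asking
            "Content-Type": "application/json",
        },
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = build_request([{"role": "user", "content": "Recommend a drill."}])
print(req["headers"]["Authorization"][:10])
```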
Step 1: Sign up at platform.openai.com
Create an account with an email, phone number, and credit card. (You need a credit card for identity verification, even if using free credits.)
Step 2: Go to the API Keys page
Once logged in, visit https://platform.openai.com/account/api-keys and click “Create a new secret key”. Give it a name like “My first API key”.
A long string of letters and numbers will appear — this is your API key.
Copy and save it in a safe place immediately. You will not be able to see it again once you close this page.
Your API key is like your credit card number. Guard it like your life depends on it.
DO:
✓ Save it in a safe place immediately — you will not see it again
✓ Keep it secure — treat it like a password
✓ Use it only for this class project
DO NOT:
✗ Share it publicly or commit it to GitHub
✗ Put it in email, Slack, or any chat
✗ Expose it in browser code or client-side JavaScript
✗ Let anyone else use it (they can run up your bill)
If compromised: Delete it immediately from the API Keys page and generate a new one. OpenAI will stop accepting requests from the old key.
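A safer pattern than pasting the key into your code is loading it from an environment variable, so it never ends up in a file you might commit:

```python
# Load the key from the environment instead of hardcoding it in source
# files (where it could accidentally be committed to GitHub).
import os

def get_api_key():
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "Set the OPENAI_API_KEY environment variable before running."
        )
    return key

# In your shell (not in the code):  export OPENAI_API_KEY="sk-..."
```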
Three layers, always
Messages flow: Browser → Server → OpenAI → Server → Browser
The API key stays on the server.
The flow
Participant types message
↓
Qualtrics (Browser)
↓
Backend Server (Layer 2)
+ API key attached
↓
OpenAI
+ Generates response
↓
Backend Server
+ Formats response
↓
Qualtrics (Browser)
+ Displays to participant
Layer 1: Qualtrics
Qualtrics is the experiment container.
Layer 2: Backend
The backend is the messenger and gatekeeper.
Layer 3: OpenAI
It is a black box. You send it a request, you get a response.
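A minimal sketch of Layer 2's job follows. The function names are hypothetical, and the model call is injected as a parameter so the sketch runs without a network request; the point is that the secret key is attached on the server and never reaches the browser:

```python
# Sketch of the backend's role: it sits between the browser and OpenAI,
# attaches the secret key server-side, and relays only the reply back.
SERVER_SIDE_KEY = "sk-..."  # lives only on the server, never in the browser

def handle_chat_turn(user_message, history, call_openai):
    """call_openai is injected so this sketch runs without a network call."""
    messages = history + [{"role": "user", "content": user_message}]
    # the key is attached here, on the server (Layer 2), not in Qualtrics
    reply = call_openai(messages, api_key=SERVER_SIDE_KEY)
    return {"reply": reply}          # only the text goes back to the browser

# Stand-in for Layer 3 during testing
def fake_openai(messages, api_key):
    return f"(model reply to: {messages[-1]['content']})"

out = handle_chat_turn("I need a refund.", [], fake_openai)
print(out["reply"])
```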
The alternative (bad)
Put ChatGPT’s API directly in Qualtrics.
Pros: No server to build or maintain, so setup is as simple as it gets.
Cons: Your API key is exposed in client-side code, where anyone can copy it and run up your bill, and you have no server-side control over what gets sent to the model.
Your setup (good)
Server in the middle.
Pros: The API key stays on the server, and you control every request: you can append prompts, log the conversation, and format responses before they reach the participant.
Cons: More moving parts, and one more system that can break.
The complexity is worth it because your experiment depends on control.
Your experiment lives in this Qualtrics template.
Let me walk through it piece by piece, in the order the system runs.
When a participant enters the survey, this is what happens:
In the template, there is an embedded data field called LUCIDBackendURL.
This field contains the address of my backend server.
When a participant enters the survey, Qualtrics checks this field.
If it is there and correct, the backend connection is live.
If it is missing or wrong, the chat will not work.
What you see
Embedded data field in Qualtrics:
LUCIDBackendURL =
https://backend.example.com/lucid
This URL is the bridge between Layer 1 and Layer 2.
Why it matters
If this URL breaks, the participant types a message and nothing happens.
It is the first place to check if something goes wrong.
The backend is working, but Qualtrics does not know where to send the message.
Right after the backend URL is checked, the randomizer fires.
The randomizer is a Qualtrics block that branches.
It randomly assigns each participant to Condition A or B.
This is exactly how your experiment gets created — not abstractly, but literally at this moment.
What happens
Participant enters survey.
Qualtrics flips a coin (not really, but essentially).
Heads → Condition A branch
Tails → Condition B branch
All the embedded data for that condition gets populated.
The rest of the template runs with those values.
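Conceptually, the randomizer is doing nothing more than this sketch (the condition contents are abbreviated from the reinforcement prompts described later):

```python
# Sketch of what the Qualtrics randomizer does conceptually: a fair
# coin flip that populates condition-specific embedded data.
import random

CONDITIONS = {
    "A": {"reinforcement": "Respond in a professional and concise manner."},
    "B": {"reinforcement": "Respond in a friendly, conversational tone."},
}

def assign_condition(rng=random):
    label = rng.choice(["A", "B"])      # the "coin flip"
    return label, CONDITIONS[label]

label, embedded = assign_condition()
print(label, embedded["reinforcement"])
```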
After randomization, the participant sees a paragraph of text.
This is the scenario. It tells them what situation they are in.
The scenario is the same for Condition A and Condition B.
It anchors the interaction. It makes it real.
Example scenario
“You bought a DeWalt cordless drill kit from Home Depot two weeks ago. When it arrived, the drill was damaged — the chuck is loose and it does not hold bits securely. The accessories (battery and charger) are fine. You want a full refund and plan to keep the working accessories. You open the Home Depot customer support chatbot to request a return.”
This paragraph is what the participant reads. Then the chat begins.
Why this order
Scenario first, prompts second.
Participants need to understand their situation before they start interacting.
The scenario and the hidden instructions should match.
If the scenario says “I want a refund” but the bot offers repair, something is misaligned.
Alignment = realistic interaction = valid experiment.
The system prompt sets the role and goal.
It is the same for every turn (it does not change).
Example system prompt
“You are a Home Depot customer support chatbot. The customer has a damaged drill and wants a return. Your goal is to verify the order, confirm the damage, check the return window, and issue a return label if eligible. Be helpful and professional. If the issue is outside your scope, escalate to a human agent.”
This prompt tells the model: Here is your role. Here is your goal. Here is how you behave.
Why it is initial/system
This prompt does not change.
It is the foundation.
Everything the chatbot does is based on this.
The reinforcement prompt is appended to every turn.
Example reinforcement
“Respond in a professional and concise manner. Use complete sentences. Avoid informal language.”
Or for Condition B:
“Respond in a friendly, conversational tone similar to a helpful store associate.”
These are different. Everything else is the same.
Why it gets appended every turn
Language models can drift.
Without a reminder, Condition A might become casual by turn 3.
Without a reminder, Condition B might become formal by turn 4.
By appending the reinforcement every turn, you keep the tone consistent.
This is what keeps your conditions pure.
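Putting the two prompt types together, each turn's input can be assembled as in this sketch. Appending the reinforcement as a trailing system message is one common pattern, not the only one, and the prompt text is abbreviated:

```python
# Sketch of per-turn message assembly: the system prompt is constant,
# the history grows, and the reinforcement is re-appended every turn
# so the assigned tone cannot drift.
SYSTEM_PROMPT = "You are a Home Depot customer support chatbot. ..."
REINFORCEMENT = {
    "A": "Respond in a professional and concise manner.",
    "B": "Respond in a friendly, conversational tone.",
}

def build_messages(condition, history, user_message):
    return (
        [{"role": "system", "content": SYSTEM_PROMPT}]
        + history                                  # all prior turns, re-sent
        + [{"role": "user", "content": user_message},
           {"role": "system", "content": REINFORCEMENT[condition]}]
    )

msgs = build_messages("A", [], "My drill arrived damaged.")
print([m["role"] for m in msgs])  # → ['system', 'user', 'system']
```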
What the participant experiences
They type a message.
They hit send.
The chatbot replies.
It feels like a live conversation.
They do not see the backend or OpenAI. They just see the interface.
What is happening behind the scenes
Participant message → Qualtrics → Backend (Layer 2) → Reinforcement appended → OpenAI (Layer 3) → Response generated → Backend → Qualtrics → Displayed
What costs money each turn: the input tokens (system prompt + reinforcement + full history + new message) and the output tokens in the model's reply.
After the chat ends, the participant answers 10 questions.
These are the same 10 questions for every participant, every condition.
All questions use a 5-point scale: Strongly Disagree to Strongly Agree.
DO NOT TOUCH THESE QUESTIONS
The 10 items (from your proposal)
Why standardized
Every team uses these items.
That means everyone’s data is comparable.
Team A tests tone. Team B tests empathy. Team C tests competence.
But all three collect the same outcomes.
When we analyze the class data, we can see patterns.
If tone affects warmth, we will see it.
If empathy affects competence, we will see it.
Next week we will have the AI visibility exam, along with the guest speaker.
If something feels wrong during testing, email me or Resham before launch.