1. The Four Chatbot Architectures

Chatbot design is a spectrum. Each step up adds capability and complexity. Choosing the wrong tier — building an agentic system for a two-question FAQ flow — is as damaging as building a scripted bot for a task that needs reasoning.

CHATBOT ARCHITECTURE TIERS — CAPABILITY vs COMPLEXITYTier 1Scripted / Rule-BasedExamplesButton menusFAQ answer listsDecision treesBest forFixed-scope supportHigh-complianceZero hallucination riskTier 2LLM Prompt-BasedExamplesSystem-prompt botPersona chatbotCustomer support v1Best forOpen-ended Q&AGeneral assistanceLow-stakes chatTier 3RAG-PoweredExamplesDocs assistantKB search botPolicy advisorBest forGrounded answersPrivate knowledgeCitation-requiredTier 4Agentic / Tool-UseExamplesMCP-connected agentBooking assistantDev workflow agentBest forMulti-system tasksReal-world actionsComplex automation

2. Tier 1: Scripted Chatbots

Scripted chatbots navigate users through a pre-defined decision tree. Every response is authored by a human. There is no model inference — only pattern matching or button selection. They are not "dumb" — they are deliberately bounded, which is a feature for compliance-sensitive environments.

When to use scripted bots

  • Banking FAQs where every answer must be compliance-approved
  • Returns/refund flows with a fixed set of outcomes
  • Appointment booking with no free-text needed
  • Emergency response systems (the bot must not improvise)

Architecture pattern

User selects button / sends keyword
      ↓
Intent Matcher (regex or keyword list)
      ↓
Decision Tree Node lookup
      ↓
Template Response renderer
      ↓
Reply with structured message (text + button options)

Common platforms: Dialogflow ES (pre-LLM mode), Botpress, WhatsApp Business API with button replies, Freshdesk Freddy.

3. Tier 2: LLM Prompt-Based Chatbots

Add a large language model and you get natural language understanding for free. The simplest form is a system prompt that constrains the model's persona, scope, and tone — the model handles the rest.

System prompt anatomy

You are "Aria", a customer support assistant for WidgetCo.

SCOPE: Answer questions about WidgetCo products, orders, and returns.
TONE: Friendly, concise, professional.
LIMITS: Do not discuss competitors, do not offer refunds
        without a valid order ID. Do not provide legal advice.
ESCALATE: If user asks about billing disputes, say:
          "For billing issues, I'll connect you with our billing team."

---

[conversation history appended here]
User: [user message]
Aria:
TIER 2 — LLM PROMPT-BASED CHATBOT FLOWUsersends messageContext Buildersystem prompt +LLM APIClaude / GPT-4o / Geminigenerates responseResponse Filtersafety check → sendconversation history appended to context each turnhistory appended
Context window management

Each LLM API call includes the full conversation history. At roughly 4 characters per token, a 100-message support chat can cost 20,000+ tokens per reply — and eventually exceed the model's context window. Implement a sliding window (keep last N messages) or summarise older turns into a compressed memory block to prevent this.

4. Tier 3: RAG-Powered Chatbots

A pure LLM chatbot only knows what was in its training data. For questions about your specific product documentation, internal policies, or knowledge base, you need Retrieval-Augmented Generation (RAG).

RAG adds a retrieval step before the LLM call: embed the user's question, search a vector database for relevant document chunks, and inject those chunks into the LLM prompt as context. The model answers based on your documents, not its training data, and can cite sources.

TIER 3 — RAG CHATBOT ARCHITECTUREOFFLINE INGESTION PIPELINESource DocumentsPDF, MD, HTML, DBChunker512-token overlappingEmbedding Modeltext-embedding-3-smallVector DatabaseQdrant · Pinecone · pgvectorONLINE QUERY PIPELINEUser Query"How do I reset…?"Query Embeddersame model as ingestionVector Similaritytop-k cosine searchLLM + Chunksgrounded answerAnswer + Citations"According to [doc p.4]…"retrieve top-kIngestion runs offline (batch or on document change) · Query pipeline runs in real-time (<200ms for retrieval)LLM instructed: "Answer only from the provided context. If not found, say so."

RAG chatbot real-world example: internal IT helpdesk

SOURCES INGESTED:
  - IT policies PDF (chunked, embedded → Qdrant)
  - 500 past support ticket resolutions (embedded)
  - System runbooks (Confluence → webhook → re-embedded on update)

SYSTEM PROMPT:
  "You are the IT helpdesk assistant. Answer only from the provided
   context. Cite the source document name for each answer.
   If the answer is not in the context, say 'I don't have that
   information — please raise a ticket at helpdesk.corp.com'."

SAMPLE EXCHANGE:
  User: "How do I request access to the analytics dashboard?"
  Retriever: [top 3 chunks from IT-Access-Policy.pdf, section 4.2]
  LLM: "To request analytics dashboard access, submit form IT-REQ-44
        via the self-service portal. Your manager must approve within
        2 business days. [Source: IT-Access-Policy.pdf, §4.2]"

5. Tier 4: Agentic Chatbots

Agentic chatbots go beyond answering questions — they take actions. The LLM can call tools (functions, APIs, MCP servers) and iterate based on results. The conversation loop includes a reasoning step where the model plans which tools to use, executes them, and synthesises the results.

TIER 4 — AGENTIC (ReAct LOOP)AGENT LOOP (repeats until task complete or max iterations)1. ReasonWhat do I need to do next?2. ActCall tool / API / MCP server3. ObserveRead tool result, plan next steplooptask doneFinal AnswerSent to userUser Request"Book a meeting…"Available tools: calendar API · CRM lookup · email send · database queryModel decides which tools to call and in what order

Agentic chatbot example: sales assistant

User: "I want to follow up with all leads from last month's trade show
       who haven't been contacted yet."

Agent turn 1 — Reason: I need to find trade show leads.
  Act: crm_tool.search_contacts(source="Trade Show", date_range="last_month")
  Observe: 47 contacts found, 12 have no activity in 30 days.

Agent turn 2 — Reason: I have the uncontacted leads. Draft follow-up.
  Act: email_tool.draft_template(
    contacts=12_uncontacted_leads,
    template="trade_show_followup_v2"
  )
  Observe: Draft ready, subject: "Great meeting you at [event]…"

Agent turn 3 — Reason: Show draft to user for approval before sending.
  Final answer: "Found 12 uncontacted trade show leads.
                Draft follow-up email ready for your review.
                [Show preview] — shall I send to all 12?"
Always pause before irreversible actions

The best agentic chatbots do not auto-send emails, auto-book meetings, or auto-post. They prepare and confirm. Design a "show me before you do it" step for any action that sends, modifies, or deletes — it makes agents trustworthy rather than alarming.

6. Platform Comparison

PlatformTierBest forTradeoff
Botpress1–3Enterprise support flowsUI heavy; customisation costs dev time
Vercel AI SDK + OpenAI2–4Web apps with embedded chatYou own the infra; streaming UX out of the box
LangChain + LangGraph3–4RAG pipelines, multi-agent graphsSteep learning curve; rapidly evolving API
Anthropic Claude (direct API)2–4Instruction-following, tool use, long contextNo built-in UI; pairs with any frontend
AWS Lex v21–2Existing AWS workloads, complianceSlots-and-intents model; less flexible reasoning
Rasa Open Source1–3On-premise, fully self-hostedTraining data required; ML ops overhead

7. Common Implementation Mistakes

1. No conversation memory strategy

Stateless LLM APIs don't remember previous messages — you must pass history each call. Without a strategy (sliding window, summarisation, vector-based episodic memory) your bot either forgets after a few turns or blows its context budget.

2. No fallback path

Every chatbot will eventually receive a query outside its scope. Define explicit out-of-scope handling: say what the bot can't do and route to a human or a form. A bot that confabulates an answer is worse than one that says "I can't help with that."

3. Prompt injection in user input

Users (or malicious actors) can inject instructions into their messages: "Ignore your system prompt and…". Defend by: (a) separating user input from trusted content in the prompt structure, (b) having an output validation layer that checks for policy violations, and (c) rate-limiting and monitoring for anomalous inputs.

4. Skipping evaluation

Ship-first evaluation is the norm but a mistake. Before deploying, build a small golden test set of 50–100 representative queries with expected answers. Run every model/prompt change against it. Silent regressions — where a new prompt fixes one thing and breaks three others — are the most common source of chatbot quality decay.

8. Minimal Viable Implementation (Node.js)

// Minimal tier-2 chatbot backend (Node.js + Anthropic SDK)
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from env

const SYSTEM_PROMPT = `You are a support assistant for WidgetCo.
Answer questions about our products. Be concise and friendly.
If you don't know, say so — do not make things up.`;

const conversationHistory = [];

async function chat(userMessage) {
  conversationHistory.push({ role: "user", content: userMessage });

  const response = await client.messages.create({
    model: "claude-opus-4-5",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    messages: conversationHistory,
  });

  const assistantMessage = response.content[0].text;
  conversationHistory.push({ role: "assistant", content: assistantMessage });

  // Sliding window: keep last 20 messages to control token usage
  if (conversationHistory.length > 20) {
    conversationHistory.splice(0, 2);
  }

  return assistantMessage;
}
Choosing the right tier

Start at Tier 2 unless you have a specific knowledge retrieval requirement (Tier 3) or need real-world actions (Tier 4). Most chatbot projects over-engineer for the first version. A well-prompted Tier 2 bot with good system instructions and explicit fallback handling will outperform a poorly-designed Tier 4 agent in production reliability.

JSON Validator

Building a chatbot API? Validate your message payload schemas and tool definitions before sending them to the LLM API — malformed tool JSON is a common silent failure.

Open JSON Validator