RAG Pipeline Explainer — AnyCompany Support Workshop

🧠 The Problem: LLMs Only Know Their Training Data

Large language models are trained on public internet data up to a cutoff date. They don't know about AnyCompany Support's internal SOPs, the latest market-specific exemption policy, or yesterday's KB update. Ask Claude about your refund policy for a cancelled GrabFood order with partial delivery and it will guess, confidently, with no guarantee that the guess is right. An agent who trusts that guess can ship a wrong policy answer to a Pax in seconds.

🧠

The Knowledge Gap

LLMs can't access your SOP corpus, recent KB updates, or internal escalation procedures. They'll fill gaps with plausible-sounding but potentially wrong policy answers — exactly the failure mode that triggers DSAT and re-contacts.

📎

The Solution: Retrieve, Then Generate

Instead of retraining the model, RAG retrieves relevant SOP chunks and inserts them into the prompt. The LLM reads that context and generates a grounded answer with citations. This is exactly what your SOP Lookup agent does.

🔢

Powered by Embeddings

Embeddings turn text into numbers (vectors) so we can measure similarity. The phrases "drunk driving" and "DUI" have similar vectors even though they share no words. That's how a Pax who said "the driver smelled like beer" gets matched to the Safety SOP.
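To make "similar vectors" concrete, here is a minimal Python sketch of cosine similarity, a measure that many vector search setups use. The four-dimensional vectors are made-up toy values chosen for illustration. Real embedding models output hundreds or thousands of dimensions, and none of these numbers come from an actual model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = same direction, near 0.0 = unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings (NOT produced by any real model).
drunk_driving = [0.81, 0.10, 0.55, 0.05]
dui = [0.78, 0.15, 0.60, 0.02]
refund_policy = [0.05, 0.90, 0.02, 0.40]

print(round(cosine_similarity(drunk_driving, dui), 3))            # high: same meaning
print(round(cosine_similarity(drunk_driving, refund_policy), 3))  # low: different topic
```

The exact scores come from the toy numbers. What carries over to real systems is the ordering: related phrases score higher than unrelated ones.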

🎧

Why This Matters for GS Support

RAG is the foundation of every flagship Cyborg use case in your cohort: SOP Lookup (cited answers), KB Retrieval & Spiel Generator (Benedictus #32), SmartGuide (Preenanya #29), LiveAssist (Pornpailin #21). Same pattern, different markets.

💡

You're already doing manual RAG every shift. When an agent searches Glean, finds the relevant SOP, and pastes it into GrabGPT before asking the question, that's the RAG pattern by hand. The pipeline just automates it across the entire SOP corpus, with citations, in under 2 seconds.

🔗 Connection to Day 1

Remember the Embeddings Explainer where you clicked a word and found its nearest neighbors in 3D space? That's exactly what RAG's retrieval step does — but with document chunks instead of single words.

✂️ Tokenizer → 🔢 Embeddings → 🔍 RAG Retrieval → 🧠 Transformer → ✨ Output

RAG sits between embeddings and the transformer: it uses embedding similarity to find relevant documents before the LLM generates its answer.

⚙️ The RAG Pipeline — 6 Steps

Every RAG system follows the same pattern. Here's what happens when an agent in SG/MY asks the SOP Lookup agent: "How do I handle a refund for a cancelled GrabFood order where the food was already delivered?"

❓ User Query → 🔢 Embed Query → 🔍 Search Vectors → 📄 Retrieve Chunks → 🧩 Build Prompt → ✨ Generate Answer

Step	What Happens	SOP Lookup Example
❓ User Query	Agent asks a question in natural language	"How do I handle a refund for a cancelled GrabFood order where the food was already delivered?"
🔢 Embed Query	Question is converted to a vector (numbers)	Query becomes `[0.55, 0.72, 0.13, -0.28, 0.41, ...]`
🔍 Search Vectors	Find SOP chunks with similar vectors	Searches across all chunked SOPs and KB articles for vectors close to the query
📄 Retrieve Chunks	Top 3-5 most relevant chunks returned with scores	SOP-MIWI-04.2 §3.1 (0.94), Partial-Delivery Refund Decision Tree §7.1 (0.87), MIWI Refund Threshold Table §3 (0.82)
🧩 Build Prompt	Assemble: steering rules + retrieved chunks + question	"Answer ONLY from the SOPs below. Cite article ID. If not covered, say 'Not in SOPs — escalate to TL.' Context: [chunk 1] [chunk 2] [chunk 3]. Question: ..."
✨ Generate Answer	LLM generates a grounded answer with citation	"Process partial refund per SOP-MIWI-04.2 §3.1. Refund only undelivered items. Suggested next agent action: confirm undelivered SKUs with the Pax before processing."
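The Search Vectors and Retrieve Chunks steps come down to scoring every stored chunk against the query vector and keeping the best few. The sketch below assumes the chunk vectors were computed ahead of time and hard-codes toy 3-dimensional values. The first three chunk IDs echo the example above, while "IRT Safety Severity §2", the vectors, and the `top_k` helper are hypothetical. The printed scores will not match the illustrative scores in the table. Production systems typically use an approximate nearest-neighbour index inside a vector database rather than this brute-force loop.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: list[float], chunk_store: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force search: score every chunk, return the k best as (chunk_id, score)."""
    scored = [(chunk_id, cosine(query_vec, vec)) for chunk_id, vec in chunk_store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical pre-computed chunk embeddings (toy values, not real model output).
chunk_store = {
    "SOP-MIWI-04.2 §3.1": [0.60, 0.70, 0.10],
    "Partial-Delivery Refund Decision Tree §7.1": [0.50, 0.60, 0.40],
    "MIWI Refund Threshold Table §3": [0.30, 0.80, 0.50],
    "IRT Safety Severity §2": [0.90, -0.20, 0.10],  # off-topic chunk, should rank last
}
query_vec = [0.55, 0.72, 0.13]  # first three values of the example query vector above

for chunk_id, score in top_k(query_vec, chunk_store, k=3):
    print(f"{score:.2f}  {chunk_id}")
```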

🔗

The "Build Prompt" step is what you learned today. The grounding rules you practiced — "ONLY from provided documents, cite sections, admit gaps" — are exactly what goes into the assembled prompt. RAG automates the document retrieval; your prompt skills control the generation quality.
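The assembly in the Build Prompt step is plain string construction. Below is a minimal sketch, assuming retrieved chunks arrive as (article ID, text) pairs. The rule wording is taken from the example row in the pipeline table. The function name, the numbered-bracket layout, and the placeholder chunk texts are illustrative choices, not how any particular SOP Lookup build formats its prompt.

```python
# Rule text copied from the Build Prompt example in the pipeline table.
GROUNDING_RULES = (
    "Answer ONLY from the SOPs below. Cite article ID. "
    "If not covered, say 'Not in SOPs — escalate to TL.'"
)


def build_prompt(question: str, chunks: list[tuple[str, str]]) -> str:
    """Assemble steering rules + retrieved chunks + question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] ({article_id})\n{text}" for i, (article_id, text) in enumerate(chunks, start=1)
    )
    return f"{GROUNDING_RULES}\n\nContext:\n{context}\n\nQuestion: {question}"


# Placeholder chunk texts; in a real pipeline these come from the retrieval step.
chunks = [
    ("SOP-MIWI-04.2 §3.1", "<retrieved text of SOP-MIWI-04.2 §3.1>"),
    ("MIWI Refund Threshold Table §3", "<retrieved text of the threshold table>"),
]
print(build_prompt("How do I handle a partial-delivery refund on a cancelled GrabFood order?", chunks))
```

Keeping the article ID next to each chunk gives the model something concrete to cite.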

⚖️ RAG vs. The Alternatives

Approach	How It Works	Pros	Cons	Best For
Paste Full Doc	Copy entire document into prompt	Simple, no setup	Context window limits, expensive (200 pages = ~50K tokens/call), slow	Quick one-off questions, small docs
RAG ✓	Auto-retrieve relevant chunks, inject into prompt	Fresh data, cites sources, cost-efficient, no retraining	Needs infrastructure (vector DB, embeddings), chunking quality matters	Production Q&A over large/changing doc sets
Fine-Tuning	Retrain model on your data	Model "learns" your domain style	Expensive, slow to update, can't cite sources, needs ML team	Specialized language/style (not facts)
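The cost gap between pasting the full document and using RAG is mostly arithmetic on input tokens per call. The 50K-token figure for a 200-page document comes from the table. The other three numbers (top-5 retrieval, ~500 tokens per chunk, ~200 tokens of rules plus question) are assumed, illustrative values. Treat the result as an order-of-magnitude estimate, not a benchmark.

```python
FULL_DOC_TOKENS = 50_000      # ~200-page document, per the table above
CHUNKS_RETRIEVED = 5          # assumption: top-5 retrieval
TOKENS_PER_CHUNK = 500        # assumption: a typical section-sized chunk
PROMPT_OVERHEAD_TOKENS = 200  # assumption: grounding rules + question


def input_tokens_per_call(use_rag: bool) -> int:
    """Context tokens sent to the model on one call, plus fixed prompt overhead."""
    context = CHUNKS_RETRIEVED * TOKENS_PER_CHUNK if use_rag else FULL_DOC_TOKENS
    return context + PROMPT_OVERHEAD_TOKENS


full = input_tokens_per_call(use_rag=False)  # 50,200
rag = input_tokens_per_call(use_rag=True)    # 2,700
print(f"Paste full doc: {full:,} tokens/call")
print(f"RAG:            {rag:,} tokens/call (~{full / rag:.0f}x fewer)")
```

On most hosted models, per-call cost scales with input tokens, so the ratio roughly carries over to cost. No per-token price is assumed here.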

💬

When your tech team asks "should we fine-tune or use RAG?" — for document Q&A (policies, compliance, procedures), RAG wins almost every time. Fine-tuning is for changing how the model writes, not what it knows.

🎮 Interactive RAG Demo

Watch the RAG pipeline process a query step-by-step. Select a scenario and press play. Drag the 3D vector space to rotate.


✂️ Chunking — The Hidden Quality Driver

Before documents enter the vector store, they're split into chunks — smaller pieces the system can search and retrieve. How you chunk determines whether the AI gets complete answers or broken fragments.

❌ Bad Chunking — Split at Fixed Length

Chunk 1:
...merchant risk rating.

4.2 Chargeback Thresholds

Chunk 2:
Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring...

⚠️ The section header got split from its content. Chunk 1 matches "chargeback thresholds" but has no useful answer. Chunk 2 has the answer but may score lower because it lacks the header.
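The failure above is easy to reproduce. This sketch implements the fixed-size strategy: it cuts text every `size` characters regardless of headings or sentences, with an optional `overlap`. Character counts stand in for tokens to keep it dependency-free. The sample is the SOP excerpt from the example, and `size=51` is picked so the first cut lands right after the heading line, mirroring the bad chunking shown.

```python
def fixed_size_chunks(text: str, size: int, overlap: int = 0) -> list[str]:
    """Cut text into windows of `size` characters, ignoring headings and sentences.

    Consecutive windows share `overlap` characters (0 = no overlap).
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + size])
        if start + size >= len(text):  # this window reached the end of the text
            break
        start += step
    return chunks


sop = (
    "...merchant risk rating.\n"
    "4.2 Chargeback Thresholds\n"
    "Merchants exceeding 3.0% chargeback rate shall be classified as RED and subject to "
    "immediate review by the Risk Team within 48 hours."
)

# The heading ends up at the tail of chunk 1; its content starts chunk 2.
for n, chunk in enumerate(fixed_size_chunks(sop, size=51), start=1):
    print(f"Chunk {n}: {chunk!r}")
```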

✅ Good Chunking — Respect Section Boundaries

Chunk:
4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly 
review cycle.

✓ Header + content together. The retrieval system finds the right chunk and the LLM gets the complete answer with all three thresholds.
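Section-based chunking can be sketched with a regular expression that treats numbered headings such as `4.2 Chargeback Thresholds` as boundaries. The heading pattern is an assumption about how these SOPs are laid out, so real documents need a pattern tuned to their own numbering. The "4.1 Preceding Section" text is a placeholder standing in for whatever comes before 4.2.

```python
import re

# Assumption: sub-section headings look like "4.2 Chargeback Thresholds"
# (dotted number + capitalised title). Requiring a dot keeps body lines such as
# "48 hours. Merchants ..." from being mistaken for headings.
HEADING = re.compile(r"^\d+(?:\.\d+)+\s+[A-Z][^\n]*$", re.MULTILINE)


def section_chunks(text: str) -> list[str]:
    """Split at numbered headings so every chunk keeps its heading together with its body."""
    starts = [m.start() for m in HEADING.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first heading as its own chunk
    bounds = starts + [len(text)]
    chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
    return [c for c in chunks if c]


sop = """4.1 Preceding Section
Placeholder body text ending with ...merchant risk rating.

4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly
review cycle.
"""

for chunk in section_chunks(sop):
    print(chunk, end="\n---\n")
```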

⚠️

Your input matters here. Your SOP / KB articles have tables, numbered sections, and cross-references. If those get split across chunks, the AI gets fragments instead of complete answers. You know which sections belong together — your tech team needs that knowledge to configure chunking correctly.

📏 Chunking Strategies

Strategy	How It Works	Best For	Watch Out
Fixed-size	Split every N characters/tokens	Quick prototypes (simple, fast)	Breaks mid-sentence, splits tables
Sentence-based	Split at sentence boundaries	General text	May split related sentences apart
Section-based ✓	Split at document headings/sections	Structured docs (policies, manuals)	Sections may be too large or too small
Semantic	Use embeddings to find natural topic breaks	Unstructured text	More complex, slower
Overlap	Chunks share N tokens at boundaries	Reducing context loss at edges	Increases storage, may retrieve duplicates

🏦

For AnyCompany Support SOPs: Section-based chunking with overlap is the sweet spot. Your SOPs and KB articles have clear section numbers — chunk at those boundaries, with 2-3 sentence overlap to preserve cross-references.
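One way to express "section-based chunking with 2-3 sentence overlap" in code is to split by section first, then prepend the last few sentences of the previous section to each chunk. Here the sections are passed in as a list, so any section splitter works. The naive sentence splitter (break after `.`, `?`, or `!` followed by whitespace), the `overlap_sentences` parameter, and the placeholder section texts are simplifying assumptions. Decimals like `3.0%` survive because a period must be followed by whitespace to end a sentence, but abbreviations such as "e.g." would fool it.

```python
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def add_sentence_overlap(sections: list[str], overlap_sentences: int = 2) -> list[str]:
    """Prefix each section with the last `overlap_sentences` sentences of the section before it."""
    result = []
    for i, section in enumerate(sections):
        if i == 0 or overlap_sentences == 0:
            result.append(section)
            continue
        prev_sentences = SENTENCE_END.split(sections[i - 1].strip())
        carry = " ".join(prev_sentences[-overlap_sentences:])
        result.append(f"{carry}\n\n{section}")
    return result


# Placeholder sections (not real SOP text).
sections = [
    "3.1 Section A\nFirst sentence. Second sentence. Third sentence.",
    "3.2 Section B\nSee 3.1 for context. Next step.",
]
for chunk in add_sentence_overlap(sections, overlap_sentences=2):
    print(chunk, end="\n---\n")
```

The carried-over sentences let a cross-reference like "See 3.1" arrive with some of the context it points to.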

🛠️ RAG for Business Users — No Infrastructure Needed

The pipeline demo shows how enterprise RAG systems work at scale. But you don't need a vector database to do RAG today. When you use Kiro or any AI assistant, the "RAG" you'll actually do is simpler — and just as powerful for your daily work.

🚫

What You DON'T Need

No vector database. No embeddings pipeline. No chunking configuration. No infrastructure. No tech team involvement.

✅

What You DO

Convert your documents to clean Markdown. Drop them in your workspace. Write a grounding prompt. The AI reads the full file.

📋 The 3-Step Workflow

1. Convert PDF → Markdown

PDFs are terrible for AI. When text is extracted from a PDF, headers become random text, tables lose their structure, columns merge, and page numbers get injected mid-sentence. Markdown preserves the hierarchy so the AI can navigate your document.

💡

In Kiro: Drag a PDF into chat and ask: "Convert this PDF to clean Markdown. Preserve all headings, tables, and section numbers." Kiro will produce a structured .md file you can save to your workspace.

2. Add Files to Your Workspace

Place the .md files in your project folder. In Kiro, you can reference them with #File in chat, or the AI can read them directly from your workspace when you ask questions.

        your-cowork-project/
        ├── sops/
        │   ├── miwi-refund-sop.md
        │   ├── irt-safety-severity.md
        │   ├── kh-mm-multilang-handling.md
        │   └── pac-tagging-decision-tree.md
        ├── .kiro/steering/
        │   └── grounding-rules.md
        └── ...

3. Ask with Grounding Rules

Reference the file and add grounding constraints. The AI reads the entire document into its context window, with no chunking and no retrieval step. The whole file is in context at once.

        PROMPT:
        Read #miwi-refund-sop.md
        Answer ONLY from this SOP.
        Cite [SOP-ID §X.X] after each claim.
        If not in SOP, say "Not in SOPs — escalate to TL."
        Question: How do I handle a partial-delivery refund on a cancelled GrabFood order?
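For readers who want to see what "the AI reads the entire document" amounts to, here is a short Python sketch of the same workflow. It reads a Markdown SOP in full, wraps it in the grounding rules above, and refuses if the result looks too large for one prompt. This is not how Kiro is implemented internally. The 4-characters-per-token estimate is a common rough heuristic for English text, the 100,000-token budget is a hypothetical placeholder, and the file path comes from the workspace example.

```python
import math
from pathlib import Path

CHARS_PER_TOKEN = 4              # rough heuristic for English text; real tokenizers vary
CONTEXT_BUDGET_TOKENS = 100_000  # hypothetical budget; substitute your model's real limit

# Rules copied from the prompt above.
RULES = (
    "Answer ONLY from this SOP.\n"
    "Cite [SOP-ID §X.X] after each claim.\n"
    'If not in SOP, say "Not in SOPs — escalate to TL."'
)


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def whole_file_prompt(doc_name: str, doc_text: str, question: str) -> str:
    """Put the entire document in the prompt (no chunking, no retrieval), then rules and question."""
    prompt = f"Document: {doc_name}\n\n{doc_text}\n\n{RULES}\n\nQuestion: {question}"
    if estimate_tokens(prompt) > CONTEXT_BUDGET_TOKENS:
        raise ValueError(f"{doc_name} is too large for one prompt; split it or use system-level RAG")
    return prompt


# Usage, assuming the workspace layout shown above.
sop_path = Path("sops/miwi-refund-sop.md")
if sop_path.exists():
    print(whole_file_prompt(
        sop_path.name,
        sop_path.read_text(encoding="utf-8"),
        "How do I handle a partial-delivery refund on a cancelled GrabFood order?",
    ))
```

The size check is why this approach tops out at a handful of documents. Past that point, the retrieval step of a full pipeline takes over.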

⚡ Why PDF → Markdown Makes a Huge Difference

❌ Raw PDF (what the AI sees)

4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback
rate shall be classified as RED and
subject to immediate review by the
Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring.
Table 4.1: Threshold Summary
GREEN AMBER RED
≤1.0% 1.0-3.0% >3.0%
Quarterly Enhanced Immediate

⚠️ Page number injected mid-paragraph. Table structure lost. AI may misread thresholds.
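Some of this cleanup can be scripted before or after the PDF → Markdown conversion. Below is a minimal sketch that drops standalone footer lines matching the `Page N of M` pattern seen above. The pattern is an assumption, and real PDFs use many footer styles, so review the output rather than trusting it blindly. It does not rebuild tables, which still needs the conversion step.

```python
import re

# Assumed footer style: a line containing only "Page N of M".
PAGE_FOOTER = re.compile(r"^\s*Page \d+ of \d+\s*$", re.IGNORECASE)


def strip_page_footers(text: str) -> str:
    """Remove lines that consist only of a 'Page N of M' footer; leave everything else intact."""
    kept = [line for line in text.splitlines() if not PAGE_FOOTER.match(line)]
    return "\n".join(kept)


raw = """Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring."""

print(strip_page_footers(raw))
```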

✅ Clean Markdown (what the AI sees)

## 4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.

| Rating | Threshold | Review Cycle |
|--------|-----------|-------------|
| GREEN  | ≤1.0%     | Quarterly   |
| AMBER  | 1.0-3.0%  | Enhanced    |
| RED    | >3.0%     | Immediate   |

✓ Clean heading. Table preserved. No page artifacts. The AI can read the structure as intended.

⚠️

The #1 optimization you can do today: Convert your most-used policy documents from PDF to Markdown. An hour of conversion pays off in better AI answers every time the document is queried afterward. Focus on documents with tables, numbered sections, and cross-references, because those break the worst in PDF.

📝 Why Markdown? The AI's Preferred Language

You may have noticed that steering files, SKILL.md, prompt templates, and RAG documents in this workshop are all .md. That's not a coincidence: Markdown is the format AI tools work with most naturally.

🏗️

Structure Without Overhead

## headings, | tables, and - lists give the AI a document hierarchy to navigate, with far less markup overhead than HTML, XML, or JSON.

🪙

Token-Efficient

## 4.2 Chargeback Thresholds = ~8 tokens. The HTML equivalent = ~20 tokens. When your context window is limited, every token counts.

👥

Three Audiences, One Format

Your compliance officer can read it. The AI can parse it. Your tech team can version it in git. No other format serves all three.

🧠

LLMs Were Trained On It

GitHub, Stack Overflow, documentation sites — the training data is saturated with Markdown. Models understand its conventions natively.

Format	Human Readable	AI Parseable	Token Cost	Versionable	Verdict
PDF	✅ Great	❌ Terrible	N/A (binary)	❌ No	Convert away from
Word (.docx)	✅ Good	⚠️ Needs extraction	N/A (binary)	❌ No	OK for drafting
HTML	⚠️ With browser	✅ Good	🔴 High (tags)	✅ Yes	Too verbose
JSON	❌ Hard	✅ Great	🟡 Medium	✅ Yes	For data, not docs
Markdown ✓	✅ Great	✅ Great	🟢 Low	✅ Yes	Best for AI docs

🔗

The pattern across both days:
Day 1: You learn that tokens cost money → Markdown is token-efficient. You learn grounding and RAG → Markdown preserves document structure for accurate answers.
Day 2: You create steering files, skills, and agent configs → all Markdown because it's the format AI tools read natively

Markdown isn't just a file format — it's the interface layer between you and AI.

🔄 How This Connects to the Full Pipeline

Your 3-step workflow and the enterprise RAG pipeline solve the same problem — they just operate at different scales:

Step	Your Workflow (Kiro)	Enterprise Pipeline (Bedrock KB)
Prepare docs	You convert PDF → Markdown manually	Automated ingestion + chunking
Store docs	Files in your workspace folder	Vector database (embeddings)
Find relevant info	You reference the right file with `#File`	Similarity search retrieves top chunks
Ground the answer	Grounding rules in your prompt	Same grounding rules, automated
Scale	1-5 documents at a time (context window limit)	Thousands of documents, auto-retrieved

🎯

The key insight: Your prompt skills (grounding rules, citation requirements, gap admission) are the same whether you're doing manual RAG in Kiro or your tech team builds a full pipeline. Answer quality depends heavily on the quality of your prompt, not just on the infrastructure.

🎯 Three Levels of RAG

RAG isn't all-or-nothing. You're already at Level 1. Today you learn Level 2. Your tech team builds Level 3.

1. Manual RAG

Copy-paste a policy section into your AI assistant, ask a question. You select the document, you paste the context.

👤 You do this today

2. Prompt-Level RAG

Write grounding rules: "ONLY from provided documents, cite sections, admit gaps." The prompt controls quality.

📝 Today's skill (Day 1)

3. System-Level RAG

Bedrock Knowledge Base auto-retrieves from your document library. Chunking, embedding, and search happen automatically.

🔧 Tech team builds this

🔗

Day 3 connection: On Day 3, when we cover MCP (Model Context Protocol), you'll see how Kiro connects to databases and document stores. MCP is the plumbing that connects your AI assistant to Level 3 RAG: the AI queries your systems directly instead of you pasting documents.

🏦 RAG Use Cases at AnyCompany

Use Case	Documents	Who Benefits	RAG Level
SOP Lookup ⭐	SOPs, KB articles, decision trees, market-specific exemptions	All agents (every market)	Level 2-3 — your Day 2 anchor
Case Context Summarizer	D365 case data, booking details, Pax / Dax history	IRT TLs, MIWI agents	Level 2 — your Day 1 exercise
KB Retrieval & Spiel Generator	Internal KB, ARC handover transcripts, response templates	Live-chat agents (Benedictus #32)	Level 2-3
Helpcenter Content Auditor	Help-Centre articles, KB content, past update logs	KB / Content team (Mikko #4)	Level 3
Macro Review & QC	Macro templates, 4H brand voice, regional QA guidelines	TQA / TQM (Mikko #5 + Project AIONIC)	Level 2-3
New Agent Onboarding (LiveChat Sim)	SOPs, scenario playbooks, C5–C8 audit criteria	Training (Sanidwong #45 — already shipped)	Level 2 — your Day 2 opening proof

❓ Common Questions

Can we use RAG with our actual SOP corpus?

Yes. Bedrock Knowledge Bases supports PDF, Word, HTML, and Markdown, among other formats. Your tech team uploads the documents, configures chunking, and exposes the knowledge base as an API. You define which documents to include and review the output quality.

How is this different from just searching SharePoint?

Keyword search (as in SharePoint) matches the words you type; RAG matches meaning. A query for "drunk driving handling" would find SOPs about "DUI" and "impairment reports" even if the agent didn't use those exact terms. Plus, RAG doesn't just find the document: it reads it and generates a cited answer with the article ID.

What about data security?

With Bedrock Knowledge Bases, your documents stay in your AWS account, and embeddings are stored in a vector database your team configures. If that vector store is AWS-hosted, the data does not leave your AWS environment. This is why AWS-hosted RAG is preferred over public tools for regulated industries.

How accurate is it? Can we trust it for compliance?

RAG dramatically reduces hallucination but doesn't eliminate it. That's why the grounding prompt rules are critical: cite sources, admit gaps, no outside knowledge. For compliance, always use RAG + human review (Level 2 autonomy from Day 3). The AI drafts, the human verifies.
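A mechanical check can support part of the human-review step by flagging any sentence in the draft that carries neither a citation nor the escalation phrase. The sketch below assumes citations follow the `[SOP-ID §X.X]` format from the grounding prompt, placed before the sentence's final period. The regex, the naive sentence splitter, and the sample draft are illustrative assumptions. A clean result only means every sentence cites something. A human still has to verify that the cited section actually says it.

```python
import re

# Assumed citation shape, e.g. "[SOP-MIWI-04.2 §3.1]".
CITATION = re.compile(r"\[[A-Z][A-Za-z0-9.\-]*\s§\d+(?:\.\d+)*\]")
ESCALATION = "Not in SOPs"                    # start of the gap-admission phrase
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")   # naive sentence boundary


def uncited_sentences(draft: str) -> list[str]:
    """Return sentences that have neither a [SOP-ID §X.X] citation nor the escalation phrase."""
    sentences = [s for s in SENTENCE_END.split(draft.strip()) if s]
    return [s for s in sentences if not CITATION.search(s) and ESCALATION not in s]


draft = (
    "Process a partial refund for the undelivered items [SOP-MIWI-04.2 §3.1]. "
    "Confirm the undelivered SKUs with the Pax first."
)
for sentence in uncited_sentences(draft):
    print("NEEDS REVIEW:", sentence)
```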