
How AI Answers from Your Documents

See how Retrieval-Augmented Generation (RAG) lets LLMs answer questions using AnyCompany Support's SOPs and KB articles — anchored on your cohort's SOP Lookup Top Idea, the Day 2 build target.

📄 RAG Pipeline 🔍 Interactive 🎧 AnyCompany Support 📐 Day 1

🧠 The Problem: LLMs Only Know Their Training Data

Large language models are trained on data, largely public internet text, up to a cutoff date. They don't know about AnyCompany Support's internal SOPs, the latest market-specific exemption policy, or yesterday's KB update. Ask Claude about your refund policy for a cancelled GrabFood order with partial delivery and it will guess: confidently, but quite possibly incorrectly. An agent who trusts that guess can ship a wrong policy answer to a Pax in seconds.

🧠

The Knowledge Gap

LLMs can't access your SOP corpus, recent KB updates, or internal escalation procedures. They'll fill gaps with plausible-sounding but potentially wrong policy answers — exactly the failure mode that triggers DSAT and re-contacts.

📎

The Solution: Retrieve, Then Generate

Instead of retraining the model, RAG retrieves relevant SOP chunks and inserts them into the prompt. The LLM reads that context and generates a grounded answer with a citation. This is exactly what your SOP Lookup agent does.

🔢

Powered by Embeddings

Embeddings turn text into numbers (vectors) so we can measure similarity. "drunk driving" and "DUI" have similar vectors — even with different words. That's how a Pax who said "the driver smelled like beer" gets matched to the Safety SOP.

🎧

Why This Matters for GS Support

RAG is the foundation of every flagship Cyborg use case in your cohort: SOP Lookup (cited answers), KB Retrieval & Spiel Generator (Benedictus #32), SmartGuide (Preenanya #29), LiveAssist (Pornpailin #21). Same pattern, different markets.

💡
You're already doing manual RAG every shift. When an agent searches Glean, finds the relevant SOP, and pastes it into GrabGPT before asking the question — that's the RAG pattern by hand. The pipeline just automates it across the entire SOP corpus, with citation, in <2 seconds.

🔗 Connection to Day 1

Remember the Embeddings Explainer where you clicked a word and found its nearest neighbors in 3D space? That's exactly what RAG's retrieval step does — but with document chunks instead of single words.

✂️ Tokenizer → 🔢 Embeddings → 🔍 RAG Retrieval → 🧠 Transformer → Output

RAG sits between embeddings and the transformer — it uses embedding similarity to find relevant documents before the LLM generates.
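A minimal sketch of how "embedding similarity" can be measured, using cosine similarity. The three-number vectors are hand-made assumptions for illustration; a real embedding model produces vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values, not output of a real embedding model).
toy_vectors = {
    "drunk driving": [0.90, 0.80, 0.10],
    "DUI": [0.85, 0.75, 0.20],
    "refund request": [0.10, 0.20, 0.95],
}

query = toy_vectors["drunk driving"]
for phrase, vector in toy_vectors.items():
    # Phrases with similar meaning should score closer to 1.0.
    print(f"{phrase!r}: {cosine_similarity(query, vector):.2f}")
```

With these toy values, "DUI" scores far closer to "drunk driving" than "refund request" does, which is the behavior the retrieval step relies on.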

⚙️ The RAG Pipeline — 6 Steps

Every RAG system follows the same pattern. Here's what happens when an agent in SG/MY asks the SOP Lookup: "How do I handle a refund for a cancelled GrabFood order where the food was already delivered?"

❓ User Query → 🔢 Embed Query → 🔍 Search Vectors → 📄 Retrieve Chunks → 🧩 Build Prompt → ✨ Generate Answer
| Step | What Happens | SOP Lookup Example |
|------|--------------|--------------------|
| ❓ User Query | Agent asks a question in natural language | "How do I handle a refund for a cancelled GrabFood order where the food was already delivered?" |
| 🔢 Embed Query | Question is converted to a vector (numbers) | Query becomes [0.55, 0.72, 0.13, -0.28, 0.41, ...] |
| 🔍 Search Vectors | Find SOP chunks with similar vectors | Searches across all chunked SOPs and KB articles for vectors close to the query |
| 📄 Retrieve Chunks | Top 3-5 most relevant chunks returned with scores | SOP-MIWI-04.2 §3.1 (0.94), Partial-Delivery Refund Decision Tree §7.1 (0.87), MIWI Refund Threshold Table §3 (0.82) |
| 🧩 Build Prompt | Assemble: steering rules + retrieved chunks + question | "Answer ONLY from the SOPs below. Cite article ID. If not covered, say 'Not in SOPs — escalate to TL.' Context: [chunk 1] [chunk 2] [chunk 3]. Question: ..." |
| ✨ Generate Answer | LLM generates a grounded answer with citation | "Process partial refund per SOP-MIWI-04.2 §3.1. Refund only undelivered items. Suggested next agent action: confirm undelivered SKUs with the Pax before processing." |
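The Search Vectors and Retrieve Chunks steps can be sketched as a scored lookup over stored vectors. This is a toy illustration, not how a production vector database is implemented: the chunk IDs come from the example, while the three-number vectors and the `retrieve_top_k` helper are assumptions, so the scores will not match the ones shown in the example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def retrieve_top_k(query_vector, store, k=3):
    """Score every stored chunk against the query and return the k best (id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vector, vector))
              for chunk_id, vector in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical vector store: real chunk vectors would come from an embedding model.
vector_store = {
    "SOP-MIWI-04.2 §3.1": [0.60, 0.70, 0.10],
    "Partial-Delivery Refund Decision Tree §7.1": [0.50, 0.60, 0.30],
    "MIWI Refund Threshold Table §3": [0.40, 0.70, 0.40],
    "Safety SOP": [-0.30, 0.10, 0.90],
}
query_vector = [0.55, 0.72, 0.13]  # first three numbers of the example query vector

for chunk_id, score in retrieve_top_k(query_vector, vector_store):
    print(f"{chunk_id}: {score:.2f}")
```

A real system replaces the linear scan with an approximate nearest-neighbor index so the search stays fast across thousands of chunks.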
🔗
The "Build Prompt" step is what you learned today. The grounding rules you practiced — "ONLY from provided documents, cite sections, admit gaps" — are exactly what goes into the assembled prompt. RAG automates the document retrieval; your prompt skills control the generation quality.
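A small sketch of what the Build Prompt step might look like in code. The `build_prompt` function name and the placeholder chunk text are assumptions for illustration; the rule wording is taken from the example prompt in this section.

```python
def build_prompt(question, chunks):
    """Assemble steering rules, retrieved chunks, and the question into one prompt.

    chunks: list of (chunk_id, chunk_text) pairs returned by the retrieval step.
    """
    # \u2014 is the dash used in the workshop's fallback phrase.
    rules = (
        "Answer ONLY from the SOPs below. Cite article ID. "
        "If not covered, say 'Not in SOPs \u2014 escalate to TL.'"
    )
    # Label each chunk with its ID so the model has something to cite.
    context = "\n\n".join(f"[{chunk_id}]\n{text}" for chunk_id, text in chunks)
    return f"{rules}\n\nContext:\n{context}\n\nQuestion: {question}"

# Placeholder chunk text: the real SOP wording is not reproduced here.
retrieved = [
    ("SOP-MIWI-04.2 §3.1", "(text of the retrieved SOP section)"),
    ("Partial-Delivery Refund Decision Tree §7.1", "(text of the decision tree section)"),
]
question = "How do I handle a refund for a cancelled GrabFood order where the food was already delivered?"
prompt = build_prompt(question, retrieved)
print(prompt)
```

Keeping the rules first and the question last is one common layout; the important part is that every chunk carries its ID so the answer can cite it.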

⚖️ RAG vs. The Alternatives

| Approach | How It Works | Pros | Cons | Best For |
|----------|--------------|------|------|----------|
| Paste Full Doc | Copy entire document into prompt | Simple, no setup | Context window limits, expensive (200 pages = ~50K tokens/call), slow | Quick one-off questions, small docs |
| RAG ✓ | Auto-retrieve relevant chunks, inject into prompt | Fresh data, cites sources, cost-efficient, no retraining | Needs infrastructure (vector DB, embeddings), chunking quality matters | Production Q&A over large/changing doc sets |
| Fine-Tuning | Retrain model on your data | Model "learns" your domain style | Expensive, slow to update, can't cite sources, needs ML team | Specialized language/style (not facts) |
💬
When your tech team asks "should we fine-tune or use RAG?" — for document Q&A (policies, compliance, procedures), RAG wins almost every time. Fine-tuning is for changing how the model writes, not what it knows.
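To make the cost difference concrete, here is a back-of-the-envelope comparison of input tokens per call. Only the ~50K figure and the "top 3-5 chunks" come from this page; the chunk size and overhead are assumed values, so treat the result as an order-of-magnitude estimate.

```python
FULL_DOC_TOKENS = 50_000   # the ~50K tokens/call figure for a 200-page document
CHUNK_TOKENS = 500         # assumed average chunk size
CHUNKS_PER_CALL = 5        # upper end of the "top 3-5" chunks retrieved
OVERHEAD_TOKENS = 200      # assumed size of steering rules plus the question

full_doc_call = FULL_DOC_TOKENS + OVERHEAD_TOKENS
rag_call = CHUNKS_PER_CALL * CHUNK_TOKENS + OVERHEAD_TOKENS
ratio = full_doc_call / rag_call

print(f"Paste full doc: {full_doc_call:,} input tokens per call")
print(f"RAG:            {rag_call:,} input tokens per call")
print(f"Full-doc prompt is about {ratio:.0f}x larger")
```

Under these assumptions the full-document prompt is more than an order of magnitude larger on every call, which is where the cost and latency gap comes from.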

🎮 Interactive RAG Demo

Watch the RAG pipeline process a query step-by-step. Select a scenario and press play. Drag the 3D vector space to rotate.


✂️ Chunking — The Hidden Quality Driver

Before documents enter the vector store, they're split into chunks — smaller pieces the system can search and retrieve. How you chunk determines whether the AI gets complete answers or broken fragments.

❌ Bad Chunking — Split at Fixed Length
Chunk 1:
...merchant risk rating.

4.2 Chargeback Thresholds

Chunk 2:
Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring...

⚠️ The section header got split from its content. Chunk 1 matches "chargeback thresholds" but has no useful answer. Chunk 2 has the answer but may score lower because it lacks the header.

✅ Good Chunking — Respect Section Boundaries
Chunk:
4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate 
shall be classified as RED and subject to 
immediate review by the Risk Team within 
48 hours. Merchants between 1.0-3.0% are 
classified AMBER with enhanced monitoring.
Below 1.0% is GREEN with standard quarterly 
review cycle.

✓ Header + content together. The retrieval system finds the right chunk and the LLM gets the complete answer with all three thresholds.
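The bad-chunking failure is easy to reproduce. This sketch splits a shortened version of the example text every 52 characters; the chunk size is an arbitrary assumption chosen to show the problem.

```python
def fixed_size_chunks(text, size):
    """Split text every `size` characters, ignoring any document structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

document = (
    "...merchant risk rating.\n\n"
    "4.2 Chargeback Thresholds\n\n"
    "Merchants exceeding 3.0% chargeback rate shall be classified as RED."
)

chunks = fixed_size_chunks(document, size=52)
for number, chunk in enumerate(chunks, start=1):
    # repr() makes the line breaks inside each chunk visible.
    print(f"Chunk {number}: {chunk!r}")
```

Because the splitter only counts characters, the section header and the rule it introduces cannot land in the same chunk here, so no single chunk answers the question on its own.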

⚠️
Your input matters here. Your SOP / KB articles have tables, numbered sections, and cross-references. If those get split across chunks, the AI gets fragments instead of complete answers. You know which sections belong together — your tech team needs that knowledge to configure chunking correctly.

📏 Chunking Strategies

| Strategy | How It Works | Best For | Watch Out |
|----------|--------------|----------|-----------|
| Fixed-size | Split every N characters/tokens | Simple, fast | Breaks mid-sentence, splits tables |
| Sentence-based | Split at sentence boundaries | General text | May split related sentences apart |
| Section-based ✓ | Split at document headings/sections | Structured docs (policies, manuals) | Sections may be too large or too small |
| Semantic | Use embeddings to find natural topic breaks | Unstructured text | More complex, slower |
| Overlap | Chunks share N tokens at boundaries | Reducing context loss at edges | Increases storage, may retrieve duplicates |
🏦
For AnyCompany Support SOPs: Section-based chunking with overlap is the sweet spot. Your SOPs and KB articles have clear section numbers — chunk at those boundaries, with 2-3 sentence overlap to preserve cross-references.
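A minimal sketch of section-based chunking with sentence overlap. It assumes headings look like "4.2 Title" at the start of a line, which will not hold for every SOP; the "4.1 Risk Rating" section and the `section_chunks` helper are hypothetical, added only to give the overlap something to carry over.

```python
import re

def section_chunks(text, overlap_sentences=2):
    """Split at numbered headings; prefix each chunk with the tail of the previous section."""
    # Assumed heading shape: a line starting with "N.N " followed by a capital letter.
    # Splitting on this zero-width match needs Python 3.7 or later.
    parts = re.split(r"(?m)^(?=\d+\.\d+\s+[A-Z])", text)
    sections = [part.strip() for part in parts if part.strip()]

    chunks = []
    for index, section in enumerate(sections):
        if index > 0 and overlap_sentences > 0:
            # Take the body of the previous section (everything after its heading line).
            previous_body = sections[index - 1].split("\n", 1)[-1]
            sentences = re.split(r"(?<=[.!?])\s+", previous_body.strip())
            tail = " ".join(sentences[-overlap_sentences:])
            chunks.append(f"(overlap) {tail}\n\n{section}")
        else:
            chunks.append(section)
    return chunks

sop_text = """4.1 Risk Rating
Each merchant receives a risk rating. Ratings are reviewed quarterly.

4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback rate shall be classified as RED.
Merchants between 1.0-3.0% are classified AMBER.
"""

for chunk in section_chunks(sop_text, overlap_sentences=1):
    print(chunk)
    print("-----")
```

Each chunk keeps its header and body together, and the overlap line gives the model a little context from the section before it. Very long sections would still need a secondary split, as the "Watch Out" column notes.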

🛠️ RAG for Business Users — No Infrastructure Needed

The pipeline demo shows how enterprise RAG systems work at scale. But you don't need a vector database to do RAG today. When you use Kiro or any AI assistant, the "RAG" you'll actually do is simpler — and just as powerful for your daily work.

🚫

What You DON'T Need

No vector database. No embeddings pipeline. No chunking configuration. No infrastructure. No tech team involvement.

What You DO

Convert your documents to clean Markdown. Drop them in your workspace. Write a grounding prompt. The AI reads the full file.

📋 The 3-Step Workflow

Step 1: Convert PDF → Markdown

PDFs are terrible for AI. Headers become random text, tables lose structure, columns merge, page numbers inject mid-sentence. Markdown preserves the hierarchy so the AI can navigate your document.

💡
In Kiro: Drag a PDF into chat and ask: "Convert this PDF to clean Markdown. Preserve all headings, tables, and section numbers." Kiro will produce a structured .md file you can save to your workspace.

Step 2: Add Files to Your Workspace

Place the .md files in your project folder. In Kiro, you can reference them with #File in chat, or the AI can read them directly from your workspace when you ask questions.

your-cowork-project/
├── sops/
│ ├── miwi-refund-sop.md
│ ├── irt-safety-severity.md
│ ├── kh-mm-multilang-handling.md
│ └── pac-tagging-decision-tree.md
├── .kiro/steering/
│ └── grounding-rules.md
└── ...

Step 3: Ask with Grounding Rules

Reference the file and add grounding constraints. The AI reads the entire document into its context window — no chunking, no retrieval step. It's all in memory.

PROMPT:
Read #miwi-refund-sop.md

Answer ONLY from this SOP.
Cite [SOP-ID §X.X] after each claim.
If not in SOP, say "Not in SOPs — escalate to TL."

Question: How do I handle a partial-delivery
refund on a cancelled GrabFood order?
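Grounding rules are easier to enforce when the output is checked. The sketch below is one possible spot-check, not part of Kiro or any workshop tool: it tests whether an answer contains a citation shaped like [SOP-ID §X.X] or uses the escalation fallback. The function name and the regular expression are assumptions.

```python
import re

# Matches bracketed citations such as [SOP-MIWI-04.2 §3.1].
CITATION = re.compile(r"\[[^\[\]]+§\d+(?:\.\d+)*\]")

def follows_grounding_rules(answer):
    """True if the answer cites at least one SOP section or uses the escalation fallback."""
    return bool(CITATION.search(answer)) or "escalate to TL" in answer

examples = [
    "Refund only the undelivered items [SOP-MIWI-04.2 §3.1].",
    "Not in SOPs, escalate to TL.",
    "Refund the full amount.",
]
for text in examples:
    print(follows_grounding_rules(text), "-", text)
```

A check like this only confirms that a citation is present. A human still needs to confirm that the cited section actually says what the answer claims.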

Why PDF → Markdown Makes a Huge Difference

❌ Raw PDF (what the AI sees)
4.2 Chargeback Thresholds
Merchants exceeding 3.0% chargeback
rate shall be classified as RED and
subject to immediate review by the
Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring.
Table 4.1: Threshold Summary
GREEN AMBER RED
≤1.0% 1.0-3.0% >3.0%
Quarterly Enhanced Immediate

⚠️ Page number injected mid-paragraph. Table structure lost. AI may misread thresholds.

✅ Clean Markdown (what the AI sees)
## 4.2 Chargeback Thresholds

Merchants exceeding 3.0% chargeback rate
shall be classified as RED and subject to
immediate review by the Risk Team within
48 hours. Merchants between 1.0-3.0% are
classified AMBER with enhanced monitoring.

| Rating | Threshold | Review Cycle |
|--------|-----------|-------------|
| GREEN  | ≤1.0%     | Quarterly   |
| AMBER  | 1.0-3.0%  | Enhanced    |
| RED    | >3.0%     | Immediate   |

✓ Clean heading. Table preserved. No page artifacts. The AI can read the thresholds without guessing.

⚠️
The #1 optimization you can do today: Convert your most-used policy documents from PDF to Markdown. One hour of conversion pays off in better AI answers on every question asked afterwards. Focus on documents with tables, numbered sections, and cross-references, because those break the worst in PDF.
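Some PDF artifacts can be cleaned mechanically before or after conversion. As a small illustration, this sketch removes lines that contain only a "Page N of M" marker, like the one in the raw PDF example. The pattern is an assumption about how the footer is formatted, and it does nothing for lost table structure.

```python
import re

# Assumed footer format: a line holding only "Page <number> of <number>".
PAGE_MARKER = re.compile(r"^\s*Page \d+ of \d+\s*$")

def strip_page_markers(raw_text):
    """Drop lines that consist only of a 'Page N of M' marker."""
    kept = [line for line in raw_text.splitlines() if not PAGE_MARKER.match(line)]
    return "\n".join(kept)

raw = """subject to immediate review by the
Risk Team within 48 hours. Merchants
Page 23 of 156
between 1.0-3.0% are classified
AMBER with enhanced monitoring."""

print(strip_page_markers(raw))
```

After cleaning, the sentence about AMBER merchants reads continuously again instead of being interrupted by the page footer.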

📝 Why Markdown? The AI's Preferred Language

You've noticed: steering files, SKILL.md, prompt templates, RAG documents — everything in this workshop is .md. That's not a coincidence. Markdown is the format AI understands best.

🏗️

Structure Without Overhead

## headings, | tables, - lists give the AI a document hierarchy to navigate, with far less markup to wade through than HTML, XML, or JSON.

🪙

Token-Efficient

## 4.2 Chargeback Thresholds = ~8 tokens. The HTML equivalent = ~20 tokens. When your context window is limited, every token counts.

👥

Three Audiences, One Format

Your compliance officer can read it. The AI can parse it. Your tech team can version it in git. Few other formats serve all three this well.

🧠

LLMs Were Trained On It

GitHub, Stack Overflow, documentation sites — the training data is saturated with Markdown. Models understand its conventions natively.

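| Format | Human Readable | AI Parseable | Token Cost | Versionable | Verdict |
|--------|----------------|--------------|------------|-------------|---------|
| PDF | ✅ Great | ❌ Terrible | N/A (binary) | ❌ No | Convert away from |
| Word (.docx) | ✅ Good | ⚠️ Needs extraction | N/A (binary) | ❌ No | OK for drafting |
| HTML | ⚠️ With browser | ✅ Good | 🔴 High (tags) | ✅ Yes | Too verbose |
| JSON | ❌ Hard | ✅ Great | 🟡 Medium | ✅ Yes | For data, not docs |
| Markdown ✓ | ✅ Great | ✅ Great | 🟢 Low | ✅ Yes | Best for AI docs |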
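The token-cost difference between Markdown and HTML can be approximated without a real tokenizer. The counter below is a crude proxy (runs of word characters plus individual punctuation marks), so its numbers will differ from what any actual model tokenizer reports; it only illustrates why tags add overhead.

```python
import re

def rough_token_count(text):
    """Crude proxy for token count: word runs plus individual punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))

markdown_heading = "## 4.2 Chargeback Thresholds"
html_heading = "<h2>4.2 Chargeback Thresholds</h2>"

print("Markdown:", rough_token_count(markdown_heading))
print("HTML:    ", rough_token_count(html_heading))
```

The HTML version pays for an opening and a closing tag on every element, and that overhead repeats for each heading, table row, and list item in a long SOP.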
🔗
The pattern across both days:
Day 1: You learn that tokens cost money → Markdown is token-efficient. You learn grounding and RAG → Markdown preserves document structure for accurate answers.
Day 2: You create steering files, skills, and agent configs → all Markdown because it's the format AI tools read natively

Markdown isn't just a file format — it's the interface layer between you and AI.

🔄 How This Connects to the Full Pipeline

Your 3-step workflow and the enterprise RAG pipeline solve the same problem — they just operate at different scales:

| Step | Your Workflow (Kiro) | Enterprise Pipeline (Bedrock KB) |
|------|----------------------|----------------------------------|
| Prepare docs | You convert PDF → Markdown manually | Automated ingestion + chunking |
| Store docs | Files in your workspace folder | Vector database (embeddings) |
| Find relevant info | You reference the right file with #File | Similarity search retrieves top chunks |
| Ground the answer | Grounding rules in your prompt | Same grounding rules, automated |
| Scale | 1-5 documents at a time (context window limit) | Thousands of documents, auto-retrieved |
🎯
The key insight: Your prompt skills (grounding rules, citation requirements, gap admission) are the same whether you're doing manual RAG in Kiro or your tech team builds a full pipeline. The quality of the answer depends on the quality of your prompt — not the infrastructure.
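The "1-5 documents at a time" limit in the manual workflow comes from the context window. The sketch below estimates whether a set of Markdown files is likely to fit. Every constant is an assumption: roughly 4 characters per token is a common rule of thumb for English, and the window size is hypothetical, so check the figure for the model you actually use.

```python
CHARS_PER_TOKEN = 4          # rough rule of thumb for English text (assumption)
CONTEXT_WINDOW = 200_000     # hypothetical limit in tokens
RESERVED_TOKENS = 4_000      # assumed room for rules, question, and the reply

def estimate_tokens(text):
    """Very rough token estimate based on character count."""
    return len(text) // CHARS_PER_TOKEN + 1

def fits_in_context(documents):
    """documents: mapping of file name -> Markdown text. Returns (total_tokens, fits)."""
    total = sum(estimate_tokens(text) for text in documents.values())
    return total, total <= CONTEXT_WINDOW - RESERVED_TOKENS

# Stand-in contents: strings of the right length instead of real SOP text.
workspace = {
    "miwi-refund-sop.md": "x" * 120_000,
    "irt-safety-severity.md": "x" * 300_000,
}
total, fits = fits_in_context(workspace)
print(f"Estimated tokens: {total:,} | fits: {fits}")
```

When the estimate no longer fits, that is the signal to reference fewer files per question or to move to the retrieval-based pipeline.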

🎯 Three Levels of RAG

RAG isn't all-or-nothing. You're already at Level 1. Today you learn Level 2. Your tech team builds Level 3.

Level 1: Manual RAG
Copy-paste a policy section into your AI assistant, ask a question. You select the document, you paste the context.
👤 You do this today

Level 2: Prompt-Level RAG
Write grounding rules: "ONLY from provided documents, cite sections, admit gaps." The prompt controls quality.
📝 Today's skill (Day 1)

Level 3: System-Level RAG
Bedrock Knowledge Base auto-retrieves from your document library. Chunking, embedding, and search happen automatically.
🔧 Tech team builds this
🔗
Day 3 connection: On Day 3, when we cover MCP (Model Context Protocol), you'll see how Kiro connects to databases and document stores. MCP is the plumbing that makes Level 3 RAG possible — the AI queries your systems directly instead of you pasting documents.

🏦 RAG Use Cases at AnyCompany

| Use Case | Documents | Who Benefits | RAG Level |
|----------|-----------|--------------|-----------|
| SOP Lookup ⭐ | SOPs, KB articles, decision trees, market-specific exemptions | All agents (every market) | Level 2-3 (your Day 2 anchor) |
| Case Context Summarizer | D365 case data, booking details, Pax / Dax history | IRT TLs, MIWI agents | Level 2 (your Day 1 exercise) |
| KB Retrieval & Spiel Generator | Internal KB, ARC handover transcripts, response templates | Live-chat agents (Benedictus #32) | Level 2-3 |
| Helpcenter Content Auditor | Help-Centre articles, KB content, past update logs | KB / Content team (Mikko #4) | Level 3 |
| Macro Review & QC | Macro templates, 4H brand voice, regional QA guidelines | TQA / TQM (Mikko #5 + Project AIONIC) | Level 2-3 |
| New Agent Onboarding (LiveChat Sim) | SOPs, scenario playbooks, C5–C8 audit criteria | Training (Sanidwong #45, already shipped) | Level 2 (your Day 2 opening proof) |

Common Questions

Can we use RAG with our actual SOP corpus?

Yes. Bedrock Knowledge Bases supports PDF, Word, and HTML, along with other formats such as Markdown and plain text. Your tech team uploads the documents, configures chunking, and exposes the knowledge base through an API. You define which documents to include and review the output quality.

How is this different from just searching SharePoint?

Keyword search in SharePoint or Glean matches the exact words you type. RAG matches meaning. A query for "drunk driving handling" would find SOPs about "DUI" and "impairment reports" even if the agent didn't use those exact terms. Plus, RAG doesn't just find the document: it reads the retrieved text and generates a cited answer with the article ID.
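The gap between matching words and matching meaning can be shown in a few lines. The keyword matcher below is deliberately naive, and the vectors are hand-made assumptions standing in for real embeddings.

```python
import math

def keyword_match(query, document):
    """True if any query word appears verbatim in the document (naive keyword search)."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return bool(query_words & document_words)

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = "drunk driving handling"
sop_title = "DUI and impairment reports"

# Toy vectors (assumed): a real embedding model would place these phrases close together.
query_vector = [0.90, 0.80, 0.10]
sop_vector = [0.85, 0.75, 0.20]

print("Keyword hit:   ", keyword_match(query, sop_title))
print("Semantic score:", round(cosine_similarity(query_vector, sop_vector), 2))
```

The keyword matcher finds no shared words between the query and the SOP title, while the vector comparison scores them as near neighbors.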

What about data security?

With Bedrock Knowledge Bases, documents stay in your AWS account. Embeddings are stored in your own vector database. Nothing leaves your environment. This is why AWS-hosted RAG is preferred over public tools for regulated industries.

How accurate is it? Can we trust it for compliance?

RAG dramatically reduces hallucination but doesn't eliminate it. That's why the grounding prompt rules are critical: cite sources, admit gaps, no outside knowledge. For compliance, always use RAG + human review (Level 2 autonomy from Day 3). The AI drafts, the human verifies.