
The 4 Pillars of Prompt Engineering

The techniques that turn vague AI outputs into production-grade, QA-defensible results, with interactive before/after comparisons and a live prompt builder anchored on your Day 1 exercise: Case Context Summarizer.

📖 Day 1 Reference ⚡ Interactive 🎧 GS Support Examples

🎯 The 4 Pillars of Effective Prompts

80% of prompt quality comes from 4 fundamentals. Master these and every prompt you write will be dramatically better.

🎯 1. Clarity

Say exactly what you mean. If a colleague would ask "what do you mean?", your prompt needs work.

📚 2. Context

Give the AI the background it needs: domain, data, situation, constraints. Without context, it guesses.

👤 3. Role Assignment

Tell the AI who to be. A "senior IRT TL briefing leadership" focuses on different signals than a "Live-chat agent drafting a Pax response."

📋 4. Output Framing

Define what "done" looks like: format, length, structure, style. No framing = unpredictable output.

📚 4 Types of Context

Context is the most impactful pillar for GS workflows. Skip any type and the output suffers in a specific way:

| Type | What it tells the AI | If you skip it... | GS example |
| --- | --- | --- | --- |
| Domain | Industry, market, business area | Generic, non-specific answers | "In the context of GS Support for Southeast Asian ride-hailing..." |
| Data | Specific case data, transcripts, history | AI hallucinates plausible booking IDs / SOP steps | "Here is the D365 case + last 5 chat turns: [data]" |
| Situational | Why you need this now (the trigger) | Wrong tone, urgency, severity framing | "True Safety case escalation" vs "Routine MIWI write-up" |
| Constraints | Rules, limits, requirements | Ignores your standards | "Default SGD; redact PAX/DAX names; cite SOP article ID" |
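
As a rough illustration of how the four pillars and the four context types could be assembled into one prompt, here is a minimal Python sketch. The `Context` class, the `build_prompt` function, and the `###` section labels are hypothetical names chosen for this example, not part of any workshop tooling. The sample values are taken from the table above.

```python
from dataclasses import dataclass


@dataclass
class Context:
    """The four context types from the table above."""
    domain: str
    data: str
    situational: str
    constraints: str


def build_prompt(role: str, task: str, context: Context, output_format: str) -> str:
    """Assemble a prompt with one labelled section per pillar / context type."""
    sections = [
        ("ROLE", role),
        ("TASK", task),
        ("DOMAIN", context.domain),
        ("DATA", context.data),
        ("SITUATION", context.situational),
        ("CONSTRAINTS", context.constraints),
        ("OUTPUT FORMAT", output_format),
    ]
    missing = [name for name, text in sections if not text.strip()]
    if missing:
        # Fail loudly instead of letting the model guess a skipped section.
        raise ValueError(f"Empty prompt sections: {', '.join(missing)}")
    return "\n\n".join(f"### {name}\n{text.strip()}" for name, text in sections)


prompt = build_prompt(
    role="Senior IRT TL briefing leadership",
    task="Summarise this case for stakeholder escalation.",
    context=Context(
        domain="GS Support for Southeast Asian ride-hailing",
        data="[paste the D365 case + last 5 chat turns here]",
        situational="True Safety case escalation",
        constraints="Default SGD; redact PAX/DAX names; cite SOP article ID",
    ),
    output_format="Sections: Symptom, Severity, Booking, Action, Next Step",
)
print(prompt)
```

Raising on an empty section is a design choice for this sketch: it turns "skipped a context type" into an error you see before the model ever runs.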

👀 See the Difference

❌ Without 4 Pillars

Prompt: "Summarise this case"

The Pax seems to have an issue with their ride. They mentioned some concerns about the driver. The case looks like it might need follow-up. Overall this appears to be a customer service matter to resolve.

⚠️ Vague, no severity, no booking, no SOP citation, not stakeholder-ready, would fail QA audit

✅ With 4 Pillars

Prompt: Role + Context + Structure + Constraints

RISK RATING: 🟡 AMBER
Transaction Analysis: Volume surged 271% (4,200 → 15,600). Avg txn flat at $10 suggests organic growth.
Chargeback Alert: Rate 0.3% → 4.1%, exceeding 1.0% benchmark by 4x.
Action: 1) Issue warning letter (Ops, 5 days) 2) Reduce PayLater limit (Risk, immediate)

✓ Structured, data-cited, specific actions with owners and timelines

💡 Same data, dramatically different quality. The difference is entirely in the prompt, not the AI model.

🧠 Chain-of-Thought Reasoning

Financial decisions require multi-step logic. CoT makes reasoning visible and auditable: the AI shows its work.

💬 Zero-Shot CoT

Just add "Think step by step." No examples needed. Best for quick calculations and simple logic.

📝 Few-Shot CoT

Provide one example with reasoning. The AI follows the exact same pattern. Best for consistent processes.

🔭 Step-Back

"First identify key factors, then analyze." Forces prioritization before writing. Best for complex analysis.

🗳️ Self-Consistency

"Solve 3 ways, report majority." Multiple approaches catch what a single analysis misses. Best for high-stakes decisions.

👀 Zero-Shot CoT: Before & After

Question: "Is this IRT case true Safety P1 or downgradable?"

❌ Without CoT

Yes, this looks like a true Safety case. The Pax raised a concern that warrants IRT attention.

⚠️ No reasoning. No SOP citation. No severity criteria visible. Not QA-defensible. Could be wrong (= 30-min SLA loss + DSAT).

✅ With "Think step by step"

Pax message: "the driver smelled like beer when I got in"
→ Explicit impairment language? YES [SOP §8.3 trigger]
→ Ride concluded <1 hour ago? YES (22:14 vs 22:38)
→ Two or more P1 indicators? YES → P1 confirmed [SOP §7.1]
→ Action: suspend Dax + notify Country Safety in 30 min [SOP §2.4]
✅ Severity: P1 true Safety

✓ Every step visible. SOP citations. Conclusion backed by criteria. QA-defensible.
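
For anyone wiring these techniques into a template, the zero-shot and few-shot variants can be sketched as two small Python helpers. This is only an illustration: the function names and the exact wording around the example are assumptions, and the only phrase taken from the text above is "Think step by step."

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append the trigger phrase, no examples."""
    return f"{question.strip()}\n\nThink step by step."


def few_shot_cot(question: str, example_question: str, example_reasoning: str) -> str:
    """Few-shot CoT: show one worked example, then pose the new question."""
    return (
        "Follow the same reasoning pattern as the example.\n\n"
        f"Example question: {example_question.strip()}\n"
        f"Example reasoning:\n{example_reasoning.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Reasoning:"
    )


print(zero_shot_cot("Is this IRT case true Safety P1 or downgradable?"))
```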

🗳️ Self-Consistency: 3 Approaches, 1 Answer

For high-stakes Safety / fraud / Dax-suspension decisions, analyse from 3 independent angles and take the majority vote:

| Approach | Analysis | Conclusion |
| --- | --- | --- |
| 1. Pax language | Explicit "smelled like beer": impairment keyword per SOP §8.3 | 🔴 P1 SAFETY |
| 2. Timing | Ride ended 22:14 SGT, case opened 22:38, within 1-hour P1 window | 🔴 P1 SAFETY |
| 3. Pax credibility | Grab VIP, no past complaints, consistent ride history | 🟢 LEGITIMATE REPORT |

Majority: 2/3 P1 SAFETY. Approach 3 alone (looking only at Pax credibility) might have de-prioritised this case. The majority vote catches what a single lens misses, and protects the 30-min SLA.
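
The aggregation step itself is simple enough to sketch in a few lines of Python. The `majority_vote` helper below is a hypothetical name for this example. It assumes each approach has already been reduced to a single conclusion label, and it refuses to pick a winner on a tie so that a human decides instead.

```python
from collections import Counter


def majority_vote(conclusions: list[str]) -> tuple[str, int]:
    """Return the most common conclusion and how many approaches agreed on it."""
    if not conclusions:
        raise ValueError("need at least one conclusion")
    (winner, votes), *rest = Counter(conclusions).most_common()
    if rest and rest[0][1] == votes:
        # No clear majority: escalate to a human instead of guessing.
        raise ValueError("tie between conclusions, manual review needed")
    return winner, votes


# Conclusions from the three approaches in the table above.
conclusions = ["P1 SAFETY", "P1 SAFETY", "LEGITIMATE REPORT"]
winner, votes = majority_vote(conclusions)
print(f"Majority: {votes}/{len(conclusions)} {winner}")  # Majority: 2/3 P1 SAFETY
```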

⚠️ GS rule: Any decision that could be audited should use CoT. The reasoning trail IS your documentation.

👤 Role & Persona Prompting

Same data, dramatically different insights, just by changing who the AI "is." The AI was trained on millions of documents written by different professionals. When you assign a persona, you activate that specific knowledge cluster.

The Persona Formula

You are [TITLE] at [COMPANY TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].

💡 The last two fields, CHARACTERISTIC and BEHAVIOR, matter most. "Cautious" vs "opportunity-focused" produces completely different recommendations from the same data.
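
The formula maps directly onto a small template function. The sketch below is illustrative only: `persona` is a hypothetical helper, and the sample values (title, years, behaviour) are invented for the example rather than taken from a real template.

```python
def persona(title: str, company_type: str, years: int, specialty: str,
            characteristic: str, situation: str, behavior: str) -> str:
    """Fill the persona formula; characteristic and behavior carry the most weight."""
    return (
        f"You are {title} at {company_type}\n"
        f"with {years} years of experience in {specialty}.\n"
        f"You are known for {characteristic}.\n"
        f"When {situation}, you always {behavior}."
    )


# Same role, two different CHARACTERISTIC values (illustrative wording).
base = dict(
    title="a Senior IRT Team Lead",
    company_type="a ride-hailing support organisation",
    years=8,
    specialty="Safety cases",
    situation="a Pax reports possible driver impairment",
    behavior="cite the SOP section before recommending an action",
)
cautious = persona(characteristic="cautious, SOP-first judgement", **base)
opportunity = persona(characteristic="opportunity-focused Pax recovery", **base)
print(cautious)
```

Keeping every other field fixed and swapping only the characteristic is a quick way to see how much that one field changes the output.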

👀 Same Case, Different Eyes

Data: Pax (Grab VIP) reported "the driver smelled like beer", ride ended 22:14, Dax has a 4.2 rating with one prior unresolved Pax safety report

🛡️ IRT Safety Analyst (default lens)

SEVERITY: P1, IMMEDIATE ACTION

Explicit impairment language is an automatic P1 trigger per SOP §8.3. Dax has a prior unresolved Pax safety flag, which is a pattern risk.

Action: Suspend Dax pending review. Notify Country Safety in <30 min. First-response email to Pax within SLA.

📞 Pax Care Lead (different lens)

P1 SAFETY + PAX RECOVERY OPPORTUNITY

Pax is a Grab VIP: DSAT and brand risk if not handled with care. Beyond the SOP-required actions, the relationship matters.

Action: Standard P1 escalation + VIP-tier care script. Personalised first response within 15 min, offer goodwill credit, dedicated TL follow-up.

Both are valid. The Safety analyst sees the regulatory + Dax-action picture. The Pax Care lead sees the Pax-relationship picture. Neither is wrong; they serve different audiences (Country Safety lead vs DSAT trend report).

🤝 Multi-Agent Framing

Get 3 perspectives in one prompt, with no need to schedule 3 meetings:

| Perspective | Focus | Key finding |
| --- | --- | --- |
| 🛡️ Risk Manager | Default rate, exposure, regulation | "Doubling limits increases exposure by $12M" |
| 📊 Product Manager | Adoption, competition, revenue | "Current $500 limit is #1 reason for churn" |
| ⚖️ Compliance | Responsible lending, MAS guidelines | "MAS requires affordability assessment above $500" |

Synthesis: Proceed with phased rollout ($750 first) with income verification. Monitor default rate weekly. Full $1,000 after 90-day review.

🔍 The synthesis is where the real insight lives. No single perspective dominates: the balanced recommendation is stronger than any individual view.
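
One way to phrase such a prompt programmatically is sketched below. The `multi_perspective_prompt` function and its instruction wording are assumptions made for this example. The question is a paraphrase inferred from the table (doubling a $500 limit to $1,000), since the original prompt is not shown.

```python
def multi_perspective_prompt(question: str, perspectives: dict[str, str]) -> str:
    """Ask for one analysis per perspective, followed by a single synthesis."""
    if len(perspectives) < 2:
        raise ValueError("multi-perspective framing needs at least 2 perspectives")
    lines = [question.strip(), "", "Analyse this from each perspective separately:"]
    for number, (name, focus) in enumerate(perspectives.items(), start=1):
        lines.append(f"{number}. {name} (focus: {focus})")
    lines += [
        "",
        "Give one key finding per perspective.",
        "Then write a Synthesis section that reconciles the findings "
        "into one recommendation.",
    ]
    return "\n".join(lines)


prompt = multi_perspective_prompt(
    "Should we raise the limit from $500 to $1,000?",  # paraphrased, see lead-in
    {
        "Risk Manager": "default rate, exposure, regulation",
        "Product Manager": "adoption, competition, revenue",
        "Compliance": "responsible lending, MAS guidelines",
    },
)
print(prompt)
```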

The Research Behind It

Multi-agent framing is a well-established prompt engineering technique with several names in the research literature:

| Technique | Source | Key idea |
| --- | --- | --- |
| Solo Performance Prompting (SPP) | Wang et al., 2023 | A single LLM simulates multiple personas that collaborate internally: "cognitive synergy through multi-persona self-collaboration" |
| Multi-Persona Thinking (MPT) | arXiv 2025 | Dialectical reasoning from multiple perspectives to reduce bias and improve decision quality |
| Town Hall Debate Prompting | arXiv 2025 | Splices a language model into multiple personas that debate one another to reach a conclusion |
| Self-Consistency | Wang et al., 2022 | Generate multiple reasoning paths and aggregate; the broader technique family that multi-perspective builds on |

💡 Why it works: LLMs are trained on millions of documents written by different professionals. When you assign a persona, you activate that specific knowledge cluster. Asking for 3 personas in one prompt triggers 3 distinct "knowledge activations", producing genuinely different analyses, not just rephrased versions of the same answer.

🔗 Day 2 connection: On Day 1, you simulate multiple perspectives in a single prompt. On Day 2, you'll see the agentic version, the Parallelization pattern, where each perspective actually runs as a separate AI agent simultaneously, and a real aggregator combines the results. Same concept, automated at scale.

📋 Structured Outputs & RAG Grounding

Consistent format + grounded in YOUR data = production-safe outputs.

Why Structure Matters

❌ Unstructured = Conversation

Different every time. Hard to compare. Can't feed into systems. Requires human parsing.

✅ Structured = Form

Consistent format. Comparable across items. Machine-parseable. Scannable by busy stakeholders.

How to Prompt for Structured Output

Tell the AI exactly what shape the output should take. The more specific your format instructions, the more consistent the results.

| Technique | Prompt example | What you get |
| --- | --- | --- |
| Named sections | "Use these sections: Symptom, Severity, Booking, Action, Next Step" | Same headings every case, scannable for TLs and stakeholders |
| Table format | "Present as: Field \| Value \| Source \| Confidence" | Aligned data, scannable in D365 case notes |
| JSON output | "Return JSON: {symptom, severity, booking_id, action, sop_citation}" | Machine-readable, feeds into D365 / dashboards / Slack escalations |
| Numbered actions | "List 3 next agent actions. Each: action, owner (agent / TL / SPV), deadline, SOP-ID" | Actionable items with accountability + SOP traceability |
| Severity + justification | "Assign P1 / P2 / P3 severity. Justify in exactly 2 sentences with SOP citation." | Consistent QA-defensible severity decisions across all cases |
| Length control | "Stakeholder summary: max 3 sentences. Detail: max 150 words." | Right depth for the audience (stakeholder vs TL vs agent) |

Full Example: Combining Techniques

OUTPUT FORMAT:
1. Risk Rating: GREEN/AMBER/RED with 2-sentence justification
2. Key Metrics Table: | Metric | Value | Benchmark | Assessment |
3. Analysis: max 150 words, cite specific numbers
4. Recommended Actions:
   - Numbered list, each with: action, owner, deadline
5. JSON Summary (for system integration):
   {"rating": "...", "confidence": 0-100, "top_risk": "..."}

💡 Pro tip: You can mix human-readable sections (1-4) with machine-readable JSON (5) in the same prompt. The AI handles both formats in one response. This is how production templates work: the human reads the narrative, the system reads the JSON.
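
On the system side, something has to pull the JSON out of the mixed response and check it. The Python sketch below shows one possible approach using only the standard library. The function names, the sample response, and its confidence value are made up for illustration. The three field names and the GREEN/AMBER/RED ratings come from the format above.

```python
import json

ALLOWED_RATINGS = {"GREEN", "AMBER", "RED"}


def extract_json_summary(response: str) -> dict:
    """Return the last top-level JSON object found in a narrative + JSON response."""
    decoder = json.JSONDecoder()
    found = None
    pos = response.find("{")
    while pos != -1:
        try:
            obj, end = decoder.raw_decode(response, pos)
        except json.JSONDecodeError:
            pos = response.find("{", pos + 1)  # not valid JSON here, keep scanning
            continue
        if isinstance(obj, dict):
            found = obj
        pos = response.find("{", end)
    if found is None:
        raise ValueError("no JSON object found in response")
    return found


def validate_summary(summary: dict) -> list[str]:
    """Return a list of problems; an empty list means the summary is usable."""
    problems = []
    if summary.get("rating") not in ALLOWED_RATINGS:
        problems.append("rating must be GREEN, AMBER or RED")
    confidence = summary.get("confidence")
    if (isinstance(confidence, bool) or not isinstance(confidence, (int, float))
            or not 0 <= confidence <= 100):
        problems.append("confidence must be a number from 0 to 100")
    top_risk = summary.get("top_risk")
    if not isinstance(top_risk, str) or not top_risk.strip():
        problems.append("top_risk must be a non-empty string")
    return problems


# Made-up response text for illustration only.
response = (
    "1. Risk Rating: AMBER. Chargebacks exceed the benchmark.\n"
    "5. JSON Summary:\n"
    '{"rating": "AMBER", "confidence": 80, "top_risk": "chargeback rate"}'
)
summary = extract_json_summary(response)
print(summary["rating"], validate_summary(summary))  # AMBER []
```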

The Best Default Format: Markdown (.md)

When you ask AI to produce a report, analysis, or any reusable document, ask for Markdown. It's the format that works best for both humans and AI.

| Format | Human readable | AI readable | Token cost | Reusable |
| --- | --- | --- | --- | --- |
| PDF | ✅ | ❌ Can't parse | N/A | ❌ |
| Word (.docx) | ✅ | ⚠️ Partial | N/A | ❌ |
| HTML | ⚠️ Tags clutter | ✅ | High (~20 tokens/heading) | ✅ |
| Markdown ✓ | ✅ | ✅ | Low (~8 tokens/heading) | ✅ |

How to ask for it:

Save the output as "case-summary-{case_id}.md" with:
- ## headings for each section
- | tables | for data comparisons
- - bullet lists for action items

🔗 Why this matters for you: In today's exercises, every output file is .md. On Day 2, every artifact you create (steering files such as .kiro/steering/rules.md, skills such as SKILL.md, agent configs) is Markdown. It's the interface layer between you and AI: structured enough for machines, readable enough for humans, and 60% fewer tokens than HTML.
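
If you ever need to produce the same file shape from code rather than from a prompt, a minimal Python sketch could look like this. `render_case_summary` is a hypothetical helper; the file name pattern and the heading / table / bullet conventions follow the instruction above, and the sample values reuse the case used elsewhere on this page.

```python
def render_case_summary(case_id: str, sections: dict[str, str],
                        fields: list[dict[str, str]], actions: list[str]) -> tuple[str, str]:
    """Return (filename, markdown_text) for one case summary."""
    lines: list[str] = []
    for heading, body in sections.items():
        lines += [f"## {heading}", "", body.strip(), ""]
    if fields:
        columns = list(fields[0].keys())
        lines.append("| " + " | ".join(columns) + " |")
        lines.append("| " + " | ".join("---" for _ in columns) + " |")
        for row in fields:
            lines.append("| " + " | ".join(str(row[c]) for c in columns) + " |")
        lines.append("")
    if actions:
        lines += ["## Action Items", ""]
        lines += [f"- {item}" for item in actions]
    return f"case-summary-{case_id}.md", "\n".join(lines).rstrip() + "\n"


filename, text = render_case_summary(
    case_id="BK-2026-4821",
    sections={
        "Symptom": 'Pax reported "the driver smelled like beer when I got in".',
        "Severity": "P1",
    },
    fields=[
        {"Field": "Booking", "Value": "GR-9821"},
        {"Field": "Ride ended", "Value": "22:14 SGT"},
    ],
    actions=["Suspend Dax pending review", "Notify Country Safety within 30 min"],
)
print(filename)  # case-summary-BK-2026-4821.md
print(text)
```

Writing the result to disk is then a one-liner with `pathlib.Path(filename).write_text(text, encoding="utf-8")`.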

The Research Behind Markdown for AI

This isn't just a convention โ€” research and industry practice back it up:

| Finding | Impact | Source |
| --- | --- | --- |
| Markdown vs HTML token usage | 60% fewer tokens for same content structure | Token comparison (heading: ~8 vs ~20 tokens) |
| Markdown vs JSON for LLM comprehension | 16% average token savings with equal or better accuracy | Format performance benchmarks |
| Table extraction accuracy | Markdown 60.7% vs HTML 53.6% | ReleasePad, 2025 |
| RAG retrieval with clean Markdown | Up to 35% better retrieval accuracy, 20-30% fewer tokens | AnythingMD |
| llms.txt web standard (Sept 2024) | Websites now serve Markdown specifically for AI agents | Jeremy Howard, Answer.AI |
| LLM Markdown awareness research | LLMs are expected to produce structured Markdown for readability | arXiv:2501.15000, 2025 |

💡 The industry is converging on Markdown as the standard interface between humans and AI. LLMs are trained on it, tools expect it, and it costs less. The llms.txt standard (proposed by Jeremy Howard of Answer.AI in September 2024) is like robots.txt but for AI: websites serve a Markdown file at their root specifically for AI agents to read. When you write a steering file, a SKILL.md, or ask for a report, Markdown is the right default.

🔧 Advanced: XML Tags for Claude (Optional)

This section is for Citizen Developers and technical team members. Most business users can skip this; the plain-text techniques above are all you need for daily use.

When building prompt templates at the code level (Bedrock API, application backends), developers often wrap prompt sections in XML tags. This is how Anthropic recommends structuring complex API calls: the tags create unambiguous boundaries between instructions, data, and constraints.

// Typically constructed in application code, not typed by hand:
<role>Senior IRT Team Lead at AnyCompany Support, 8 years handling Safety cases in SEA</role>
<data>
Case ID: BK-2026-4821
Market: SG · Channel: Live Chat · Type: IRT
Pax: P-99421 (Grab VIP, no past complaints)
Dax: D-7711 (rating 4.2, 1 prior unresolved Pax safety flag)
Booking: GR-9821, ride ended 22:14 SGT
Pax message: "the driver smelled like beer when I got in"
</data>
<task>Summarise the case for stakeholder escalation. Assign severity P1 / P2 / P3 with SOP citation.</task>
<constraints>
- ONLY use data in <data> tags
- Max 300 words
</constraints>

| Pattern | Tags | When to use |
| --- | --- | --- |
| Data analysis | <data> <task> <format> | Analyzing reports, transactions, metrics |
| Document Q&A | <document> <question> <rules> | Policy lookups, compliance checks |
| Multi-step | <context> <step1> <step2> | Complex workflows, chained analysis |
| Review | <draft> <criteria> <instructions> | Reviewing reports, emails, proposals |

🔍 Where XML tags live in practice: In production systems, the developer builds the prompt template with XML tags programmatically. The end user fills in a form or pastes data, and the application wraps it in <data>...</data> tags behind the scenes before sending to the Bedrock API. You design the content; your tech team handles the XML structure. For daily use in Claude Cowork or Kiro, plain-text headers (### ROLE, ### CONTEXT) work just as well.
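
To make "the application wraps it behind the scenes" concrete, here is a minimal Python sketch of that wrapping step. It is not the workshop's or Anthropic's implementation: `tag` and `build_xml_prompt` are hypothetical helpers, and escaping the pasted data is one possible design choice so that user text cannot close the `<data>` tag early. Sending the result to a model is left out.

```python
from xml.sax.saxutils import escape


def tag(name: str, content: str, untrusted: bool = False) -> str:
    """Wrap content in <name>...</name>; escape it when it comes from a user."""
    body = content.strip()
    if untrusted:
        body = escape(body)  # turns &, < and > into entities
    return f"<{name}>\n{body}\n</{name}>"


def build_xml_prompt(role: str, case_data: str, task: str, constraints: list[str]) -> str:
    """Build the four-section prompt; only case_data is treated as untrusted."""
    return "\n".join([
        tag("role", role),
        tag("data", case_data, untrusted=True),
        tag("task", task),
        tag("constraints", "\n".join(f"- {c}" for c in constraints)),
    ])


prompt = build_xml_prompt(
    role="Senior IRT Team Lead at AnyCompany Support",
    case_data='Pax message: "the driver smelled like beer when I got in"',
    task="Summarise the case for stakeholder escalation.",
    constraints=["ONLY use data in <data> tags", "Max 300 words"],
)
print(prompt)
```

Template text written by the developer (role, task, constraints) is left unescaped so that references such as `<data>` inside the constraints stay readable.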

🔒 RAG Grounding: Only YOUR Data

Without grounding rules, the AI mixes its training data with your documents, and you can't trace which claims come from where. In an audit, that's a disaster.

CRITICAL GROUNDING RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite: [Doc Name, Section]
3. If not in documents: "Not available in provided documents"
4. Do NOT use outside knowledge

❌ Without RAG Grounding

Q: "What happens if a customer misses 3 PayLater payments?"

The account is typically suspended and sent to collections. Late fees of around 1-2% per month are usually applied.

⚠️ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination.

✅ With RAG Grounding

Q: Same question + policy doc attached + grounding rules

Per PayLater Terms v3.2:
• After 3 missed payments → "Default" status [Section 8.3]
• Late fee: $5 or 1.5% whichever higher [Section 7.1]
• [INSUFFICIENT DATA: credit score reporting not in provided document]

✓ Every claim cites a section. Admits what it doesn't know. No hallucination.
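
A grounded answer can also be spot-checked mechanically. The Python sketch below flags bullet lines that carry neither a section citation nor an explicit gap marker. The regular expressions and the `uncited_claims` name are assumptions for this example, and the check only confirms that a citation is present, not that the cited section actually says what the claim says.

```python
import re

# Matches "[Section 8.3]" as well as "[Doc Name, Section 8.3]".
CITATION = re.compile(r"\[[^\[\]]*Section\s+\d+(?:\.\d+)*\]")
# Matches the two "I don't know" markers used on this page.
GAP = re.compile(r"\[INSUFFICIENT DATA:[^\]]*\]|Not available in provided documents")


def uncited_claims(answer: str) -> list[str]:
    """Return bullet lines that have neither a citation nor a gap marker."""
    flagged = []
    for line in answer.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("•", "-", "*")):
            continue  # only bullet lines are treated as claims in this sketch
        if CITATION.search(stripped) or GAP.search(stripped):
            continue
        flagged.append(stripped)
    return flagged


grounded = (
    "Per PayLater Terms v3.2:\n"
    '• After 3 missed payments → "Default" status [Section 8.3]\n'
    "• Late fee: $5 or 1.5% whichever higher [Section 7.1]\n"
    "• [INSUFFICIENT DATA: credit score reporting not in provided document]"
)
ungrounded = "• The account is typically suspended and sent to collections."

print(uncited_claims(grounded))    # []
print(uncited_claims(ungrounded))  # ['• The account is typically suspended and sent to collections.']
```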


⚠️ 7 Prompt Mistakes Everyone Makes

Recognize these patterns? Fix them with one-line additions to your prompt.

| Mistake | Why it hurts | Quick fix |
| --- | --- | --- |
| 🍳 The Kitchen Sink | Cramming 5 tasks into 1 prompt | One task per prompt, chain results |
| 📄 The Blank Canvas | No examples = AI guesses your format | Show 1-2 examples of desired output |
| 🙈 The Trust Fall | No grounding = confident hallucinations | "ONLY from provided data" |
| 🔍 The Vague Ask | "Analyze this": analyze what, how, for whom? | Specify audience, format, length |
| ⏱️ The One-Shot Wonder | Expecting perfection on first try | Plan for 2-3 refinement turns |
| 📋 The Copy-Paste Trap | Same prompt for different models | Tune syntax per model family |
| ⚙️ The Set-and-Forget | Never re-testing after model updates | Monthly prompt health checks |

🔄 The 3-Round Improvement Workflow

Every production-quality prompt goes through this cycle:

| Round | What you do | Result |
| --- | --- | --- |
| 1. Baseline | Write prompt using 4 pillars. Run 3 times. | See what AI gets right and wrong (~60% quality) |
| 2. Fix failures | Add negative constraints + example of good output. Run 3 more. | Consistency jumps to ~85% |
| 3. Polish | Add self-review step. Tighten format. Test edge cases. | Production-ready at ~95% |

💡 Total time: 15-20 minutes to go from first draft to production template. That template then saves hours every week.
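
The "run 3 times" step can be given a rough number. The Python sketch below measures how often repeated runs agree on one extracted field. It is only one possible proxy and is not how the percentages in the table were derived. `consistency_rate`, `fake_model`, and `severity` are hypothetical names, and the model call is replaced by canned outputs so the example runs offline.

```python
from collections import Counter
from typing import Callable


def consistency_rate(run_prompt: Callable[[str], str], prompt: str,
                     extract: Callable[[str], str], runs: int = 3) -> float:
    """Run the prompt several times; return the share of runs that agree
    with the most common extracted answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = [extract(run_prompt(prompt)) for _ in range(runs)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / runs


# Stand-in for a real model call: three canned outputs, one disagreeing.
canned = iter(["Severity: P1", "Severity: P1", "Severity: P2"])


def fake_model(prompt: str) -> str:
    return next(canned)


def severity(output: str) -> str:
    """Pull the value after 'Severity:' from a model output."""
    return output.split(":", 1)[1].strip()


rate = consistency_rate(fake_model, "Assign P1 / P2 / P3 severity.", severity)
print(f"{rate:.0%}")  # 67%
```

Comparing an extracted field rather than the full text avoids counting harmless wording differences as inconsistency.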

🚫 Tell the AI What NOT to Do

Negative constraints prevent common failure modes:

| Problem | Add this constraint |
| --- | --- |
| AI adds unsolicited opinions | "Do not include personal opinions or speculation" |
| AI uses data not in your input | "Do not reference any data outside the provided documents" |
| AI writes too much | "Do not exceed 300 words" |
| AI hedges everything | "Do not use phrases like 'it depends' or 'generally speaking'" |
| AI explains obvious things | "Do not explain what PayLater is or how digital wallets work" |
| AI invents numbers | "If a metric is not in the data, write [DATA NOT AVAILABLE]" |

🔍 Source: Claude's prompting best practices recommend telling Claude what to do instead of what not to do for general instructions, but negative constraints are highly effective for preventing specific failure modes, especially in finance where hallucinated numbers are dangerous. See: Claude Prompting Best Practices.
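
Two of these constraints (the word limit and the banned hedging phrases) are easy to verify after the fact. The Python sketch below shows one way to append constraints to a prompt and then check an output against them. The function names are hypothetical, and the check covers only those two rows of the table.

```python
HEDGING_PHRASES = ("it depends", "generally speaking")


def with_negative_constraints(prompt: str, constraints: list[str]) -> str:
    """Append a CONSTRAINTS block with one negative constraint per line."""
    block = "\n".join(f"- {c}" for c in constraints)
    return f"{prompt.strip()}\n\nCONSTRAINTS:\n{block}"


def constraint_violations(output: str, max_words: int = 300) -> list[str]:
    """Check an output against the word limit and the banned hedging phrases."""
    problems = []
    word_count = len(output.split())
    if word_count > max_words:
        problems.append(f"too long: {word_count} words (limit {max_words})")
    lowered = output.lower()
    for phrase in HEDGING_PHRASES:
        if phrase in lowered:
            problems.append(f"hedging phrase: {phrase!r}")
    return problems


prompt = with_negative_constraints(
    "Summarise the case.",
    ["Do not exceed 300 words", "Do not include personal opinions or speculation"],
)
print(prompt)
print(constraint_violations("Generally speaking, it depends on the SOP."))
```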

🧠 The #1 Misconception: "AI Remembers Me"

It doesn't. Each session is completely isolated. The AI has zero memory of previous conversations.

❌ What people think
  • "It remembers our conversation from last week"
  • "I should keep this tab open so it doesn't forget"
  • "My old sessions are giving it context"
✅ How it actually works
  • Each session starts with zero memory
  • Old tabs have no effect on new sessions
  • Closing old sessions is safe: the effect is cosmetic, not functional

| What persists | What doesn't |
| --- | --- |
| ✅ Files in your workspace (reports, templates, code) | ❌ Chat conversation history |
| ✅ Steering files (.kiro/steering/), loaded every session | ❌ What you said 3 sessions ago |
| ✅ Skills (.kiro/skills/), activated by keywords | ❌ Old tabs or closed sessions |
| ✅ Custom agents (.kiro/agents/), invoked by name | ❌ Your "relationship" with the AI |

💡 The mental model: chat is ephemeral, files are permanent. Save important outputs as files. Reference files (not old chats) when you need context in a new session. Steering files and skills ARE the AI's persistent memory: they're loaded automatically into every new session.