For the moments the room wants to go deeper — GS Support edition
AnyCompany Support · GS Cyborg Edition · Generative & Agentic AI Workshop
Companion to Module 7 (Prompt Engineering Explainer) and the Case Context Summarizer Exercise
Who this deck is built for
Every example in the slides ahead is calibrated to the functions in the room — not generic prompt examples from the internet.
7
GS Operations & Service Success
5
Chatbot & Digital Content / Automation
3
Training & Quality (TQA / TQM)
3
WFM, Reporting & Analytics
2
Knowledge Base & Help Centre
1
Strategy / PMO
1
SORT (Social Care)
22
total across 8 markets
Anchor examples by function: GS Ops → Case Context Summarizer · Chatbot/Automation → Guided SOP Assistant · Training & Quality → DSAT write-up + LiveChat Simulator · WFM/Analytics → Forecasting Co-Pilot · KB/Help Centre → SOP citation grounding · Strategy → OKR auto-commentary · SORT → social chat tagging. Pick the example closest to who's asking.
Module 1
Prompt Fundamentals Deep Dive
The 4 pillars that determine 80% of output quality
The 80/20 Rule of Prompting
80% of output quality comes from getting four things right — every other technique is a refinement.
1Clarity
Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.
2Context
Give the AI the background it needs. Without context, it guesses — dangerous in finance.
3Role
Tell the AI who to be. An "audit lead" looks at different signals than an "FP&A partner."
4Output Framing
Define what "done" looks like — format, length, structure, style.
Pillar 1: Clarity
Vague
Clear
"Categorise these invoices"
"Categorise these 47 vendor invoices into the 6 GL expense categories used in our chart of accounts. Flag any that don't match a single category clearly."
"Write up the variance"
"Draft a 3-paragraph variance commentary on Q3 marketing spend (8% over plan). Lead with the driver. Cite the two largest line items."
"Help me with this circular"
"Summarise the new IRAS GST e-tax guide (attached) in 5 bullets — focus on what changes for our existing PR/PO workflow."
"Review this contract"
"Compare the attached vendor MSA against our standard procurement template. List clauses that differ — clause name, our standard, vendor proposal, deviation type."
Rule of thumb: The more specific your prompt, the less the AI has to guess. Specificity isn't verbosity — it's precision.
Pillar 2: Context
Without context
"Is this vendor invoice unusual?"
With context
"This vendor (Acme Tech) typically invoices SGD 8–12K monthly for managed services. This month: SGD 47K with no PO change order. Period: Sept 2026. Is this unusual? Cite which signals make it so."
Four types of context you can layer:DomainDataSituationalConstraints
The 4 Types of Context
Type
What it tells the AI
Finance example (this cohort)
Domain
The industry, market, business area
"In the context of an SEA financial-services group's monthly close — SGD reporting currency, IFRS basis, three operating regions."
Data
The specific numbers, records, or documents to analyse
"Here is the vendor's invoice history for the last 12 months: [data]. Here is our PO ledger for the same period: [data]."
Situational
Why you need this now — the trigger or audience
"Audit committee meets Friday. Treasury flagged a 23% jump in vendor disbursements last month."
Constraints
Rules, limits, requirements the output must follow
"All amounts in SGD. Materiality threshold SGD 250K. Cite IAS 36 references where impairment is mentioned."
Rule of thumb: Skip Domain → generic answers. Skip Data → hallucination. Skip Situational → wrong tone or depth. Skip Constraints → output ignores your standards.
Context in Action: Vendor Scorecard
[DOMAIN]
You are reviewing a strategic-sourcing vendor for AnyCompany
Financial Group's regional procurement team (SG-anchored, SEA scope).
[DATA]
Vendor: Acme Tech Solutions Pte Ltd (ID: V-08842)
Category: Technology Procurement (managed services)
Annual spend: SGD 1.4M → SGD 3.2M (FY26 vs FY25)
Active POs: 12 (up from 4)
Invoice exceptions: 18 in last 90 days (avg 3 prior)
Top exception type: PO-to-invoice price variance >5%
[SITUATIONAL]
Vendor proposed for Tier-1 status in FY27 sourcing review.
Category Head needs to take a position by next Tuesday.
[CONSTRAINTS]
- All amounts in SGD
- Reference our standard vendor-tier rubric (attached)
- Use ONLY the data provided above and the attached rubric
- Output must include a recommended Tier (1 / 2 / 3) and three key risks
Pillar 3: Role Assignment
Role
What changes in the output
Internal Audit Lead
Focuses on assertions, evidence quality, sample design; flags hallucination risk
FP&A Partner
Drivers, comparisons, scenarios, narrative around variance
Disclosure-language tone, consistency with prior period, IFRS-paragraph references
Pro tip: Add experience and scope — "Senior FP&A Partner with 12 years in SEA financial services, specialising in monthly variance commentary for the regional CFO." That second sentence is what shifts the output from generic to expert.
Pillar 4: Output Framing
Dimension
Example
Format
"Respond as a 4-column table: Risk · Likelihood · Impact · Owner"
"Use sections: Summary · Findings · Recommendation · Open Questions"
Style
"Write for a non-technical audit committee reader" / "Disclosure-language tone — consistent with the FY25 annual report"
Constraints
"Do not include forward-looking statements" / "No personal opinions"
This is where most prompts fall short. People tell the AI what to do but not how to present it. Like asking a designer to "make a poster" vs "A3 landscape, headline at top, three bullets in the middle, contact info at bottom."
All Four Pillars Together
[ROLE] You are a Senior Internal Audit Lead at an SEA
financial-services group, specialising in technology
audits and analytics-driven sampling.
[CONTEXT] Below is the IT change-log extract for Q3 2026
(audit period). Materiality threshold for SOX-relevant
changes: SGD 250K. Total changes: 1,247. Emergency
changes (no CAB approval): 38. Scope: production
financial systems only.
[TASK] Identify the 5 highest-risk changes for
substantive testing, ranked by audit risk.
[OUTPUT]
Format as a table:
Rank | Change ID | System | Risk Driver | Test Procedure | Sample Size
After the table, write a 2-sentence summary I can drop
into the audit committee pre-read.
Use ONLY the data provided. Cite specific Change IDs.
See the Difference: Variance Commentary
❌ Without the four pillars
Prompt: "Write something about the marketing variance"
The marketing budget came in over plan this quarter, which is something to take note of. There were several reasons for this including some campaigns that ran longer than expected and some agency fees that were higher than originally budgeted. Overall the team should look into this and consider whether to adjust spending going forward.
⚠ Wall of text. No numbers. No drivers named. No action.
✅ With the four pillars
Role + Context + Task + Output framing — same data
Q3 Marketing Variance — Commentary Headline: Marketing ran 8.2% over plan (SGD +1.41M), driven by two line items.
Drivers:
• Brand campaign extended into Sep — SGD +0.92M (65% of variance) — vs plan ended Aug
• Agency retainer renegotiated mid-Q — SGD +0.49M — phasing already aligned for Q4
✓ Number-led, drivers cited, action clear, scannable in 30 seconds
Module 2
Chain-of-Thought Reasoning
Make the AI show its work — step by step, audit-defensible
Why Chain-of-Thought?
Finance decisions are multi-step by nature. CoT makes the reasoning visible and auditable — which is the same thing a Big-4 audit reviewer would demand from your team.
Without CoT
"Is this intercompany
balance reconciled?"
→ "Yes, it reconciles."
(no working shown)
With CoT
"Walk through this step
by step."
SG ledger: SGD 4.2M
TH ledger: SGD 4.18M (after FX)
Difference: SGD 0.02M (0.5%)
Materiality: SGD 0.25M
0.5% < mat. → Reconciled
Four CoT Techniques
Technique
Trigger
Best for (this cohort)
Zero-Shot CoT
"Think step by step"
Period-close calculations · GL classification logic · simple variance math
NOW CLASSIFY:
Vendor B: SGD 0.4M annual; one of three approved providers; multi-year contract until FY28.
AI output (mirrors your pattern)
AI Response
Step 1 — Spend SGD 0.4M < SGD 1M threshold ❌
Step 2 — Three approved providers = no dependency ✅
Step 3 — Contract until FY28 = no near-term renewal risk ✅
Step 4 — TIER 3 (sub-threshold spend + healthy alternatives + locked term).
Note: Re-evaluate in FY27 H2 ahead of contract anniversary.
Key insight: The AI copied your exact Step 1–4 format, applied the same rubric, and even added a forward-looking note — all from one example.
Step-Back Prompting — Tax Circular Impact
First identify the principles, then apply them to the specific case.
Your prompt
PROMPT
Before analysing this new IRAS GST e-tax guide, first answer:
1. What are the 4 most relevant criteria for assessing GST-circular impact on a financial-services group?
2. What are common transition pitfalls in similar guides?
Now assess THIS guide using those criteria:
[Attached: IRAS e-Tax Guide on Reverse Charge for B2B services, 17-page PDF, effective 1 Jan 2027]
AI response
AI ResponseSTEP 1 — KEY CRITERIA:
1. Scope of newly-taxable supplies
2. Effective date and transitional period
3. Compliance burden (system / process changes)
4. Materiality of impacted spend categories
ASSESSMENT — MEDIUM IMPACT
• In scope: cross-border managed services (Tech & Marketing Procurement) — ~SGD 12M FY27 spend
• Effective 1 Jan 2027; transition rules apply for POs raised before 30 Sep 2026
• System change required: GL mapping for reverse-charge GST input/output Recommend: Tax + Procurement joint task force in October to scope PO-cutoff strategy.
Why Step-Back works: The AI built a framework FIRST, then applied it systematically — instead of jumping straight to "looks like medium impact." This is exactly what your Tax team would do manually.
Self-Consistency for High-Stakes Calls
Solve three ways, report the majority — for decisions where being wrong is costly.
Your prompt
PROMPT
Assess whether this asset shows impairment indicators per IAS 36.
Analyse using 3 independent approaches:
1. External: Market / industry / regulatory signals
2. Internal: Operating performance / cash-flow signals
3. Quantitative: Carrying value vs recoverable amount estimate
For each: conclude IMPAIRMENT INDICATED / NOT INDICATED.
Final assessment = majority. If approaches disagree → flag for partner-level review.
AI response
AI ResponseApproach 1 — External: Industry consolidation; key competitor wrote down similar asset Q2. → INDICATED
Approach 3 — Quantitative: Carrying SGD 18.4M vs preliminary VIU SGD 19.1M (margin 3.8%). → NOT INDICATED
MAJORITY: 2 of 3 INDICATED Recommend: Run formal impairment test for FY26 close. Engage component auditor early.
Why three approaches? Approach 3 alone would have cleared the asset. The majority vote catches what a single quantitative test misses — exactly why your auditors triangulate.
See the Difference: Audit Sampling Decision
❌ Without CoT
Prompt: "Should we use 100% testing or sampling for the IT change-log review?"
I would recommend sampling. Sampling is a reasonable approach when the population is large and the controls are operating. It provides good coverage and is efficient. The team should select a sample and test it.
⚠ No math, no risk reasoning, no sample-size justification
✅ With Chain-of-Thought
Prompt: "Walk through the sampling decision step by step against our methodology."
Step 1 — Population: 1,247 changes (Q3). 38 emergency changes (no CAB) = high-risk subpopulation. Step 2 — Materiality: SOX scope only → 412 changes touch financial systems. Step 3 — Risk: Emergency changes are higher-risk, suggesting stratified approach. Step 4 — Method: 100% test on the 38 emergency changes; statistical sample of 47 from the remaining 374 (95% conf, 5% tolerable error). Decision:Stratified sampling. Total tests: 85 (vs 412 for 100%). Justified by emergency-change risk concentration.
✓ Every step shown, sample-size method named, defensible against partner review
Module 3
Persona & Multi-Perspective
Same data, different eyes — pick the perspective that fits your audience
The Persona Formula
You are [TITLE] at [ORGANISATION TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].
🛡 Conservative Audit Lead
"Senior Internal Audit Lead, 14 years in financial-services audit. Known for evidentiary rigour. When in doubt, requests additional substantive testing rather than relying on controls reliance."
📈 Forward-Looking FP&A Partner
"FP&A Partner, 9 years in regional planning for SEA financial services. Known for crisp narrative. Always pairs the variance number with a forward-look on Q4 implications."
The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgement calls. "Evidentiary rigour" vs "crisp narrative" produce completely different outputs from the same data.
Same Data, Different Eyes
Data: Q3 revenue ran 6% under plan; new product line accounts for 80% of the gap; competitor disclosed similar headwind in their Q3 release.
🛡 Conservative Audit Lead read
FINDING — REQUIRES TESTING
Concern: Revenue under-plan by 6% concentrated in a single new product line raises completeness and cut-off concerns. Competitor parallel may indicate sector dynamic or may indicate management bias on accruals.
Test plan:
• Substantive cut-off testing on Q3-end revenue
• Recalculation of contract-asset balances on the new product line
• Inquiry of revenue-recognition policy applied vs prior period
Recommend: Formal management representation re: revenue completeness; flag as Significant Risk for audit-committee pre-read.
📈 Forward-Looking FP&A Partner read
VARIANCE COMMENTARY — Q3 REVENUE
Headline: Revenue 6% under plan (–SGD 8.4M); 80% concentrated in new product line; pattern matches sector signal from competitor's Q3 release.
What it means:
• Sector headwind, not idiosyncratic — recovery likely synchronous with peers
• New product line ramp 1 quarter behind, not broken — pipeline coverage holding at 1.2×
Forward look: Q4 still trackable to full-year guidance if Oct close lands at plan. Recommend committee note, not full-year revision. Partner with Reporting on disclosure language.
Multi-Perspective in One Prompt
Get three perspectives in one prompt — instead of scheduling three meetings.
Each: 3 concerns + Recommendation + 1 metric to monitor.
Then: Balanced synthesis from all three views.
AI response (synthesis excerpt)
AI Response🛡 AUDIT: "Capitalisation requires documented technological-feasibility tests per project. Risk of inconsistent application across squads. Recommend: central register of capitalisation decisions with quarterly review."
📈 FP&A: "Shifts ~SGD 12M from opex to amortisation over 3 years. Distorts run-rate margin comparisons unless prior periods adjusted. Reforecast required."
📋 REPORTING: "IAS 38.57 criteria need explicit policy disclosure in FY27 financials. Two SGX peers disclose similar threshold — comparable."
SYNTHESIS: Adopt with three guardrails — central capitalisation register (Audit ask), prior-period restatement of run-rate margin (FP&A ask), accounting-policy disclosure paragraph drafted by Reporting before Q1 close.
Why this works: Forces balanced analysis. No single function dominates. The synthesis is where the real recommendation lives — a policy adoption with three named guardrails.
Same Data, Different Audiences
Data point: "Q3 expense variance ran 8% over plan, driven by marketing extension and agency renegotiation"
Audience
Persona prompt
Output style
Audit Committee
"You are the CFO presenting a 5-minute update to the audit committee chair"
Strategic, control-aware, no surprises framing
Regional CFO
"You are the FP&A Partner pre-briefing the regional CFO before the close call"
Number-led, driver-aware, forward-look on Q4
Functional Owner (CMO)
"You are the FP&A business partner briefing the CMO on her budget"
Decision-oriented — what to do about Q4 phasing
Group Reporting
"You are the Group Reporting Lead drafting MD&A language"
💡 In-room exercise (10 min): Pick the data point above. Write prompts for two of these audiences. Compare how the tone, depth, and recommendations change. The data is identical — only the framing moves.
Module 4
Structured Outputs & Markdown
Get consistent, comparable, machine-readable output every time
Why Structure Matters
Unstructured = Conversation
Different every time. Hard to compare. Can't drop into the next step. Requires human re-formatting.
Structured = Form
Consistent format. Comparable across items. Drops into Excel / your reporting pack / your audit working papers without re-typing.
Daily-work cases for this cohort:
Procurement: Invoice-line extraction → AP ledger import
Controllership: Journal-entry classification → reconciliation working paper
Audit: Control-test result coding → consistent finding template
Reporting: MD&A bullet drafting → disclosure-template population
"Give a Tier 1/2/3 rating. Justify in exactly 2 sentences."
Consistent decision format across the batch
Length control
"Executive summary: max 3 sentences. Detail: max 250 words."
Right depth for the audience
Markdown output
"Save as .md with ## headings, bullet lists, and | tables"
AI-native format — low tokens, reusable, versionable
Pro tip — combine techniques: "Use sections — Headline (3 sentences), Driver Table (Driver | SGD | % of variance | Phasing), Action (numbered, with owner and deadline). Return the audit-committee-ready summary as JSON at the end."
The Best Default Format: Markdown
When you ask AI to save output as a file or produce a reusable document, Markdown wins on every dimension that matters:
Ask AI to "save as .md" — you get structured headings, tables, and lists with ~60% fewer tokens than HTML. Readable by you, parseable by AI, and on Day 2 every Cowork Skill (SKILL.md), Project Instruction, and saved artefact will be Markdown.
Why Markdown? The Numbers
Not just a preference — Markdown is measurably better for AI work:
60%
fewer tokens than HTML for the same content structure
35%
better RAG retrieval accuracy with clean Markdown vs unstructured text
61%
table-extraction accuracy in Markdown vs 54% for HTML tables
llms.txt
2024 web standard — sites now serve Markdown specifically for AI agents to read
What this means for you: Cowork Project Instructions and saved Skills load on every request. Concise Markdown = lower cost, better accuracy, and outputs that stay reusable across tools. Detailed sources in the M8 Prompt Engineering explainer.
Module 5
RAG Grounding for Finance
Make every AI claim traceable to YOUR documents — not its training data
The Grounding Problem
Without grounding rules, the AI mixes its training data with your documents — and you can't tell which is which.
❌ Without grounding
Fills gaps with plausible fiction — invents policy details that aren't in your handbook
Hedges with "typically" / "usually" — masking that it's guessing
Mixes sources invisibly — your doc + training data, no way to separate
Never says "I don't know" — answers confidently regardless
✅ With grounding rules
Every claim cites the source — section / paragraph after each statement
Admits gaps explicitly — "[INSUFFICIENT DATA]" instead of inventing
No outside knowledge — strictly from the provided documents
Audit-ready output — every claim is traceable
Why this matters for this room: If a Tax interpretation, audit conclusion, or Procurement contract clause is challenged, "the AI told me" is not a defence. Grounding rules turn AI output from a starting point into a defensible reference.
RAG — The 4 Grounding Rules
Add these rules to any prompt where defensibility matters:
CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite the source: [Doc · Section]
3. If not covered: write "[INSUFFICIENT DATA: not in
provided documents]"
4. Do NOT use outside knowledge or assumptions
DOCUMENTS:
[Doc 1: Procurement Policy v4.2 — Variance & Tolerance]
[Doc 2: Standard MSA Template — Pricing & Adjustment Clauses]
QUESTION: What is our policy if a vendor invoice is more
than 10% above the matched PO total?
When to use: Policy lookups · Tax circular Q&A · audit-procedure determination · contract-clause review · disclosure-language drafting · regulatory impact assessment — any task where the answer must be traceable to a specific document.
See the Difference: Procurement Policy Q&A
❌ Without RAG Grounding
Prompt: "What's our policy if an invoice is over PO by 10%?"
If a vendor invoice exceeds the PO by more than 10%, the standard practice is typically to require a change order before payment. Late fees of around 1–2% per month are usually applied to disputed amounts. The vendor's spend pattern is reviewed and they may be flagged for category-head escalation if the variance is recurring.
⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination.
✅ With RAG Grounding
Prompt: "ONLY from provided documents" + Procurement Policy attached
Per Procurement Policy v4.2:
• PO-to-invoice tolerance is 5% (not 10%) — variances above tolerance require a change order before approval. [Procurement Policy v4.2 · §3.4]
• Variances above 15% require Category Head sign-off in addition to the change order. [§3.4.2]
• [INSUFFICIENT DATA: late-fee policy not addressed in provided documents — refer to MSA Template clauses]
✓ Every claim cites a section. Admits what isn't covered. Caught the 10% misconception.
Module 6
Best Practices & Common Anti-Patterns
The seven mistakes everyone makes — and the fixes that take 30 seconds
7 Prompt Mistakes Everyone Makes
Mistake
Why it hurts
Quick fix
The Kitchen Sink
Cramming 5 tasks into 1 prompt
One task per prompt — chain results
The Blank Canvas
No examples → AI guesses your format
Show 1–2 examples of desired output
The Trust Fall
No grounding → confident hallucinations
"ONLY from provided data"
The Vague Ask
"Analyse this" — analyse what, how, for whom?
Specify audience, format, length
The One-Shot Wonder
Expecting perfection on first try
Plan for 2–3 refinement turns
The Copy-Paste Trap
Same prompt for different models
Tune the framing per model tier
The Set-and-Forget
Never re-testing after model updates
Monthly health check on production prompts
The Draft → Score → Revise Loop
Don't accept the first output. Build a self-improving cycle into your prompt:
Step 1 — DRAFT: Write a 200-word audit-committee summary
of the IT change-log review using the data provided.
Step 2 — SCORE: Rate your draft on these criteria:
• Completeness (0-5): All four required sections?
• Grounding (0-5): Every claim cites a Change ID?
• Actionability (0-5): Specific test plan with sample sizes?
Step 3 — REVISE: If total < 12 of 15, rewrite to fix the
lowest-scoring area. Maximum 2 revisions.
Output ONLY the final version after the loop completes.
Why this works: The AI self-corrects before you read it. Teams using this pattern report 40–60% fewer revision cycles. Three lines added to the prompt. Zero technical skill required.
Break Big Tasks into Small Steps
Complex tasks fail when you ask for everything at once. Decompose instead:
❌ One Giant Prompt
"Analyse our Q3 close exceptions,
identify the top intercompany
mismatches, calculate the
materiality impact, compare to
Q2, draft a controllership
note, and recommend three
process improvements."
6 tasks = shallow work on each
✅ Chained Prompts
P1: "Analyse Q3 close
exceptions and flag
intercompany mismatches"
P2: "From these mismatches,
identify the top 3 by
materiality"
P3: "Compare Q3 vs Q2 on
the same three"
P4: "Draft a controllership
note with three process
improvements"
Each step gets full attention
Tell the AI What NOT to Do
Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:
Problem you keep seeing
Negative constraint to add
AI adds unsolicited opinions
"Do not include personal opinions or speculation"
AI uses data not in your input
"Do not reference any data outside the provided documents"
AI writes too much
"Do not exceed 300 words. Do not add a conclusion section."
AI hedges everything
"Do not use 'it depends' or 'generally speaking'"
AI explains the obvious
"Do not explain what materiality is or how variance commentary works"
AI invents numbers
"If a metric is not in the data, write [DATA NOT AVAILABLE]"
Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.
Decision Rules: Override Subjective Judgement
Different models give different reads on the same data. Decision rules encode YOUR policy:
❌ Without rules
Sonnet 4.5: "Variance 8% over plan = significant" → flag for committee Sonnet 4.6: "Variance within reasonable range" → routine commentary Same data, two reads, no policy reference.
✅ With decision rules
RULES: Variance >5% AND >SGD 1M → flag for committee · Variance >5% only → category-head note · Variance ≤5% → routine commentary
Sonnet 4.5: "8% variance, SGD +1.41M ⇒ flags both rules" → committee Sonnet 4.6: "8% variance, SGD +1.41M ⇒ flags both rules" → committee Both agree. Policy enforced.
Use when: Consistency matters more than nuance — variance flagging, materiality calls, audit-finding severity, vendor tier-rating, capitalisation thresholds. If your team has a written threshold, encode it in the prompt.
Structure Your Prompts Like Documents
Well-organised prompts produce well-organised outputs. Use clear sections and delimiters:
### ROLE
You are a Senior IRT Team Lead at AnyCompany Support.
### CONTEXT
<<<
[Paste the segmental P&L extract or disclosure draft here]
>>>
### TASK
Draft the Operating & Financial Review section for the Q3 2026 board pack.
### OUTPUT FORMAT
- Headline (1 sentence, leads with the number that matters)
- Driver paragraphs (3 max, in order of variance contribution)
- Forward look (max 100 words)
- Disclosure-language tone, consistent with FY25 annual report
### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words total
- Do not include forward-looking statements not already in management commentary
Why delimiters matter: The triple angle brackets tell the AI "this is input data, not instructions." Without them, the AI may follow instructions inside a pasted document instead of analysing it — especially dangerous when pasting policy text or contract clauses.
Show, Don't Tell — One Example Beats 100 Words
Especially for batched classification work — like the journal-entry coding the Controllership team does monthly:
❌ Telling
"Code each journal entry as
direct cost, indirect cost,
or capex based on the GL
account, vendor, and
description. Format as a
table with JE ID, code,
and reasoning."
~50 words of instruction. AI still guesses your format.
✅ Showing
"Code each journal entry like this:
| JE ID | Code | Reason |
| JE-1142 | DC | Vendor invoice for
client engagement deliverables |
| JE-1156 | CAPEX | Software license
>SGD 250K, 3-year term |
Now code these: [data]"
One example = perfect format every time
The 3-Round Improvement Workflow
Every production-quality prompt goes through this cycle:
Round
What you do
What improves
R1: Baseline
Write your first prompt using the four pillars. Run it 3 times.
You see what the AI gets right and wrong
R2: Fix failures
Add negative constraints for each failure. Add one example of good output. Run 3 more times.
Consistency jumps from ~60% to ~85%
R3: Polish
Add the self-review (draft-score-revise) step. Tighten length and format. Test edge cases.
Production-ready at ~95% consistency
Total time: 15–20 minutes to go from first draft to production template. The template then saves hours every week.
Build a Team Prompt Library
Your best prompts are team assets, not personal notes. Treat them like shared templates:
📋 Metadata each prompt should include
Prompt name and purpose
The full prompt with {{variables}}
Recommended model tier (Haiku / Sonnet / Opus)
1–2 example outputs (good vs bad)
Known limitations and edge cases
Last tested date and model version
🎯 Starter library for this cohort
Vendor scorecard (Procurement)
Variance commentary (FP&A)
Audit finding writeup (Internal Audit)
Tax circular impact memo (Tax)
MD&A bullet draft (Reporting)
Intercompany reconciliation note (Controllership)
Day 2 connection: In Cowork, every entry in this library becomes a saved Skill (a SKILL.md file). The library moves from "shared Word doc" to "installable bundle the whole team uses with one click." That's why we save them as Markdown.
Module 7
Managing Long Conversations
Why AI "gets dumber" mid-session — and what to do about it
Why AI "Gets Dumber" Mid-Conversation
It's not a bug — it's a context window problem. Every AI has a finite "working memory."
⚙ What happens inside
Every message + every AI response stays in the context window
At 60–70% capacity, performance drops sharply — sudden cliff, not gradual
The AI compresses and deprioritises earlier messages
"Lost in the Middle" — AI remembers the start and end best, forgets the middle (Liu et al. 2023, one of the most cited AI papers)
👀 What you experience
AI contradicts instructions you gave 10 messages ago
AI re-introduces ideas you already rejected
AI ignores constraints from the start of the chat
Outputs get vague, generic, or repetitive
AI starts hallucinating more frequently
Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a bigger model.
5 Rules for Managing Long Conversations
Rule
Why it works
One task per session — don't mix audit, FP&A, and contract review in one chat
Each session gets full attention capacity
Paste only what's relevant — don't dump the entire policy doc when you need three clauses
Reduces noise, keeps the AI focused
Key instructions at start AND end — restate critical constraints in the closing line
Exploits primacy + recency bias
Keep sessions under 15–20 turns — start fresh after that
Stays in the performance sweet spot
Use session summaries — ask AI to summarise, paste into a new session
Fresh context window with all the accumulated knowledge
The Session Summary Technique
When a conversation gets too long but you can't lose the state:
Step 1 — Ask for a summary
In the OLD session
Summarise our conversation so far:
• Key decisions we made (and why)
• Data and findings established
• What still needs to happen next
Format as a briefing I can paste into a new session.
Step 2 — Start fresh with context
In the NEW session
Here is the context from our previous session:
[PASTE SUMMARY HERE]
Continue from where we left off. The next step is to draft the audit-committee pre-read using the findings established above.
✓ Fresh context window + all accumulated knowledge = best of both worlds
Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.
"Save Your Game" — Memory for Long Projects
For projects that span weeks or months, you want two files — not one growing document:
📋 project-status.md
Load every session — compact, ~2 pages
What exists now (file list, decisions made)
What's remaining (next steps)
Key rules and constraints
Like a project brief — current state only
📓 session-log.md
Load only when needed — grows over time
What was done each session
Why decisions were made
Edge cases and gotchas
Like meeting minutes — history archive
When
What to say
Start session
"Here's my project context: [paste project-status.md]"
End session
"Update project-status.md with the current state. Append today's work to session-log.md."
Look back
"Read session-log.md — when did we change the materiality threshold and why?"
In Cowork (Day 2): The project-status pattern becomes your Project Instructions — loaded automatically on every conversation in that project. The session-log becomes a file in your mounted folder. Persistent memory without the manual paste.
The #1 Misconception: "AI Remembers Me"
It doesn't. Each session starts with zero memory. Here's what AI actually sees:
❌
What people think
"The AI remembers our conversation from last week"
"It knows what I worked on yesterday"
"I should keep this session open so it doesn't forget"
"My old browser tabs are giving it context"
✅
How it actually works
Each session starts with zero memory
AI sees only: your current message + this session's history
Old tabs / sessions have no effect on new ones
Closing old sessions is safe — it's cosmetic, not functional
Mental model: chat is ephemeral, files are permanent. The AI's "memory" is the files it created — reports, templates, Skills. Those persist. The conversation that produced them does not. When you need context in a new session, reference the files — not the old chat.
When to Start Fresh vs. Continue
🟢 Start a new session
Switching to a completely different task
The conversation has gone off track
You want to test a refined prompt cleanly
Session is past 15–20 turns
AI keeps repeating the same mistake
AI contradicts earlier instructions
🔵 Continue the session
Iterating on the same output
You need the AI to remember earlier context
Building step by step (funnel pattern)
Refining format or tone
Follow-up questions on the same topic
Session is still under 15 turns
The 3-strike rule: If you've corrected the AI three times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.
Circuit Breaker Patterns
Pattern
Symptom
Fix
Repetition Loop
Same wrong output after correction
New session, rephrase from scratch
Hallucination Spiral
Inventing data not in your input
Add: "Use ONLY provided data"
Over-Eager Helper
2,000 words for a 5-bullet ask
Add: "Exactly 5 bullets, under 20 words each"
Format Drift
Format changes mid-output
Add: "Continue using EXACTLY the same format"
Confidence Trap
Uncertain info presented as fact
Add: "Prefix uncertain claims with [UNCERTAIN]"
Take it home
Quick Reference Card
The 12 trigger phrases that make every prompt better — copy this slide
12 Trigger Phrases — Pinned to Your Desk
Technique
Trigger phrase to add to your prompt
Zero-Shot CoT
"Think step by step before answering"
Expert Persona
"You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective
"Present the case FOR and AGAINST" / "Three voices: Audit · FP&A · Reporting"
Structured Output
"Use EXACTLY these sections: 1… 2… 3…"
RAG Grounding
"Base your answer ONLY on the provided documents"
Self-Critique
"Review your draft: Is every claim supported by data?"
Meta-Prompting
"Write the best possible prompt for [TASK]"
Negative Constraints
"Do NOT include / Do NOT use / Do NOT exceed"
Decision Rules
"If [metric] > [threshold] → MUST be [rating]"
Task Decomposition
Break 1 big prompt into 3–4 focused prompts
Draft → Score → Revise
"Draft, score on [rubric], revise if < threshold"
Show, Don't Tell
Include 1–2 examples of desired output format
That's the Appendix
Pull this deck up any time the room asks "but how do I actually do that?"
Live alongside · M7 Prompt Engineering Explainer · The Day 1 Case Context Summarizer Exercise
Site: https://grabsupport.lab.mywcloud.net
AnyCompany Support · GS Cyborg Edition · June 2026
M Menu O Overview S Speaker Notes F Fullscreen ? All shortcuts