Day 1 · Appendix Deck

Prompt Engineering
Deep Dive

For the moments the room wants to go deeper — GS Support edition

AnyCompany Support · GS Cyborg Edition · Generative & Agentic AI Workshop

Companion to Module 7 (Prompt Engineering Explainer) and the Case Context Summarizer Exercise

Who this deck is built for

Every example in the slides ahead is calibrated to the functions in the room — not generic prompt examples from the internet.

7
GS Operations &
Service Success
5
Chatbot & Digital
Content / Automation
3
Training & Quality
(TQA / TQM)
3
WFM, Reporting
& Analytics
2
Knowledge Base
& Help Centre
1
Strategy /
PMO
1
SORT
(Social Care)
22
total across
8 markets

Anchor examples by function: GS Ops → Case Context Summarizer · Chatbot/Automation → Guided SOP Assistant · Training & Quality → DSAT write-up + LiveChat Simulator · WFM/Analytics → Forecasting Co-Pilot · KB/Help Centre → SOP citation grounding · Strategy → OKR auto-commentary · SORT → social chat tagging. Pick the example closest to who's asking.

Module 1

Prompt Fundamentals
Deep Dive

The 4 pillars that determine 80% of output quality

The 80/20 Rule of Prompting

80% of output quality comes from getting four things right — every other technique is a refinement.

1Clarity

Say exactly what you mean. If a colleague would ask "what do you mean?" — your prompt needs work.

2Context

Give the AI the background it needs. Without context, it guesses — dangerous in finance.

3Role

Tell the AI who to be. An "audit lead" looks at different signals than an "FP&A partner."

4Output Framing

Define what "done" looks like — format, length, structure, style.

Pillar 1: Clarity

VagueClear
"Categorise these invoices""Categorise these 47 vendor invoices into the 6 GL expense categories used in our chart of accounts. Flag any that don't match a single category clearly."
"Write up the variance""Draft a 3-paragraph variance commentary on Q3 marketing spend (8% over plan). Lead with the driver. Cite the two largest line items."
"Help me with this circular""Summarise the new IRAS GST e-tax guide (attached) in 5 bullets — focus on what changes for our existing PR/PO workflow."
"Review this contract""Compare the attached vendor MSA against our standard procurement template. List clauses that differ — clause name, our standard, vendor proposal, deviation type."

Rule of thumb: The more specific your prompt, the less the AI has to guess. Specificity isn't verbosity — it's precision.

Pillar 2: Context

Without context "Is this vendor invoice unusual?"
With context "This vendor (Acme Tech) typically invoices SGD 8–12K monthly for managed services. This month: SGD 47K with no PO change order. Period: Sept 2026. Is this unusual? Cite which signals make it so."

Four types of context you can layer: DomainDataSituationalConstraints

The 4 Types of Context

TypeWhat it tells the AIFinance example (this cohort)
DomainThe industry, market, business area"In the context of an SEA financial-services group's monthly close — SGD reporting currency, IFRS basis, three operating regions."
DataThe specific numbers, records, or documents to analyse"Here is the vendor's invoice history for the last 12 months: [data]. Here is our PO ledger for the same period: [data]."
SituationalWhy you need this now — the trigger or audience"Audit committee meets Friday. Treasury flagged a 23% jump in vendor disbursements last month."
ConstraintsRules, limits, requirements the output must follow"All amounts in SGD. Materiality threshold SGD 250K. Cite IAS 36 references where impairment is mentioned."

Rule of thumb: Skip Domain → generic answers. Skip Data → hallucination. Skip Situational → wrong tone or depth. Skip Constraints → output ignores your standards.

Context in Action: Vendor Scorecard

[DOMAIN]
You are reviewing a strategic-sourcing vendor for AnyCompany
Financial Group's regional procurement team (SG-anchored, SEA scope).

[DATA]
Vendor: Acme Tech Solutions Pte Ltd (ID: V-08842)
Category: Technology Procurement (managed services)
Annual spend: SGD 1.4M → SGD 3.2M (FY26 vs FY25)
Active POs: 12 (up from 4)
Invoice exceptions: 18 in last 90 days (avg 3 prior)
Top exception type: PO-to-invoice price variance >5%

[SITUATIONAL]
Vendor proposed for Tier-1 status in FY27 sourcing review.
Category Head needs to take a position by next Tuesday.

[CONSTRAINTS]
- All amounts in SGD
- Reference our standard vendor-tier rubric (attached)
- Use ONLY the data provided above and the attached rubric
- Output must include a recommended Tier (1 / 2 / 3) and three key risks

Pillar 3: Role Assignment

RoleWhat changes in the output
Internal Audit LeadFocuses on assertions, evidence quality, sample design; flags hallucination risk
FP&A PartnerDrivers, comparisons, scenarios, narrative around variance
Tax ManagerCitation-heavy, conservative interpretation, materiality framing
Procurement Category HeadSpend exposure, vendor concentration, contract leverage points
Group Reporting LeadDisclosure-language tone, consistency with prior period, IFRS-paragraph references

Pro tip: Add experience and scope — "Senior FP&A Partner with 12 years in SEA financial services, specialising in monthly variance commentary for the regional CFO." That second sentence is what shifts the output from generic to expert.

Pillar 4: Output Framing

DimensionExample
Format"Respond as a 4-column table: Risk · Likelihood · Impact · Owner"
Length"In exactly 3 sentences" / "Under 200 words" / "Maximum 5 bullets"
Structure"Use sections: Summary · Findings · Recommendation · Open Questions"
Style"Write for a non-technical audit committee reader" / "Disclosure-language tone — consistent with the FY25 annual report"
Constraints"Do not include forward-looking statements" / "No personal opinions"

This is where most prompts fall short. People tell the AI what to do but not how to present it. Like asking a designer to "make a poster" vs "A3 landscape, headline at top, three bullets in the middle, contact info at bottom."

All Four Pillars Together

[ROLE] You are a Senior Internal Audit Lead at an SEA
financial-services group, specialising in technology
audits and analytics-driven sampling.

[CONTEXT] Below is the IT change-log extract for Q3 2026
(audit period). Materiality threshold for SOX-relevant
changes: SGD 250K. Total changes: 1,247. Emergency
changes (no CAB approval): 38. Scope: production
financial systems only.

[TASK] Identify the 5 highest-risk changes for
substantive testing, ranked by audit risk.

[OUTPUT]
Format as a table:
Rank | Change ID | System | Risk Driver | Test Procedure | Sample Size

After the table, write a 2-sentence summary I can drop
into the audit committee pre-read.

Use ONLY the data provided. Cite specific Change IDs.

See the Difference: Variance Commentary

❌ Without the four pillars

Prompt: "Write something about the marketing variance"

The marketing budget came in over plan this quarter, which is something to take note of. There were several reasons for this including some campaigns that ran longer than expected and some agency fees that were higher than originally budgeted. Overall the team should look into this and consider whether to adjust spending going forward.

⚠ Wall of text. No numbers. No drivers named. No action.

✅ With the four pillars

Role + Context + Task + Output framing — same data

Q3 Marketing Variance — Commentary
Headline: Marketing ran 8.2% over plan (SGD +1.41M), driven by two line items.

Drivers:
• Brand campaign extended into Sep — SGD +0.92M (65% of variance) — vs plan ended Aug
• Agency retainer renegotiated mid-Q — SGD +0.49M — phasing already aligned for Q4

Action: Q4 phasing already corrected; full-year guidance unchanged. Recommend committee note, not budget revision.

✓ Number-led, drivers cited, action clear, scannable in 30 seconds

Module 2

Chain-of-Thought
Reasoning

Make the AI show its work — step by step, audit-defensible

Why Chain-of-Thought?

Finance decisions are multi-step by nature. CoT makes the reasoning visible and auditable — which is the same thing a Big-4 audit reviewer would demand from your team.

Without CoT

"Is this intercompany
balance reconciled?"

→ "Yes, it reconciles."
(no working shown)

With CoT

"Walk through this step
by step."

SG ledger:    SGD 4.2M
TH ledger:    SGD 4.18M (after FX)
Difference:   SGD 0.02M (0.5%)
Materiality:  SGD 0.25M
0.5% < mat. → Reconciled

Four CoT Techniques

TechniqueTriggerBest for (this cohort)
Zero-Shot CoT"Think step by step"Period-close calculations · GL classification logic · simple variance math
Few-Shot CoTProvide one example with reasoningVendor risk classification · invoice categorisation · journal-entry coding
Step-Back"First identify principles, then apply them"Tax circular impact analysis · audit walkthrough planning · disclosure drafting
Self-Consistency"Solve three ways, report the majority"Materiality calls · impairment indicator review · high-stakes accounting judgements

Finance rule of thumb: Any decision that could be audited should use CoT — the reasoning trail is your documentation.

Zero-Shot CoT — Period-Close Reconciliation

Just add "Think step by step" to any prompt — no examples needed.

❌ Without CoT

PROMPT SG entity recorded SGD 4.20M intercompany payable to TH entity. TH entity recorded SGD 4.18M receivable (after FX at 24.5 THB/SGD spot rate). Materiality SGD 0.25M. Reconciled?
AI Response Yes, the intercompany balance is reconciled. The amounts are close enough.

⚠ No math. Not auditable. "Close enough" is not a control.

✅ With "Think step by step"

PROMPT Same data. + "Think through the reconciliation step by step."
AI Response SG payable: SGD 4.20M
TH receivable: SGD 4.18M (post-FX)
Difference: SGD 0.02M
% of payable: 0.02 / 4.20 = 0.48%
Materiality: SGD 0.25M (= 5.95% of payable)
Difference (0.48%) < materiality (5.95%)
Reconciled within materiality threshold. Recommend documenting FX rate timing as the likely cause of the 0.02M residual.

✓ Every step visible. Materiality applied. Residual explained.

Few-Shot CoT — Vendor Risk Classification

Provide one example with reasoning — the AI follows the exact same pattern across the batch.

Your prompt (with example)

PROMPT Classify each vendor as Tier 1 / Tier 2 / Tier 3 using our rubric. Show reasoning step by step.

EXAMPLE — Vendor A: Annual spend SGD 2.1M; sole-source for core system; contract expires 4 months.
Step 1 — Spend > SGD 1M threshold ✅
Step 2 — Sole-source = strategic dependency ✅
Step 3 — Contract expiry <6mo = leverage risk ⚠
Step 4 — TIER 1 (high spend + dependency + renewal risk).

NOW CLASSIFY:
Vendor B: SGD 0.4M annual; one of three approved providers; multi-year contract until FY28.

AI output (mirrors your pattern)

AI Response Step 1 — Spend SGD 0.4M < SGD 1M threshold ❌
Step 2 — Three approved providers = no dependency ✅
Step 3 — Contract until FY28 = no near-term renewal risk ✅
Step 4 — TIER 3 (sub-threshold spend + healthy alternatives + locked term).

Note: Re-evaluate in FY27 H2 ahead of contract anniversary.

Key insight: The AI copied your exact Step 1–4 format, applied the same rubric, and even added a forward-looking note — all from one example.

Step-Back Prompting — Tax Circular Impact

First identify the principles, then apply them to the specific case.

Your prompt

PROMPT Before analysing this new IRAS GST e-tax guide, first answer:
1. What are the 4 most relevant criteria for assessing GST-circular impact on a financial-services group?
2. What are common transition pitfalls in similar guides?

Now assess THIS guide using those criteria:
[Attached: IRAS e-Tax Guide on Reverse Charge for B2B services, 17-page PDF, effective 1 Jan 2027]

AI response

AI Response STEP 1 — KEY CRITERIA:
1. Scope of newly-taxable supplies
2. Effective date and transitional period
3. Compliance burden (system / process changes)
4. Materiality of impacted spend categories

STEP 2 — TRANSITION PITFALLS:
• PO-issued-before / invoiced-after timing rules
• Reverse-charge GL coding gaps
• Vendor-side certificate-of-residence requirements

ASSESSMENT — MEDIUM IMPACT
• In scope: cross-border managed services (Tech & Marketing Procurement) — ~SGD 12M FY27 spend
• Effective 1 Jan 2027; transition rules apply for POs raised before 30 Sep 2026
• System change required: GL mapping for reverse-charge GST input/output
Recommend: Tax + Procurement joint task force in October to scope PO-cutoff strategy.

Why Step-Back works: The AI built a framework FIRST, then applied it systematically — instead of jumping straight to "looks like medium impact." This is exactly what your Tax team would do manually.

Self-Consistency for High-Stakes Calls

Solve three ways, report the majority — for decisions where being wrong is costly.

Your prompt

PROMPT Assess whether this asset shows impairment indicators per IAS 36.

Analyse using 3 independent approaches:
1. External: Market / industry / regulatory signals
2. Internal: Operating performance / cash-flow signals
3. Quantitative: Carrying value vs recoverable amount estimate

For each: conclude IMPAIRMENT INDICATED / NOT INDICATED.

Final assessment = majority. If approaches disagree → flag for partner-level review.

AI response

AI Response Approach 1 — External: Industry consolidation; key competitor wrote down similar asset Q2. → INDICATED

Approach 2 — Internal: Operating margin compressed 320 bps YoY; pipeline coverage at 0.8× (vs 1.4× plan). → INDICATED

Approach 3 — Quantitative: Carrying SGD 18.4M vs preliminary VIU SGD 19.1M (margin 3.8%). → NOT INDICATED

MAJORITY: 2 of 3 INDICATED
Recommend: Run formal impairment test for FY26 close. Engage component auditor early.

Why three approaches? Approach 3 alone would have cleared the asset. The majority vote catches what a single quantitative test misses — exactly why your auditors triangulate.

See the Difference: Audit Sampling Decision

❌ Without CoT

Prompt: "Should we use 100% testing or sampling for the IT change-log review?"

I would recommend sampling. Sampling is a reasonable approach when the population is large and the controls are operating. It provides good coverage and is efficient. The team should select a sample and test it.

⚠ No math, no risk reasoning, no sample-size justification

✅ With Chain-of-Thought

Prompt: "Walk through the sampling decision step by step against our methodology."

Step 1 — Population: 1,247 changes (Q3). 38 emergency changes (no CAB) = high-risk subpopulation.
Step 2 — Materiality: SOX scope only → 412 changes touch financial systems.
Step 3 — Risk: Emergency changes are higher-risk, suggesting stratified approach.
Step 4 — Method: 100% test on the 38 emergency changes; statistical sample of 47 from the remaining 374 (95% conf, 5% tolerable error).
Decision: Stratified sampling. Total tests: 85 (vs 412 for 100%). Justified by emergency-change risk concentration.

✓ Every step shown, sample-size method named, defensible against partner review

Module 3

Persona &
Multi-Perspective

Same data, different eyes — pick the perspective that fits your audience

The Persona Formula

You are [TITLE] at [ORGANISATION TYPE]
with [X years] of experience in [SPECIALTY].
You are known for [CHARACTERISTIC].
When [SITUATION], you always [BEHAVIOR].
🛡 Conservative Audit Lead

"Senior Internal Audit Lead, 14 years in financial-services audit. Known for evidentiary rigour. When in doubt, requests additional substantive testing rather than relying on controls reliance."

📈 Forward-Looking FP&A Partner

"FP&A Partner, 9 years in regional planning for SEA financial services. Known for crisp narrative. Always pairs the variance number with a forward-look on Q4 implications."

The last two fields matter most: CHARACTERISTIC and BEHAVIOR shape the AI's judgement calls. "Evidentiary rigour" vs "crisp narrative" produce completely different outputs from the same data.

Same Data, Different Eyes

Data: Q3 revenue ran 6% under plan; new product line accounts for 80% of the gap; competitor disclosed similar headwind in their Q3 release.

🛡 Conservative Audit Lead read
FINDING — REQUIRES TESTING

Concern: Revenue under-plan by 6% concentrated in a single new product line raises completeness and cut-off concerns. Competitor parallel may indicate sector dynamic or may indicate management bias on accruals.

Test plan:
• Substantive cut-off testing on Q3-end revenue
• Recalculation of contract-asset balances on the new product line
• Inquiry of revenue-recognition policy applied vs prior period

Recommend: Formal management representation re: revenue completeness; flag as Significant Risk for audit-committee pre-read.
📈 Forward-Looking FP&A Partner read
VARIANCE COMMENTARY — Q3 REVENUE

Headline: Revenue 6% under plan (–SGD 8.4M); 80% concentrated in new product line; pattern matches sector signal from competitor's Q3 release.

What it means:
• Sector headwind, not idiosyncratic — recovery likely synchronous with peers
• New product line ramp 1 quarter behind, not broken — pipeline coverage holding at 1.2×

Forward look: Q4 still trackable to full-year guidance if Oct close lands at plan. Recommend committee note, not full-year revision. Partner with Reporting on disclosure language.

Multi-Perspective in One Prompt

Get three perspectives in one prompt — instead of scheduling three meetings.

Your prompt

PROMPT Evaluate this proposed accounting-policy change (capitalising internally-developed software above SGD 250K threshold):

🛡 INTERNAL AUDIT: Documentation rigour, audit-trail risk, judgement-area exposure
📈 FP&A: P&L phasing impact, KPI distortion, planning-process implications
📋 GROUP REPORTING: Disclosure burden, comparability with peers, IAS 38 application

Each: 3 concerns + Recommendation + 1 metric to monitor.

Then: Balanced synthesis from all three views.

AI response (synthesis excerpt)

AI Response 🛡 AUDIT: "Capitalisation requires documented technological-feasibility tests per project. Risk of inconsistent application across squads. Recommend: central register of capitalisation decisions with quarterly review."

📈 FP&A: "Shifts ~SGD 12M from opex to amortisation over 3 years. Distorts run-rate margin comparisons unless prior periods adjusted. Reforecast required."

📋 REPORTING: "IAS 38.57 criteria need explicit policy disclosure in FY27 financials. Two SGX peers disclose similar threshold — comparable."

SYNTHESIS: Adopt with three guardrails — central capitalisation register (Audit ask), prior-period restatement of run-rate margin (FP&A ask), accounting-policy disclosure paragraph drafted by Reporting before Q1 close.

Why this works: Forces balanced analysis. No single function dominates. The synthesis is where the real recommendation lives — a policy adoption with three named guardrails.

Same Data, Different Audiences

Data point: "Q3 expense variance ran 8% over plan, driven by marketing extension and agency renegotiation"

AudiencePersona promptOutput style
Audit Committee"You are the CFO presenting a 5-minute update to the audit committee chair"Strategic, control-aware, no surprises framing
Regional CFO"You are the FP&A Partner pre-briefing the regional CFO before the close call"Number-led, driver-aware, forward-look on Q4
Functional Owner (CMO)"You are the FP&A business partner briefing the CMO on her budget"Decision-oriented — what to do about Q4 phasing
Group Reporting"You are the Group Reporting Lead drafting MD&A language"Disclosure tone, prior-period consistent, IAS-paragraph aware

💡 In-room exercise (10 min): Pick the data point above. Write prompts for two of these audiences. Compare how the tone, depth, and recommendations change. The data is identical — only the framing moves.

Module 4

Structured Outputs
& Markdown

Get consistent, comparable, machine-readable output every time

Why Structure Matters

Unstructured = Conversation

Different every time. Hard to compare. Can't drop into the next step. Requires human re-formatting.

Structured = Form

Consistent format. Comparable across items. Drops into Excel / your reporting pack / your audit working papers without re-typing.

Daily-work cases for this cohort:

  • Procurement: Invoice-line extraction → AP ledger import
  • Controllership: Journal-entry classification → reconciliation working paper
  • Audit: Control-test result coding → consistent finding template
  • Reporting: MD&A bullet drafting → disclosure-template population
  • Tax: Circular impact-row extraction → impact-assessment workbook

How to Get Structured Output

Tell the AI exactly what shape the output should take. The more specific, the more consistent.

TechniquePrompt example (this cohort)What you get
Named sections"Use these sections: Headline · Drivers · Forward Look · Action"Same headings every time — scannable, comparable
Table format"Present as a table: Risk · Likelihood · Impact · Owner · Test Procedure"Aligned data, drops into Excel / audit papers
JSON output"Return JSON: {tier, spend_band, dependency, renewal_window, risks[]}"Machine-readable, feeds workflows
Numbered actions"List 3 actions. Each: action · owner · deadline · priority (H/M/L)"Accountability built in
Rating + justification"Give a Tier 1/2/3 rating. Justify in exactly 2 sentences."Consistent decision format across the batch
Length control"Executive summary: max 3 sentences. Detail: max 250 words."Right depth for the audience
Markdown output"Save as .md with ## headings, bullet lists, and | tables"AI-native format — low tokens, reusable, versionable

Pro tip — combine techniques: "Use sections — Headline (3 sentences), Driver Table (Driver | SGD | % of variance | Phasing), Action (numbered, with owner and deadline). Return the audit-committee-ready summary as JSON at the end."

The Best Default Format: Markdown

When you ask AI to save output as a file or produce a reusable document, Markdown wins on every dimension that matters:

FormatHuman readableAI parses cleanlyToken costReusable
PDFN/A
Word⚠️N/A
HTML⚠️High
Markdown ✓Low

HTML heading (~20 tokens):

<h2 id="section-4-2">4.2 Capitalisation Threshold</h2>

Markdown heading (~8 tokens):

## 4.2 Capitalisation Threshold

Ask AI to "save as .md" — you get structured headings, tables, and lists with ~60% fewer tokens than HTML. Readable by you, parseable by AI, and on Day 2 every Cowork Skill (SKILL.md), Project Instruction, and saved artefact will be Markdown.

Why Markdown? The Numbers

Not just a preference — Markdown is measurably better for AI work:

60%
fewer tokens than HTML
for the same content structure
35%
better RAG retrieval accuracy
with clean Markdown vs unstructured text
61%
table-extraction accuracy in Markdown
vs 54% for HTML tables
llms.txt
2024 web standard — sites now serve Markdown
specifically for AI agents to read

What this means for you: Cowork Project Instructions and saved Skills load on every request. Concise Markdown = lower cost, better accuracy, and outputs that stay reusable across tools. Detailed sources in the M8 Prompt Engineering explainer.

Module 5

RAG Grounding
for Finance

Make every AI claim traceable to YOUR documents — not its training data

The Grounding Problem

Without grounding rules, the AI mixes its training data with your documents — and you can't tell which is which.

❌ Without grounding

  • Fills gaps with plausible fiction — invents policy details that aren't in your handbook
  • Hedges with "typically" / "usually" — masking that it's guessing
  • Mixes sources invisibly — your doc + training data, no way to separate
  • Never says "I don't know" — answers confidently regardless

✅ With grounding rules

  • Every claim cites the source — section / paragraph after each statement
  • Admits gaps explicitly — "[INSUFFICIENT DATA]" instead of inventing
  • No outside knowledge — strictly from the provided documents
  • Audit-ready output — every claim is traceable

Why this matters for this room: If a Tax interpretation, audit conclusion, or Procurement contract clause is challenged, "the AI told me" is not a defence. Grounding rules turn AI output from a starting point into a defensible reference.

RAG — The 4 Grounding Rules

Add these rules to any prompt where defensibility matters:

CRITICAL RULES:
1. Base your answer ONLY on the provided documents
2. After each claim, cite the source: [Doc · Section]
3. If not covered: write "[INSUFFICIENT DATA: not in
   provided documents]"
4. Do NOT use outside knowledge or assumptions

DOCUMENTS:
[Doc 1: Procurement Policy v4.2 — Variance & Tolerance]
[Doc 2: Standard MSA Template — Pricing & Adjustment Clauses]

QUESTION: What is our policy if a vendor invoice is more
than 10% above the matched PO total?

When to use: Policy lookups · Tax circular Q&A · audit-procedure determination · contract-clause review · disclosure-language drafting · regulatory impact assessment — any task where the answer must be traceable to a specific document.

See the Difference: Procurement Policy Q&A

❌ Without RAG Grounding

Prompt: "What's our policy if an invoice is over PO by 10%?"

If a vendor invoice exceeds the PO by more than 10%, the standard practice is typically to require a change order before payment. Late fees of around 1–2% per month are usually applied to disputed amounts. The vendor's spend pattern is reviewed and they may be flagged for category-head escalation if the variance is recurring.

⚠ Sounds plausible but NONE of this may be in your actual policy. "Typically" and "usually" = hallucination.

✅ With RAG Grounding

Prompt: "ONLY from provided documents" + Procurement Policy attached

Per Procurement Policy v4.2:
• PO-to-invoice tolerance is 5% (not 10%) — variances above tolerance require a change order before approval. [Procurement Policy v4.2 · §3.4]
• Variances above 15% require Category Head sign-off in addition to the change order. [§3.4.2]
[INSUFFICIENT DATA: late-fee policy not addressed in provided documents — refer to MSA Template clauses]

✓ Every claim cites a section. Admits what isn't covered. Caught the 10% misconception.

Module 6

Best Practices &
Common Anti-Patterns

The seven mistakes everyone makes — and the fixes that take 30 seconds

7 Prompt Mistakes Everyone Makes

MistakeWhy it hurtsQuick fix
The Kitchen SinkCramming 5 tasks into 1 promptOne task per prompt — chain results
The Blank CanvasNo examples → AI guesses your formatShow 1–2 examples of desired output
The Trust FallNo grounding → confident hallucinations"ONLY from provided data"
The Vague Ask"Analyse this" — analyse what, how, for whom?Specify audience, format, length
The One-Shot WonderExpecting perfection on first tryPlan for 2–3 refinement turns
The Copy-Paste TrapSame prompt for different modelsTune the framing per model tier
The Set-and-ForgetNever re-testing after model updatesMonthly health check on production prompts

The Draft → Score → Revise Loop

Don't accept the first output. Build a self-improving cycle into your prompt:

Step 1 — DRAFT: Write a 200-word audit-committee summary
  of the IT change-log review using the data provided.

Step 2 — SCORE: Rate your draft on these criteria:
  • Completeness (0-5): All four required sections?
  • Grounding (0-5): Every claim cites a Change ID?
  • Actionability (0-5): Specific test plan with sample sizes?

Step 3 — REVISE: If total < 12 of 15, rewrite to fix the
  lowest-scoring area. Maximum 2 revisions.

Output ONLY the final version after the loop completes.

Why this works: The AI self-corrects before you read it. Teams using this pattern report 40–60% fewer revision cycles. Three lines added to the prompt. Zero technical skill required.

Break Big Tasks into Small Steps

Complex tasks fail when you ask for everything at once. Decompose instead:

❌ One Giant Prompt

"Analyse our Q3 close exceptions,
identify the top intercompany
mismatches, calculate the
materiality impact, compare to
Q2, draft a controllership
note, and recommend three
process improvements."

6 tasks = shallow work on each

✅ Chained Prompts

P1: "Analyse Q3 close
  exceptions and flag
  intercompany mismatches"
P2: "From these mismatches,
  identify the top 3 by
  materiality"
P3: "Compare Q3 vs Q2 on
  the same three"
P4: "Draft a controllership
  note with three process
  improvements"

Each step gets full attention

Tell the AI What NOT to Do

Positive instructions tell the AI what to include. Negative constraints prevent common failure modes:

Problem you keep seeingNegative constraint to add
AI adds unsolicited opinions"Do not include personal opinions or speculation"
AI uses data not in your input"Do not reference any data outside the provided documents"
AI writes too much"Do not exceed 300 words. Do not add a conclusion section."
AI hedges everything"Do not use 'it depends' or 'generally speaking'"
AI explains the obvious"Do not explain what materiality is or how variance commentary works"
AI invents numbers"If a metric is not in the data, write [DATA NOT AVAILABLE]"

Pro tip: After your first test run, note what went wrong and add a "Do NOT" line for each issue. Your prompt improves with every iteration.

Decision Rules: Override Subjective Judgement

Different models give different reads on the same data. Decision rules encode YOUR policy:

❌ Without rules

Sonnet 4.5: "Variance 8% over plan = significant" → flag for committee
Sonnet 4.6: "Variance within reasonable range" → routine commentary
Same data, two reads, no policy reference.

✅ With decision rules

RULES: Variance >5% AND >SGD 1M → flag for committee · Variance >5% only → category-head note · Variance ≤5% → routine commentary

Sonnet 4.5: "8% variance, SGD +1.41M ⇒ flags both rules" → committee
Sonnet 4.6: "8% variance, SGD +1.41M ⇒ flags both rules" → committee
Both agree. Policy enforced.

Use when: Consistency matters more than nuance — variance flagging, materiality calls, audit-finding severity, vendor tier-rating, capitalisation thresholds. If your team has a written threshold, encode it in the prompt.

Structure Your Prompts Like Documents

Well-organised prompts produce well-organised outputs. Use clear sections and delimiters:

### ROLE
You are a Senior IRT Team Lead at AnyCompany Support.

### CONTEXT
<<<
[Paste the segmental P&L extract or disclosure draft here]
>>>

### TASK
Draft the Operating & Financial Review section for the Q3 2026 board pack.

### OUTPUT FORMAT
- Headline (1 sentence, leads with the number that matters)
- Driver paragraphs (3 max, in order of variance contribution)
- Forward look (max 100 words)
- Disclosure-language tone, consistent with FY25 annual report

### CONSTRAINTS
- Use ONLY the data provided above
- All amounts in SGD
- Do not exceed 400 words total
- Do not include forward-looking statements not already in management commentary

Why delimiters matter: The triple angle brackets tell the AI "this is input data, not instructions." Without them, the AI may follow instructions inside a pasted document instead of analysing it — especially dangerous when pasting policy text or contract clauses.

Show, Don't Tell — One Example Beats 100 Words

Especially for batched classification work — like the journal-entry coding the Controllership team does monthly:

❌ Telling

"Code each journal entry as
direct cost, indirect cost,
or capex based on the GL
account, vendor, and
description. Format as a
table with JE ID, code,
and reasoning."

~50 words of instruction. AI still guesses your format.

✅ Showing

"Code each journal entry like this:

| JE ID | Code | Reason |
| JE-1142 | DC | Vendor invoice for
  client engagement deliverables |
| JE-1156 | CAPEX | Software license
  >SGD 250K, 3-year term |

Now code these: [data]"

One example = perfect format every time

The 3-Round Improvement Workflow

Every production-quality prompt goes through this cycle:

RoundWhat you doWhat improves
R1: BaselineWrite your first prompt using the four pillars. Run it 3 times.You see what the AI gets right and wrong
R2: Fix failuresAdd negative constraints for each failure. Add one example of good output. Run 3 more times.Consistency jumps from ~60% to ~85%
R3: PolishAdd the self-review (draft-score-revise) step. Tighten length and format. Test edge cases.Production-ready at ~95% consistency

Total time: 15–20 minutes to go from first draft to production template. The template then saves hours every week.

Build a Team Prompt Library

Your best prompts are team assets, not personal notes. Treat them like shared templates:

📋 Metadata each prompt should include

  • Prompt name and purpose
  • The full prompt with {{variables}}
  • Recommended model tier (Haiku / Sonnet / Opus)
  • 1–2 example outputs (good vs bad)
  • Known limitations and edge cases
  • Last tested date and model version

🎯 Starter library for this cohort

  • Vendor scorecard (Procurement)
  • Variance commentary (FP&A)
  • Audit finding writeup (Internal Audit)
  • Tax circular impact memo (Tax)
  • MD&A bullet draft (Reporting)
  • Intercompany reconciliation note (Controllership)

Day 2 connection: In Cowork, every entry in this library becomes a saved Skill (a SKILL.md file). The library moves from "shared Word doc" to "installable bundle the whole team uses with one click." That's why we save them as Markdown.

Module 7

Managing Long
Conversations

Why AI "gets dumber" mid-session — and what to do about it

Why AI "Gets Dumber" Mid-Conversation

It's not a bug — it's a context window problem. Every AI has a finite "working memory."

⚙ What happens inside

  • Every message + every AI response stays in the context window
  • At 60–70% capacity, performance drops sharply — sudden cliff, not gradual
  • The AI compresses and deprioritises earlier messages
  • "Lost in the Middle" — AI remembers the start and end best, forgets the middle (Liu et al. 2023, one of the most cited AI papers)

👀 What you experience

  • AI contradicts instructions you gave 10 messages ago
  • AI re-introduces ideas you already rejected
  • AI ignores constraints from the start of the chat
  • Outputs get vague, generic, or repetitive
  • AI starts hallucinating more frequently

Key insight: Most people blame the AI for "getting stupid." The real problem is the conversation got too long. The fix is context management, not a bigger model.

5 Rules for Managing Long Conversations

RuleWhy it works
One task per session — don't mix audit, FP&A, and contract review in one chatEach session gets full attention capacity
Paste only what's relevant — don't dump the entire policy doc when you need three clausesReduces noise, keeps the AI focused
Key instructions at start AND end — restate critical constraints in the closing lineExploits primacy + recency bias
Keep sessions under 15–20 turns — start fresh after thatStays in the performance sweet spot
Use session summaries — ask AI to summarise, paste into a new sessionFresh context window with all the accumulated knowledge

The Session Summary Technique

When a conversation gets too long but you can't lose the state:

Step 1 — Ask for a summary

In the OLD session Summarise our conversation so far:
• Key decisions we made (and why)
• Data and findings established
• What still needs to happen next

Format as a briefing I can paste into a new session.

Step 2 — Start fresh with context

In the NEW session Here is the context from our previous session:

[PASTE SUMMARY HERE]

Continue from where we left off. The next step is to draft the audit-committee pre-read using the findings established above.

✓ Fresh context window + all accumulated knowledge = best of both worlds

Think of it as "saving your game." You compress hours of conversation into a focused briefing, then load it into a fresh session with full attention capacity.

"Save Your Game" — Memory for Long Projects

For projects that span weeks or months, you want two files — not one growing document:

📋 project-status.md

Load every session — compact, ~2 pages

  • What exists now (file list, decisions made)
  • What's remaining (next steps)
  • Key rules and constraints

Like a project brief — current state only

📓 session-log.md

Load only when needed — grows over time

  • What was done each session
  • Why decisions were made
  • Edge cases and gotchas

Like meeting minutes — history archive

WhenWhat to say
Start session"Here's my project context: [paste project-status.md]"
End session"Update project-status.md with the current state. Append today's work to session-log.md."
Look back"Read session-log.md — when did we change the materiality threshold and why?"

In Cowork (Day 2): The project-status pattern becomes your Project Instructions — loaded automatically on every conversation in that project. The session-log becomes a file in your mounted folder. Persistent memory without the manual paste.

The #1 Misconception: "AI Remembers Me"

It doesn't. Each session starts with zero memory. Here's what AI actually sees:

What people think
  • "The AI remembers our conversation from last week"
  • "It knows what I worked on yesterday"
  • "I should keep this session open so it doesn't forget"
  • "My old browser tabs are giving it context"
How it actually works
  • Each session starts with zero memory
  • AI sees only: your current message + this session's history
  • Old tabs / sessions have no effect on new ones
  • Closing old sessions is safe — it's cosmetic, not functional

Mental model: chat is ephemeral, files are permanent. The AI's "memory" is the files it created — reports, templates, Skills. Those persist. The conversation that produced them does not. When you need context in a new session, reference the files — not the old chat.

When to Start Fresh vs. Continue

🟢 Start a new session

  • Switching to a completely different task
  • The conversation has gone off track
  • You want to test a refined prompt cleanly
  • Session is past 15–20 turns
  • AI keeps repeating the same mistake
  • AI contradicts earlier instructions

🔵 Continue the session

  • Iterating on the same output
  • You need the AI to remember earlier context
  • Building step by step (funnel pattern)
  • Refining format or tone
  • Follow-up questions on the same topic
  • Session is still under 15 turns

The 3-strike rule: If you've corrected the AI three times and it's still wrong — the context is working against you. Start fresh. It's faster than fighting a polluted conversation.

Circuit Breaker Patterns

PatternSymptomFix
Repetition LoopSame wrong output after correctionNew session, rephrase from scratch
Hallucination SpiralInventing data not in your inputAdd: "Use ONLY provided data"
Over-Eager Helper2,000 words for a 5-bullet askAdd: "Exactly 5 bullets, under 20 words each"
Format DriftFormat changes mid-outputAdd: "Continue using EXACTLY the same format"
Confidence TrapUncertain info presented as factAdd: "Prefix uncertain claims with [UNCERTAIN]"

Take it home

Quick Reference
Card

The 12 trigger phrases that make every prompt better — copy this slide

12 Trigger Phrases — Pinned to Your Desk

TechniqueTrigger phrase to add to your prompt
Zero-Shot CoT"Think step by step before answering"
Expert Persona"You are a Senior [ROLE] with X years in [SPECIALTY]"
Multi-Perspective"Present the case FOR and AGAINST" / "Three voices: Audit · FP&A · Reporting"
Structured Output"Use EXACTLY these sections: 1… 2… 3…"
RAG Grounding"Base your answer ONLY on the provided documents"
Self-Critique"Review your draft: Is every claim supported by data?"
Meta-Prompting"Write the best possible prompt for [TASK]"
Negative Constraints"Do NOT include / Do NOT use / Do NOT exceed"
Decision Rules"If [metric] > [threshold] → MUST be [rating]"
Task DecompositionBreak 1 big prompt into 3–4 focused prompts
Draft → Score → Revise"Draft, score on [rubric], revise if < threshold"
Show, Don't TellInclude 1–2 examples of desired output format

That's the Appendix

Pull this deck up any time the room asks "but how do I actually do that?"

Live alongside · M7 Prompt Engineering Explainer · The Day 1 Case Context Summarizer Exercise

Site: https://grabsupport.lab.mywcloud.net

AnyCompany Support · GS Cyborg Edition · June 2026