💰 Tokens, Cost & Model Selection

How AI pricing works, why different models cost differently, and how to make smart selection decisions — the GS Cyborg builder's guide to AI economics at GS scale.

What Are Tokens?

AI models don't read words like you do. They break text into tokens — small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.

📝

1 token ≈ ¾ word

Common words = 1 token. Long or rare words split into pieces.

💵

You pay per token

Both what you send (input) and what AI generates (output) cost tokens.

🔢

Numbers are expensive

"$4,200.50" = 5+ tokens. Financial data costs more than narrative text.

📊

Format matters

Markdown can use around 60% fewer tokens than HTML for the same content, because it drops tags and attributes.

Token Estimation for GS Support Work

| Content | Approximate tokens | Analogy |
| --- | --- | --- |
| A short question ("Summarise this case") | ~5 tokens | A single sentence on a Post-it |
| A 10-line D365 case description | ~150 tokens | Half a page of notes |
| Your engineered Case Summarizer template | ~400 tokens | A one-page playbook |
| A full case context summary (Symptom · Severity · Booking · Action · Next Step) | ~800 tokens | A two-page brief |
| Total per case (input + output) | ~1,350 tokens | A three-page document |

⚠️ Key insight for GS: A datalake row like BK-2026-4821, IRT, P1, 22:14:00, SGD 24.50, refunded uses ~25 tokens — while the same length of English text uses only ~12 tokens. Booking IDs, timestamps, and amounts are "expensive" because the tokenizer never learned to compress them efficiently. At 400k MIWI cases / month, picking the right format saves real money.
💡 Why this matters for your team: When you send a spreadsheet to AI, you're paying for every comma, dollar sign, and decimal point. Summarizing data in narrative form ("Revenue grew 271% from $4,200 to $15,600") is cheaper than pasting raw tables — and often produces better AI output too.
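
The rule of thumb and the per-case budget above can be sketched in a few lines of Python. This is a rough illustration only: `estimate_tokens` is a hypothetical helper that counts whitespace-separated words, not a real tokenizer, and the token figures are the approximate values from the table.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from this page: 1 token ≈ ¾ word


def estimate_tokens(text: str) -> int:
    """Rough estimate from a whitespace word count; not a real tokenizer."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


# Per-case budget, using the approximate figures from the table above.
CASE_DESCRIPTION_TOKENS = 150  # 10-line D365 case description
TEMPLATE_TOKENS = 400          # engineered Case Summarizer template
SUMMARY_TOKENS = 800           # full case context summary (output)

input_tokens = CASE_DESCRIPTION_TOKENS + TEMPLATE_TOKENS
total_tokens = input_tokens + SUMMARY_TOKENS

print(estimate_tokens("Summarise this case"))  # 4
print(input_tokens, total_tokens)              # 550 1350

# The word-count heuristic badly underestimates ID-heavy rows.
row = "BK-2026-4821, IRT, P1, 22:14:00, SGD 24.50, refunded"
print(estimate_tokens(row))                    # 10
```

The heuristic gives about 10 tokens for the datalake row, well below the ~25 quoted above. That gap is the point of the key insight: word counts only work for narrative text, so booking IDs, timestamps, and amounts need a real tokenizer count.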

How AI Pricing Works

AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) AND while the AI responds (output). Output tokens typically cost 3–5× more than input tokens because generating text is computationally harder than reading it.

THE PRICING FORMULA
Cost = (Input tokens × Input price) + (Output tokens × Output price)

Example: Case Context Summarizer on Claude Sonnet 4
Input: 550 tokens × $3.00/million = $0.00165
Output: 800 tokens × $15.00/million = $0.01200
Total per assessment: $0.01365

That's 1.4 cents per assessment. An analyst takes 30 minutes ($25).
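
The pricing formula above translates directly into a small function. This is a minimal sketch; `request_cost` is a hypothetical name, and the token counts and prices are the ones from the worked example.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request; prices are USD per 1 million tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Worked example from above: 550 input tokens, 800 output tokens.
cost = request_cost(550, 800, input_price_per_m=3.00, output_price_per_m=15.00)
print(f"${cost:.5f}")  # $0.01365
```

Note that the output side ($0.01200) makes up most of the total even though the token counts are similar, which is why constraining output length matters.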

The Price Spectrum

Prices vary dramatically, from a few cents to $25 per million tokens. Here's the landscape of models available on Amazon Bedrock:

| Model | Provider | Input / 1M | Output / 1M | Best for |
| --- | --- | --- | --- | --- |
| Nova Micro | Amazon | $0.035 | $0.14 | Classification, routing |
| Nova Lite | Amazon | $0.06 | $0.24 | Drafts, summaries |
| Llama 4 Maverick 17B | Meta | $0.24 | $0.97 | Multimodal, cost-effective |
| DeepSeek V3.2 | DeepSeek | $0.27 | $1.10 | Coding, general tasks |
| Mistral Large 3 | Mistral AI | $0.50 | $1.50 | Multilingual, structured |
| Llama 3.3 70B | Meta | $0.72 | $0.72 | Open-weight balanced |
| Nova Pro | Amazon | $0.80 | $3.20 | Reports, analysis |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Quality + speed balance |
| Nova Premier | Amazon | $2.50 | $10.00 | Complex multimodal |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Complex reasoning |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | Deepest multi-step tasks |

Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.

✅ The key insight: The same task can cost 1 cent or 1 dollar depending on which model you choose. Picking the right model for each task is the single biggest cost lever — far more impactful than optimizing prompt length.
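
To see how much the model choice moves the bill at GS volume, here is a hedged sketch that applies the prices listed on this page to the 550-input / 800-output case profile at 400k cases per month. It assumes every model uses the same number of tokens per case, which is a simplification because tokenizers differ between providers.

```python
# (input $/1M, output $/1M), copied from the price list on this page.
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}

INPUT_TOKENS = 550          # case description + template
OUTPUT_TOKENS = 800         # case context summary
CASES_PER_MONTH = 400_000   # MIWI volume quoted on this page


def monthly_cost(model: str, cases: int = CASES_PER_MONTH) -> float:
    """Monthly USD cost if every case runs on the given model."""
    input_price, output_price = PRICES[model]
    per_case = (INPUT_TOKENS * input_price
                + OUTPUT_TOKENS * output_price) / 1_000_000
    return per_case * cases


for model in PRICES:
    print(f"{model:<18} ${monthly_cost(model):>10,.2f} / month")
```

Under these assumptions the monthly bill ranges from about $52.50 on Nova Micro to about $9,100 on Claude Opus 4.7, with Claude Sonnet 4.6 at about $5,460.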

Data Privacy — Why Bedrock Is Different

💡 With Amazon Bedrock: Your data stays in your AWS account — it is not used to train the models. You control the region, encryption, and access. All API calls are logged and auditable via CloudTrail. This is different from using ChatGPT or Claude.ai directly — Bedrock provides enterprise-grade data isolation.

The 3 Model Tiers

Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.

Fast & Cheap

Simple tasks
Pattern matching
$0.04–$1/M tokens
Junior analyst

🎯

Balanced

Moderate reasoning
Quality + speed
$1–$5/M tokens
Senior analyst

🧠

Deep Reasoning

Complex analysis
Multi-step logic
$3–$25/M tokens
Expert consultant

Which Tier for Which GS Task?

| GS task | Tier | Why |
| --- | --- | --- |
| PAC tagging (Symptom L1/L2/L3) | ⚡ Fast | Classification, no deep reasoning; speed and cost matter most at 400k cases / mo |
| SORT public-comment filter (Marketing vs Real Support) | ⚡ Fast | Binary classification at high volume (Project Snapshoot territory) |
| Pax / Mex first-response email drafts | 🎯 Balanced | Needs empathy and nuance, not deep analysis |
| Case Context Summarizer (IRT, MIWI, DSAT) | 🎯 Balanced | Structured synthesis, moderate complexity; your Day 1 anchor build |
| SOP Lookup with citation (RAG) | 🎯 Balanced | Retrieval-augmented Q&A; your Day 2 anchor build |
| True-Safety vs downgrade decision (high-stakes triage) | 🧠 Deep | Judgement call with regulatory exposure; Opus territory until trust is earned |
| Bulk MIWI auto-tagging (400k / month) | ⚡ Fast | Cost-effective at scale; cheapest model that meets the QA accuracy bar |

✅ The golden rule: Start with the cheapest tier that might work. Test it. If quality isn't good enough, move up one tier. Don't start with Deep Reasoning for a task that Fast can handle — you'll pay dollars for something that costs pennies.
💡 Why models perform differently: More parameters = more capacity to store "knowledge", but also slower and more expensive inference. A 70B-parameter model can capture more patterns than a 7B model. Some models use mixture-of-experts (MoE), where only a fraction of the parameters activate per token, which makes them faster and cheaper to run than a dense model of the same total size.
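
The golden rule above is a simple escalation loop: try tiers from cheapest to most expensive and stop at the first one that clears your quality bar. The sketch below assumes a hypothetical `evaluate` callback that runs your own test set against a tier and returns an accuracy between 0 and 1; the scores in the usage example are made up for illustration.

```python
from typing import Callable, Optional, Sequence

TIERS = ("fast", "balanced", "deep")  # ordered cheapest first


def cheapest_tier_meeting_bar(evaluate: Callable[[str], float],
                              quality_bar: float,
                              tiers: Sequence[str] = TIERS) -> Optional[str]:
    """Return the first (cheapest) tier whose score meets the bar, else None."""
    for tier in tiers:
        if evaluate(tier) >= quality_bar:
            return tier
    return None


# Illustrative scores only; in practice these come from your QA test set.
made_up_scores = {"fast": 0.82, "balanced": 0.93, "deep": 0.96}

print(cheapest_tier_meeting_bar(made_up_scores.get, quality_bar=0.80))  # fast
print(cheapest_tier_meeting_bar(made_up_scores.get, quality_bar=0.90))  # balanced
print(cheapest_tier_meeting_bar(made_up_scores.get, quality_bar=0.99))  # None
```

Returning `None` when no tier clears the bar is a deliberate choice: it signals that the prompt or the task definition needs work, not that a more expensive model should be forced into service.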

Cost vs. Capability Spectrum

[Interactive chart: models on Amazon Bedrock grouped by tier. X-axis: cost per 1M tokens ($0 to $5.00). Y-axis: intelligence (Low, Med, High).]

The 3 tiers: Fast & Cheap (under $1, 5 models), Balanced ($1–$2, 3 models), and Deep Reasoning ($2–$5, 3 models). Each tier has multiple models from different providers: Amazon, Anthropic, Meta, Mistral AI, DeepSeek. Data: Artificial Analysis · May 2026

Model Selection Simulator

Pick a task, adjust the volume, and watch how cost changes across tiers. The right model choice can save your team thousands per month.

1. What's the task?

Options: 🎧 Case Summarizer, 🏷️ PAC Auto-Tagging, 💬 First-Response Email, 🔍 SOP Lookup, 📂 Doc Classification

2. How many per month?

Example shown: Case Summarizer at a volume of 200 per month.

| Tier | Label | Cost per month |
| --- | --- | --- |
| Fast & Cheap | Best value | $0.02 |
| 🎯 Balanced | Recommended | $0.54 |
| 🧠 Deep Reasoning | Premium | $2.64 |

💸 vs. manual processing: An analyst costs $5,000/month for this task (99.9% saved).
Put it this way: 200 case summaries with Claude Sonnet 4 costs less than a single cup of kopi at the office ($2.64). The same work would take a TL ~100 hours of reading and writing.
💡 Recommendation: For Case Context Summarizer, use Balanced (Claude Sonnet 4). The task requires structured synthesis and policy-aware drafting, but doesn't need frontier reasoning. Lightweight models miss nuance in long chat transcripts; Opus is overkill for routine MIWI cases.
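
The manual baseline behind the simulator figures can be checked with simple arithmetic. This sketch reuses numbers already on this page (30 minutes and $25 per case, 200 cases per month, $2.64 as the highest AI cost shown); the variable names are illustrative.

```python
CASES_PER_MONTH = 200
MINUTES_PER_CASE = 30          # manual handling time from the pricing example
ANALYST_COST_PER_CASE = 25.00  # USD, from the same example
AI_COST_PER_MONTH = 2.64       # USD, highest figure shown in the simulator

manual_hours = CASES_PER_MONTH * MINUTES_PER_CASE / 60
manual_cost = CASES_PER_MONTH * ANALYST_COST_PER_CASE
saved_pct = (1 - AI_COST_PER_MONTH / manual_cost) * 100

print(f"{manual_hours:.0f} hours, ${manual_cost:,.0f} manual, {saved_pct:.2f}% saved")
# 100 hours, $5,000 manual, 99.95% saved
```

Even at the most expensive figure in the simulator, the model cost is a rounding error next to analyst time, so quality fit should drive the tier choice at this volume.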

5 Cost Optimization Levers

Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:

🎚️

1. Right-size your model

The biggest lever. Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than anything else.

💾

2. Prompt Caching

Up to 90% savings on the cached part of your input. Cache your template: pay full price once, 10% for every reuse. Perfect for repeated tasks.

📦

3. Batch Processing

50% savings. Submit requests in bulk (not real-time). Ideal for monthly portfolio assessments.

🔀

4. Intelligent Routing

Up to 30% savings. Bedrock auto-routes simple tasks to cheaper models, complex ones to powerful models.

✂️

5. Optimize Prompts

10–40% savings. Remove redundant instructions, use shorter examples, constrain output length. Markdown instead of HTML can save around 60% on formatting tokens.
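
To get a feel for how levers 2 and 3 change the per-case cost, here is a simplified sketch using the Case Summarizer profile (400-token template, 150-token case, 800-token output) at Sonnet prices. It assumes the cached template is billed at 10% of the input price and that batch applies a flat 50% discount, as stated above. It ignores the one-time cost of writing the cache, and it evaluates the two levers separately because this page does not say whether they combine.

```python
INPUT_PRICE = 3.00         # $/1M input tokens (Claude Sonnet pricing on this page)
OUTPUT_PRICE = 15.00       # $/1M output tokens
CACHED_READ_FACTOR = 0.10  # "10% for every reuse"
BATCH_FACTOR = 0.50        # "50% savings"


def cost_per_case(template_tokens: int, case_tokens: int, output_tokens: int,
                  lever: str = "none") -> float:
    """USD per case with one lever applied: 'none', 'cache', or 'batch'."""
    template_rate = INPUT_PRICE * (CACHED_READ_FACTOR if lever == "cache" else 1.0)
    cost = (template_tokens * template_rate
            + case_tokens * INPUT_PRICE
            + output_tokens * OUTPUT_PRICE) / 1_000_000
    if lever == "batch":
        cost *= BATCH_FACTOR
    return cost


baseline = cost_per_case(400, 150, 800)
for lever in ("none", "cache", "batch"):
    cost = cost_per_case(400, 150, 800, lever)
    print(f"{lever:<6} ${cost:.6f}  ({(1 - cost / baseline) * 100:.1f}% saved)")
```

With this profile, caching the template saves only about 8% per case because output tokens dominate the bill. The "up to 90%" figure applies to the cached input tokens, so caching pays off most when the prompt is long and the output is short.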

Context Windows: How Much Can the Model "See"?

The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.

| Model | Context window | Equivalent | Practical meaning |
| --- | --- | --- | --- |
| Nova Micro | 128K tokens | ~100 pages | A short book |
| Nova Pro | 300K tokens | ~230 pages | A long report |
| Claude Sonnet 4 | 200K tokens | ~150 pages | A full policy manual |

💡 For GS Support: A typical D365 case + a few SOPs + your prompt template fits easily within any model's context window. You'd only hit limits with very long chat transcripts (40+ turns in THA / VNM) or when reading the entire SOP corpus. When you do, use RAG to feed only the relevant SOPs — that's exactly what the SOP Lookup agent does.
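
A quick fit check makes the context-window rule concrete: prompt tokens plus the space reserved for the response must not exceed the window. In this sketch the window sizes come from the table above, while the 2,000-token SOP size and the 500-SOP corpus are hypothetical numbers chosen only for illustration.

```python
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def fits(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved output fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]


SOP_TOKENS = 2_000  # hypothetical average SOP length
CASE_AND_TEMPLATE = 150 + 400

typical_prompt = CASE_AND_TEMPLATE + 3 * SOP_TOKENS  # case + a few SOPs
whole_corpus = CASE_AND_TEMPLATE + 500 * SOP_TOKENS  # hypothetical full SOP corpus

for model in CONTEXT_WINDOWS:
    print(model, fits(model, typical_prompt, 800), fits(model, whole_corpus, 800))
```

Under these assumptions the typical prompt fits every model and the whole corpus fits none, which is the case for using RAG to send only the relevant SOPs.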

What You Control in Each Tool

| Tool | Who picks the model | What you control |
| --- | --- | --- |
| Claude (Cowork) | Anthropic (by plan tier) | Your prompt quality |
| Kiro | Auto-selected by task | Your prompt quality |
| Cursor | You choose per conversation | Model + prompt quality |
| Bedrock Playground | You choose explicitly | Model + prompt + parameters |

✅ Key takeaway: In most AI tools, you don't choose the model — the tool does. Focus on writing great prompts and designing good workflows. The prompt engineering skills you learn today work regardless of which model or tool you use. When you DO have model choice (Cursor, Bedrock), use the tier framework.

Workshop Connection

| Concept from this page | Where you'll apply it |
| --- | --- |
| Token estimation | Understanding why prompt length matters for cost and quality |
| Model tiers | Day 1 Demo: Model Arena, comparing 3 models on the same task |
| Cost optimization | Making the business case for AI adoption in your team |
| Context windows | Managing long conversations and knowing when to start fresh |
| Decision framework | Day 2: Planning your first agent's cost profile |