How AI pricing works, why different models cost differently, and how to make smart selection decisions — the GS Cyborg builder's guide to AI economics at GS scale.
AI models don't read words like you do. They break text into tokens — small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.
Common words = 1 token. Long or rare words split into pieces.
Both what you send (input) and what AI generates (output) cost tokens.
"$4,200.50" = 5+ tokens. Financial data costs more than narrative text.
Markdown uses 60% fewer tokens than HTML for the same content.
| Content | Approximate tokens | Analogy |
|---|---|---|
| A short question ("Summarise this case") | ~5 tokens | A single sentence on a Post-it |
| A 10-line D365 case description | ~150 tokens | Half a page of notes |
| Your engineered Case Summarizer template | ~400 tokens | A one-page playbook |
| A full case context summary (Symptom · Severity · Booking · Action · Next Step) | ~800 tokens | A two-page brief |
| Total per case (input + output) | ~1,350 tokens | A three-page document |
The string "BK-2026-4821, IRT, P1, 22:14:00, SGD 24.50, refunded" uses ~25 tokens, while plain English text of the same length uses only ~12. Booking IDs, timestamps, and amounts are "expensive" because the tokenizer never learned to compress them efficiently. At 400k MIWI cases / month, picking the right format saves real money.
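To get a feel for why structured data costs more, here is a rough sketch of a token estimator. It is not a real tokenizer: the rules (one token per 6 letters, one per 3 digits, one per punctuation mark) are assumptions chosen for illustration, and actual counts depend on the model's vocabulary. The prose sentence is a made-up example of the same length as the case string.

```python
import re

# Crude heuristic, NOT a real tokenizer. Assumed rules (illustrative only):
#   - alphabetic runs cost 1 token per 6 letters, rounded up
#   - digit runs cost 1 token per 3 digits, rounded up
#   - every punctuation mark or symbol costs 1 token
_PIECES = re.compile(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]")


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for `text` under the assumed rules."""
    total = 0
    for piece in _PIECES.findall(text):
        if piece.isalpha():
            total += -(-len(piece) // 6)  # ceiling division
        elif piece.isdigit():
            total += -(-len(piece) // 3)
        else:
            total += 1
    return total


structured = "BK-2026-4821, IRT, P1, 22:14:00, SGD 24.50, refunded"
prose = "The customer called late and asked for a full refund"

print(len(structured), estimate_tokens(structured))
print(len(prose), estimate_tokens(prose))
```

Both strings are 52 characters long, yet the structured one is estimated at more than twice the tokens, because every hyphen, colon, comma, and digit group is counted separately.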
AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) AND while the AI responds (output). Output tokens typically cost 3–5× more than input tokens because generating text is computationally harder than reading it.
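The taxi-meter rule can be written as one small formula. The sketch below assumes prices are quoted per 1M tokens; the function name and the example figures (550 tokens in, 800 out, $1.00 / $5.00 per 1M) are illustrative choices, not part of any provider's API.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Return the USD cost of one request, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Illustrative request: 550 tokens in, 800 tokens out, at $1.00 / $5.00 per 1M.
cost = request_cost(550, 800, 1.00, 5.00)
print(f"${cost:.5f} per request")
```

In this example the output side accounts for about 88% of the bill, which is why constraining response length often matters more than trimming the prompt.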
Prices vary dramatically — from fractions of a cent to dollars per million tokens. Here's the landscape of models available on Amazon Bedrock:
| Model | Provider | Input / 1M | Output / 1M | Best for |
|---|---|---|---|---|
| Nova Micro | Amazon | $0.035 | $0.14 | Classification, routing |
| Nova Lite | Amazon | $0.06 | $0.24 | Drafts, summaries |
| Llama 4 Maverick 17B | Meta | $0.24 | $0.97 | Multimodal, cost-effective |
| DeepSeek V3.2 | DeepSeek | $0.27 | $1.10 | Coding, general tasks |
| Mistral Large 3 | Mistral AI | $0.50 | $1.50 | Multilingual, structured |
| Llama 3.3 70B | Meta | $0.72 | $0.72 | Open-weight balanced |
| Nova Pro | Amazon | $0.80 | $3.20 | Reports, analysis |
| Claude Haiku 4.5 | Anthropic | $1.00 | $5.00 | Quality + speed balance |
| Nova Premier | Amazon | $2.50 | $10.00 | Complex multimodal |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | Complex reasoning |
| Claude Opus 4.7 | Anthropic | $5.00 | $25.00 | Deepest multi-step tasks |
Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.
Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.
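| Tier | Task type | Strength | Price | Hiring analogy |
|---|---|---|---|---|
| ⚡ Fast | Simple tasks | Pattern matching | $0.04–$1/M tokens | Junior analyst |
| 🎯 Balanced | Moderate reasoning | Quality + speed | $1–$5/M tokens | Senior analyst |
| 🧠 Deep | Complex analysis | Multi-step logic | $3–$75/M tokens | Expert consultant |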
| GS task | Tier | Why |
|---|---|---|
| PAC tagging (Symptom L1/L2/L3) | ⚡ Fast | Classification, no deep reasoning — speed and cost matter most at 400k cases / mo |
| SORT public-comment filter (Marketing vs Real Support) | ⚡ Fast | Binary classification at high volume — Project Snapshoot territory |
| Pax / Mex first-response email drafts | 🎯 Balanced | Needs empathy and nuance, not deep analysis |
| Case Context Summarizer (IRT, MIWI, DSAT) | 🎯 Balanced | Structured synthesis, moderate complexity — your Day 1 anchor build |
| SOP Lookup with citation (RAG) | 🎯 Balanced | Retrieval-augmented Q&A — your Day 2 anchor build |
| True-Safety vs downgrade decision (high-stakes triage) | 🧠 Deep | Judgement call with regulatory exposure — Opus territory until trust is earned |
| Bulk MIWI auto-tagging (400k / month) | ⚡ Fast | Cost-effective at scale — cheapest model that meets the QA accuracy bar |
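The rule "cheapest model that meets the QA accuracy bar" can be sketched as a simple selection function. The candidate names, costs, and accuracy figures below are hypothetical placeholders; in practice they would come from your own cost estimate and QA sample.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    cost_per_case: float  # USD, from your own cost estimate
    accuracy: float       # fraction correct on your own QA sample


def cheapest_passing(candidates: Iterable[Candidate],
                     accuracy_bar: float) -> Optional[Candidate]:
    """Return the lowest-cost candidate whose accuracy meets the bar, or None."""
    passing = [c for c in candidates if c.accuracy >= accuracy_bar]
    return min(passing, key=lambda c: c.cost_per_case, default=None)


# Hypothetical QA results, for illustration only.
candidates = [
    Candidate("fast-model", 0.0001, 0.91),
    Candidate("balanced-model", 0.0046, 0.96),
    Candidate("deep-model", 0.0228, 0.98),
]

choice = cheapest_passing(candidates, accuracy_bar=0.95)
print(choice.name if choice else "no model meets the bar")
```

Lowering the bar to 0.90 would select the fast model instead, so the accuracy bar, not the model name, is the decision that drives cost.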
Models on Amazon Bedrock fall into three tiers: Fast & Cheap (under $1), Balanced ($1–$2), and Deep Reasoning ($2–$5). These bands line up with the input price per 1M tokens in the pricing table above. Each tier has multiple models from different providers.
Cost scales with both task volume and tier, so the right model choice can save your team thousands per month.
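A minimal sketch of how cost changes with volume and tier, using prices from the pricing table. It makes two assumptions: each case sends 550 input tokens (description plus template) and receives 800 output tokens, and one model stands in for each tier (Nova Micro for Fast, Claude Haiku 4.5 for Balanced, Claude Sonnet 4.6 for Deep).

```python
# Prices per 1M tokens as (input, output), copied from the pricing table.
# The tier label attached to each model is this sketch's assumption.
PRICES = {
    "Nova Micro (Fast)": (0.035, 0.14),
    "Claude Haiku 4.5 (Balanced)": (1.00, 5.00),
    "Claude Sonnet 4.6 (Deep)": (3.00, 15.00),
}


def monthly_cost(model: str, cases_per_month: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Return the estimated monthly USD cost for one model at a given volume."""
    in_price, out_price = PRICES[model]
    per_case = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_case * cases_per_month


for model in PRICES:
    usd = monthly_cost(model, 400_000, 550, 800)
    print(f"{model:30s} ${usd:,.2f} / month")
```

At 400k cases per month this works out to roughly $52 on the Fast tier, $1,820 on Balanced, and $5,460 on Deep, a gap of about 100× between the cheapest and the most expensive option shown.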
Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:
1. Model selection: the biggest lever. Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than anything else.
2. Prompt caching: up to 90% savings. Cache your template, then pay full price once and 10% for every reuse. Perfect for repeated tasks.
3. Batch inference: 50% savings. Submit requests in bulk (not real-time). Ideal for monthly portfolio assessments.
4. Intelligent prompt routing: up to 30% savings. Bedrock auto-routes simple tasks to cheaper models, complex ones to powerful models.
5. Prompt optimization: 10–40% savings. Remove redundant instructions, use shorter examples, constrain output length. Markdown instead of HTML saves 60% on formatting tokens.
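The caching and batch figures above can be turned into a per-request estimate. This sketch assumes that cached template tokens bill at 10% of the input price, that batch mode halves the whole request, and that cache-write surcharges are zero. Whether the two discounts can be combined is not assumed, so the example applies them separately. Token counts and prices are illustrative.

```python
def optimized_cost(template_tokens: int, variable_tokens: int, output_tokens: int,
                   in_price: float, out_price: float,
                   cached: bool = False, batch: bool = False) -> float:
    """USD cost of one request with optional discounts (prices per 1M tokens).

    Assumptions: cached template tokens bill at 10% of the input price,
    batch mode halves the total, and cache-write surcharges are ignored.
    """
    template_rate = 0.10 if cached else 1.0
    cost = (template_tokens * in_price * template_rate
            + variable_tokens * in_price
            + output_tokens * out_price) / 1_000_000
    return cost * (0.5 if batch else 1.0)


# 400-token template, 150-token case text, 800-token summary, $1.00 / $5.00 per 1M.
base = optimized_cost(400, 150, 800, 1.00, 5.00)
with_cache = optimized_cost(400, 150, 800, 1.00, 5.00, cached=True)
with_batch = optimized_cost(400, 150, 800, 1.00, 5.00, batch=True)

print(f"base       ${base:.6f}")
print(f"with cache ${with_cache:.6f} ({1 - with_cache / base:.0%} saved)")
print(f"with batch ${with_batch:.6f} ({1 - with_batch / base:.0%} saved)")
```

With these numbers caching saves only about 8% of the request, because the template is a small share of a bill dominated by output tokens. The "up to 90%" figure applies to the cached portion, so caching pays off most when a long template or document is reused with short outputs.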
The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.
| Model | Context window | Equivalent | Practical meaning |
|---|---|---|---|
| Nova Micro | 128K tokens | ~100 pages | A short book |
| Nova Pro | 300K tokens | ~230 pages | A long report |
| Claude Sonnet 4 | 200K tokens | ~150 pages | A full policy manual |
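Because the prompt and the response share the same window, a quick pre-flight check is to add the prompt size to the output you plan to reserve. A minimal sketch using the window sizes from the table (the function name and example sizes are illustrative):

```python
# Context windows in tokens, from the table above.
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def fits(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the reserved output fits in the model's window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]


print(fits("Nova Micro", 120_000, 4_000))  # 124,000 tokens needed
print(fits("Nova Micro", 126_000, 4_000))  # 130,000 tokens needed
```

The first call fits within 128K tokens and the second does not, even though the prompt alone would, which is why output length has to be budgeted up front.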
| Tool | Who picks the model | What you control |
|---|---|---|
| Claude (Cowork) | Anthropic (by plan tier) | Your prompt quality |
| Kiro | Auto-selected by task | Your prompt quality |
| Cursor | You choose per conversation | Model + prompt quality |
| Bedrock Playground | You choose explicitly | Model + prompt + parameters |
| Concept from this page | Where you'll apply it |
|---|---|
| Token estimation | Understanding why prompt length matters for cost and quality |
| Model tiers | Day 1 Demo: Model Arena — compare 3 models on the same task |
| Cost optimization | Making the business case for AI adoption in your team |
| Context windows | Managing long conversations — knowing when to start fresh |
| Decision framework | Day 2: Planning your first agent's cost profile |