Tokenization, Pricing & Model Selection — AnyCompany Support Workshop

What Are Tokens?

AI models don't read words the way you do. They break text into tokens, small pieces that the model processes one at a time. Think of it like a cash register that counts items, not bags.

📝

1 token ≈ ¾ word

Common words = 1 token. Long or rare words split into pieces.

💵

You pay per token

Both what you send (input) and what AI generates (output) cost tokens.

🔢

Numbers are expensive

"$4,200.50" = 5+ tokens. Financial data costs more than narrative text.

📊

Format matters

Markdown can use around 60% fewer tokens than HTML for the same content, because it carries no paired tags or attributes.
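The ¾-word rule of thumb above can be turned into a quick estimator. This is a minimal sketch, assuming whitespace-separated words are a fair proxy for tokens; the names `WORDS_PER_TOKEN` and `estimate_tokens` are illustrative, and a real tokenizer will give different counts, especially for IDs, timestamps, and amounts.

```python
import math

# Rule of thumb from this page: 1 token ≈ ¾ word (assumption, not a real tokenizer).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Rough token estimate based only on a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


# 3 words / 0.75 = 4 estimated tokens
print(estimate_tokens("Summarise this case"))
```

The estimate is only useful for ordinary prose. For numeric or code-like content it will undercount, which is exactly the "numbers are expensive" effect described above.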

Token Estimation for GS Support Work

Content	Approximate tokens	Analogy
A short question ("Summarise this case")	~5 tokens	A single sentence on a Post-it
A 10-line D365 case description	~150 tokens	Half a page of notes
Your engineered Case Summarizer template	~400 tokens	A one-page playbook
A full case context summary (Symptom · Severity · Booking · Action · Next Step)	~800 tokens	A two-page brief
Total per case (input + output)	~1,350 tokens	A three-page document

⚠️ Key insight for GS: A datalake row like BK-2026-4821, IRT, P1, 22:14:00, SGD 24.50, refunded uses ~25 tokens, while English prose of the same length uses only ~12. Booking IDs, timestamps, and amounts are "expensive" because tokenizers merge character sequences they saw often in training, and a one-off ID or amount rarely matches any of them, so it splits into many small pieces. At 400k MIWI cases / month, picking the right format saves real money.

💡 Why this matters for your team: When you send a spreadsheet to AI, you're paying for every comma, dollar sign, and decimal point. Summarizing data in narrative form ("Revenue grew 271% from $4,200 to $15,600") is cheaper than pasting raw tables — and often produces better AI output too.

How AI Pricing Works

AI pricing is simple: you pay per token, both in and out. Think of it like a taxi meter: the meter runs while you talk (input) AND while the AI responds (output). Output tokens usually cost 3–5× more than input tokens, because the model reads your whole prompt in one pass but has to generate its response one token at a time.

THE PRICING FORMULA

Cost = (Input tokens × Input price) + (Output tokens × Output price)

Example: Case Context Summarizer on Claude Sonnet 4
Input: 550 tokens × $3.00/million = $0.00165
Output: 800 tokens × $15.00/million = $0.01200
Total per assessment: $0.01365

That's 1.4 cents per assessment. An analyst takes 30 minutes ($25).
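The pricing formula can be written as a small function. This is a sketch only: `request_cost` and its parameter names are illustrative, and the prices are the per-million-token figures from the worked example.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in USD for one request, given prices per 1 million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Worked example: 550 input tokens at $3.00/M, 800 output tokens at $15.00/M
cost = request_cost(550, 800, 3.00, 15.00)
print(f"${cost:.5f}")  # $0.01365
```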

The Price Spectrum

Prices vary dramatically, from a few cents to tens of dollars per million tokens. Here's the landscape of models available on Amazon Bedrock:

Model	Provider	Input / 1M	Output / 1M	Best for
Nova Micro	Amazon	$0.035	$0.14	Classification, routing
Nova Lite	Amazon	$0.06	$0.24	Drafts, summaries
Llama 4 Maverick 17B	Meta	$0.24	$0.97	Multimodal, cost-effective
DeepSeek V3.2	DeepSeek	$0.27	$1.10	Coding, general tasks
Mistral Large 3	Mistral AI	$0.50	$1.50	Multilingual, structured
Llama 3.3 70B	Meta	$0.72	$0.72	Open-weight balanced
Nova Pro	Amazon	$0.80	$3.20	Reports, analysis
Claude Haiku 4.5	Anthropic	$1.00	$5.00	Quality + speed balance
Nova Premier	Amazon	$2.50	$10.00	Complex multimodal
Claude Sonnet 4.6	Anthropic	$3.00	$15.00	Complex reasoning
Claude Opus 4.7	Anthropic	$5.00	$25.00	Deepest multi-step tasks

Pricing as of May 2026 (on-demand, US regions). Check aws.amazon.com/bedrock/pricing for current rates. Additional models available: Qwen3, Kimi K2, NVIDIA Nemotron, Writer Palmyra, and more.

✅ The key insight: The same task can cost 1 cent or 1 dollar depending on which model you choose. Picking the right model for each task is the single biggest cost lever — far more impactful than optimizing prompt length.
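To see how much model choice moves the bill, the sketch below prices the same case (550 input tokens, 800 output tokens) on four models from the price table. The dictionary `PRICES` and the function `cost_per_case` are illustrative names; the prices are copied from the table and will go stale as rates change.

```python
# USD per 1M tokens as (input, output), copied from the price table on this page.
PRICES = {
    "Nova Micro": (0.035, 0.14),
    "Claude Haiku 4.5": (1.00, 5.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (5.00, 25.00),
}


def cost_per_case(model: str, input_tokens: int = 550, output_tokens: int = 800) -> float:
    """Cost in USD of one case on the given model."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Print models from cheapest to most expensive for this workload.
for model in sorted(PRICES, key=cost_per_case):
    print(f"{model:<18} ${cost_per_case(model):.5f} per case")
```

With these figures the most expensive model costs more than 100 times as much per case as the cheapest, which is why tier choice outweighs prompt trimming.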

Data Privacy — Why Bedrock Is Different

💡 With Amazon Bedrock: Your data stays in your AWS account — it is not used to train the models. You control the region, encryption, and access. All API calls are logged and auditable via CloudTrail. This is different from using ChatGPT or Claude.ai directly — Bedrock provides enterprise-grade data isolation.

The 3 Model Tiers

Model names change every few months. Instead of memorizing names, think in tiers — match your task complexity to the right level of capability. It's like hiring: you don't need a senior consultant for data entry.

⚡

Fast & Cheap

Simple tasks
Pattern matching
$0.04–$1/M tokens
Junior analyst

🎯

Balanced

Moderate reasoning
Quality + speed
$1–$5/M tokens
Senior analyst

🧠

Deep Reasoning

Complex analysis
Multi-step logic
$3–$25/M tokens
Expert consultant

Which Tier for Which GS Task?

GS task	Tier	Why
PAC tagging (Symptom L1/L2/L3)	⚡ Fast	Classification, no deep reasoning — speed and cost matter most at 400k cases / mo
SORT public-comment filter (Marketing vs Real Support)	⚡ Fast	Binary classification at high volume — Project Snapshoot territory
Pax / Mex first-response email drafts	🎯 Balanced	Needs empathy and nuance, not deep analysis
Case Context Summarizer (IRT, MIWI, DSAT)	🎯 Balanced	Structured synthesis, moderate complexity — your Day 1 anchor build
SOP Lookup with citation (RAG)	🎯 Balanced	Retrieval-augmented Q&A — your Day 2 anchor build
True-Safety vs downgrade decision (high-stakes triage)	🧠 Deep	Judgement call with regulatory exposure — Opus territory until trust is earned
Bulk MIWI auto-tagging (400k / month)	⚡ Fast	Cost-effective at scale — cheapest model that meets the QA accuracy bar

✅ The golden rule: Start with the cheapest tier that might work. Test it. If quality isn't good enough, move up one tier. Don't start with Deep Reasoning for a task that Fast can handle — you'll pay dollars for something that costs pennies.
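The golden rule amounts to a simple escalation loop: try tiers from cheapest to most expensive and stop at the first one that clears your quality bar. The sketch below assumes you already have some way to score each tier on a test set; `pick_tier`, `TIERS`, and the accuracy numbers are hypothetical and only illustrate the control flow.

```python
from typing import Callable, Optional, Sequence

TIERS = ["fast", "balanced", "deep"]  # ordered cheapest first


def pick_tier(passes_quality_bar: Callable[[str], bool],
              tiers: Sequence[str] = TIERS) -> Optional[str]:
    """Return the cheapest tier that passes the quality check, or None if none does."""
    for tier in tiers:
        if passes_quality_bar(tier):
            return tier
    return None


# Hypothetical QA accuracy per tier on a sample of tagged cases (made-up numbers).
qa_accuracy = {"fast": 0.81, "balanced": 0.93, "deep": 0.96}

chosen = pick_tier(lambda tier: qa_accuracy[tier] >= 0.90)
print(chosen)  # balanced
```

If no tier passes, the function returns `None`, which is a signal to rework the prompt or the task definition rather than to pay for a bigger model.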

💡 Why models perform differently: More parameters means more capacity to store "knowledge", but also slower and more expensive inference. A 70B-parameter model can capture more patterns than a 7B model. Some models use mixture-of-experts (MoE), where only a fraction of the parameters activate per token, which makes them faster and cheaper to run than a dense model of the same total size.

Cost vs. Capability Spectrum

[Interactive chart: models on Amazon Bedrock grouped into 3 tiers. Fast & Cheap (under $1): 5 models. Balanced ($1–$2): 3 models. Deep Reasoning ($2–$5): 3 models. Each tier has multiple models from different providers: Amazon, Anthropic, Meta, Mistral AI, DeepSeek. Data: Artificial Analysis · May 2026]

Model Selection Simulator

Pick a task, adjust the volume, and watch how cost changes across tiers. The right model choice can save your team thousands per month.

1. What's the task?

🎧 Case Summarizer · 🏷️ PAC Auto-Tagging · 💬 First-Response Email · 🔍 SOP Lookup · 📂 Doc Classification

2. How many per month?

Volume: 200

Tier	Label	Cost per month
⚡ Fast & Cheap	Best value	$0.02
🎯 Balanced	Recommended	$0.54
🧠 Deep Reasoning	Premium	$2.64

💸 vs. manual processing: An analyst costs $5,000/month for this task. 99.9% saved.

☕ Put it this way: 200 case summaries with Claude Sonnet 4 cost $2.64, less than a single cup of kopi at the office. The same work would take a TL ~100 hours of reading and writing.

💡 Recommendation: For Case Context Summarizer, use Balanced (Claude Sonnet 4). The task requires structured synthesis and policy-aware drafting, but doesn't need frontier reasoning. Lightweight models miss nuance in long chat transcripts; Opus is overkill for routine MIWI cases.
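The simulator's arithmetic is volume times per-case cost, compared against the manual baseline. The sketch below reuses figures from the pricing formula earlier on this page ($0.01365 per case on a Sonnet-class model, $25 of analyst time per case). The function names are illustrative, and the result will not match the simulator's figures exactly, since its token assumptions are not stated.

```python
def monthly_cost(volume: int, cost_per_case: float) -> float:
    """Total monthly cost in USD for a given case volume."""
    return volume * cost_per_case


def savings_vs_manual(ai_cost: float, manual_cost: float) -> float:
    """Fraction of the manual cost that is saved by using the model."""
    return 1 - ai_cost / manual_cost


volume = 200
ai = monthly_cost(volume, 0.01365)   # per-case cost from the pricing formula
manual = monthly_cost(volume, 25.0)  # 30 analyst minutes per case, valued at $25

print(f"AI: ${ai:.2f}, manual: ${manual:,.0f}, saved: {savings_vs_manual(ai, manual):.1%}")
# AI: $2.73, manual: $5,000, saved: 99.9%
```

The comparison ignores the time a TL still spends reviewing AI output, so treat the savings figure as an upper bound.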

5 Cost Optimization Levers

Once you've picked the right model tier, these strategies reduce cost further. Ordered by impact:

🎚️

1. Right-size your model

The biggest lever. Use Nova Micro for classification, Sonnet for complex analysis. Model choice matters more than anything else.

💾

2. Prompt Caching

Up to 90% savings on the cached part of your input. Cache your template: pay full price once, then 10% for every reuse. Perfect for repeated tasks.

📦

3. Batch Processing

50% savings. Submit requests in bulk (not real-time). Ideal for monthly portfolio assessments.

🔀

4. Intelligent Routing

Up to 30% savings. Bedrock auto-routes simple tasks to cheaper models, complex ones to powerful models.

✂️

5. Optimize Prompts

10–40% savings. Remove redundant instructions, use shorter examples, constrain output length. Markdown instead of HTML saves 60% on formatting tokens.
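Levers 2 and 3 are easy to put into numbers. The sketch below is a simplified model under stated assumptions: cached input tokens are billed at 10% of the normal input price, batch jobs get a flat 50% discount, and any cache-write surcharge or minimum cacheable prompt length that a provider may impose is ignored. The function names and the 400/150 token split (template versus case text) are illustrative.

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          price_per_m: float, cache_read_factor: float = 0.10) -> float:
    """Input cost in USD when `cached_tokens` are read from cache at a discount."""
    billable_tokens = cached_tokens * cache_read_factor + fresh_tokens
    return billable_tokens * price_per_m / 1_000_000


def batch_cost(on_demand_cost: float, batch_discount: float = 0.50) -> float:
    """Cost of the same work submitted as a batch job (assumed flat discount)."""
    return on_demand_cost * (1 - batch_discount)


full = input_cost_with_cache(0, 550, 3.00)      # no caching: all 550 input tokens at full price
cached = input_cost_with_cache(400, 150, 3.00)  # 400-token template cached, 150-token case fresh

print(f"input without cache: ${full:.5f}")
print(f"input with cache:    ${cached:.5f}")
print(f"batch per case:      ${batch_cost(0.01365):.6f}")
```

In this example caching cuts the input cost by roughly two thirds, not 90%, because only the template is cached. The headline figure applies when almost all of the input is reused.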

Context Windows: How Much Can the Model "See"?

The context window is the maximum text the model can process at once — your prompt + the AI's response must fit within it.

Model	Context window	Equivalent	Practical meaning
Nova Micro	128K tokens	~100 pages	A short book
Nova Pro	300K tokens	~230 pages	A long report
Claude Sonnet 4	200K tokens	~150 pages	A full policy manual

💡 For GS Support: A typical D365 case + a few SOPs + your prompt template fits easily within any model's context window. You'd only hit limits with very long chat transcripts (40+ turns in THA / VNM) or when reading the entire SOP corpus. When you do, use RAG to feed only the relevant SOPs — that's exactly what the SOP Lookup agent does.
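A quick way to check whether a request fits is to add the prompt size to the maximum response size and compare the sum with the model's window. This is a minimal sketch: `CONTEXT_WINDOWS` copies the three values from the table above, and `fits_in_context` is an illustrative helper, not part of any SDK.

```python
# Context window sizes in tokens, copied from the table on this page.
CONTEXT_WINDOWS = {
    "Nova Micro": 128_000,
    "Nova Pro": 300_000,
    "Claude Sonnet 4": 200_000,
}


def fits_in_context(model: str, prompt_tokens: int, max_output_tokens: int) -> bool:
    """True if the prompt plus the largest expected response fits in the window."""
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOWS[model]


print(fits_in_context("Nova Micro", 550, 800))      # True: a typical case is tiny
print(fits_in_context("Nova Micro", 127_500, 800))  # False: prompt + output exceeds 128K
```

When the check fails, the options are the ones named above: shorten the transcript, or use RAG to pass in only the relevant SOPs.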

What You Control in Each Tool

Tool	Who picks the model	What you control
Claude (Cowork)	Anthropic (by plan tier)	Your prompt quality
Kiro	Auto-selected by task	Your prompt quality
Cursor	You choose per conversation	Model + prompt quality
Bedrock Playground	You choose explicitly	Model + prompt + parameters

✅ Key takeaway: In most AI tools, you don't choose the model — the tool does. Focus on writing great prompts and designing good workflows. The prompt engineering skills you learn today work regardless of which model or tool you use. When you DO have model choice (Cursor, Bedrock), use the tier framework.

Workshop Connection

Concept from this page	Where you'll apply it
Token estimation	Understanding why prompt length matters for cost and quality
Model tiers	Day 1 Demo: Model Arena — compare 3 models on the same task
Cost optimization	Making the business case for AI adoption in your team
Context windows	Managing long conversations — knowing when to start fresh
Decision framework	Day 2: Planning your first agent's cost profile
