
Day 2 Afternoon — Agent Design & AI Judge Competition

🤖 Agent Design Canvas

Design an AI agent for a real process from your team. Submit for AI Judge scoring. The canvas becomes your brief to your country GM and GSTF reviewer.

⏱ 45 minutes · Claude Cowork · No coding · Pairs

Why This Exercise Matters

This morning you learned the 4 workflow patterns and built your first simple agent. Now you'll design a bigger, more strategic agent for a real process from your team — the kind of automation that saves hours per week.

The Agent Design Canvas is a one-page strategic document — not code, not a technical spec. It captures: what the agent does, how it works, what it must never do, and what success looks like. This is the document you hand to your tech team and say: "Build this."

What You'll Do · Time
1️⃣ Pick a workflow from your team · 5 min
2️⃣ Choose the right workflow pattern · 3 min
3️⃣ Fill the Agent Design Canvas (with Claude's help) · 25 min
4️⃣ Submit for AI Judge scoring + leaderboard · 5 min
5️⃣ Translate to Claude Cowork (your next step) · 7 min
🎯 Deliverable: A completed Agent Design Canvas (markdown) scored by AI Judge. Top designs get recognized. Your canvas becomes the brief you share with your tech team next week.

🔍 How AI Judge Scores Your Canvas (Full Transparency)

Before you start, here's exactly what the AI evaluates. No hidden criteria — design for these 6 dimensions:

Dimension · What the AI Looks For · Score
Problem Clarity · Is the mission one clear sentence? Can someone outside your team understand what this agent does? Is the trigger event specific (not "when needed")? · / 5
Pattern Fit · Does the chosen pattern (chaining/parallel/routing/orchestration) actually match the workflow? Are the steps logical and complete? Would a different pattern work better? · / 5
Guardrails & Safety · Are there clear "must NOT" rules? Are escalation thresholds specific (numbers, not vague)? Is there a human-in-the-loop for high-stakes decisions? · / 5
Implementation Readiness · Could a tech team start building from this? Are data sources named? Are skills/steps specific enough to implement? Is the autonomy level realistic? · / 5
Business Impact · Is time savings quantified (hours/week, not "saves time")? Are success metrics measurable? Is there a realistic first milestone? · / 5
AnyCompany Relevance · Is this grounded in GS Support's actual operations? Does it reference real metrics (AHT, FCR, DSAT, SLA, PAC tags) and your market? Would this actually help your country GM? · / 5

💡 Scoring Guide

  • 25–30: 🥇 Production-Ready — your tech team could start building this week
  • 19–24: 🥈 Strong Design — solid foundation, address the feedback and it's ready
  • 13–18: 🥉 Good Start — right direction, needs more specificity
  • Below 13: 🔄 Needs Rework — add more detail, check the AI feedback
Pro tip: The same principles from Day 1 apply — be specific (not vague), ground everything in data (not assumptions), include guardrails (not just the happy path), and quantify impact (not just "saves time").
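
For readers who like to see the scoring guide as logic, here is a minimal Python sketch of how a total score maps to a tier. The function name and the 6-to-30 range check are assumptions for illustration (six dimensions, each scored 1 to 5); this is not the AI Judge's actual code.

```python
def score_tier(total: int) -> str:
    """Map a total canvas score to the tier named in the scoring guide."""
    # Assumption: six dimensions scored 1-5 each, so totals run from 6 to 30.
    if not 6 <= total <= 30:
        raise ValueError("total must be between 6 and 30")
    if total >= 25:
        return "Production-Ready"
    if total >= 19:
        return "Strong Design"
    if total >= 13:
        return "Good Start"
    return "Needs Rework"
```

Called as `score_tier(22)`, this sketch returns "Strong Design".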
1 Pick Your Workflow

Step 1: Choose a Real Process

Click a workflow below — or pick your own. Choose something your team does at least weekly that takes at least 30 minutes each time.

🎧 Case Context Summarizer ⭐
Pull D365 → assess severity → draft summary → post back
Recommended: 🔗 Chaining
📚 SOP Lookup with Citation ⭐
Question → search SOPs → cite article ID → escalate ambiguity
Recommended: 🔗 Chaining (RAG)
⚠️ L1 Email Triage ⭐
Classify Safety / MIWI / Fraud → route → draft 1st response
Recommended: 🔀 Routing
📝 DSAT Write-up Generator
Read transcript → generate 4 DSAT fields → save to QA queue
Recommended: 🔗 Chaining
🔍 Multi-lens Safety Triage
3 lenses in parallel → aggregate → unified handling plan
Recommended: ⚡ Parallelization
🏷️ PAC Auto-Tagging
Read case → classify Symptom + Action + Root Cause → write to D365
Recommended: 🔗 Chaining
2 Choose Pattern

Step 2: Confirm the Workflow Pattern

The recommended pattern is pre-selected. Change it if you think a different one fits better:

🔗 Chaining
A → B → C → D
⚡ Parallelization
3 views → combine
🔀 Routing
Classify → right path
🎯 Orchestration
If/then + human gates
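
If it helps to see the four patterns side by side, the sketch below expresses each one as a small Python function. Everything here is illustrative: the function names, the `Step` type, and the idea that each step takes text and returns text are assumptions, and in a real agent each step would be an AI call or a data lookup rather than a plain function.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical: a step takes text in and returns text out.
Step = Callable[[str], str]


def chain(steps: list[Step], data: str) -> str:
    """Chaining: A -> B -> C, each step consumes the previous output."""
    for step in steps:
        data = step(data)
    return data


def parallelize(lenses: list[Step], data: str,
                combine: Callable[[list[str]], str]) -> str:
    """Parallelization: run several views on the same input, then combine."""
    with ThreadPoolExecutor() as pool:
        views = list(pool.map(lambda lens: lens(data), lenses))
    return combine(views)


def route(classify: Callable[[str], str], handlers: dict[str, Step],
          data: str) -> str:
    """Routing: classify the input, then send it down the matching path."""
    return handlers[classify(data)](data)


def orchestrate(data: str, needs_human: Callable[[str], bool],
                auto: Step, escalate: Step) -> str:
    """Orchestration: an if/then decision with a human gate."""
    return escalate(data) if needs_human(data) else auto(data)
```

A quick way to choose between them: if every step depends on the one before it, chaining fits; if the steps are independent views of the same input, parallelization fits.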
3 Fill the Canvas

Step 3: Design Your Agent

Open Claude Cowork and use the hints below to fill each section of the canvas. Don't copy-paste a prompt — write it yourself using the hints. The AI Judge rewards specificity and original thinking.

🎯 The Canvas Template — Fill Each Section

Ask Claude to help you complete this canvas. Give it your workflow choice and pattern, then work through each section together:

Section 1: Agent Mission

Hints

  • One sentence: "This agent [does what] for [whom] by [how]"
  • Include the trigger: what event starts it? (new file arrives, weekly schedule, manual request)
  • Name it something descriptive — "Case Summarizer" not "Agent 1"

Section 2: Workflow Steps

Hints

  • List 3–5 steps. Each step = one clear action with one clear output
  • For each step: what goes IN, what comes OUT, what could go wrong?
  • Label which pattern each step uses (if combining patterns)
  • Think: which steps need AI judgment vs which are just data lookup?

Section 3: Data Requirements

Hints

  • Inputs: What data does the agent need? Be specific — "D365 case data + chat transcript" not "some data"
  • Outputs: What does it produce? Report, notification, decision, dashboard?
  • Knowledge: What reference documents should it always have access to? (policies, thresholds, templates)

Section 4: Guardrails & Escalation

Hints — This is where leaders add the most value

  • Must NOT: What should the agent NEVER do? (auto-approve above $X, share PII, skip checks)
  • Escalate when: Specific thresholds — "$25K+", "confidence below 80%", "3+ risk flags"
  • Autonomy level: Start at L1 (suggest only) or L2 (act on routine, ask on exceptions)?
  • Think: what would make you uncomfortable if the agent did it without asking?
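
To show why specific thresholds matter, here is a minimal Python sketch that turns the example thresholds from the hints above ($25K+, confidence below 80%, 3+ risk flags) into a check a tech team could implement. The `ProposedAction` shape and all names are hypothetical; swap in the thresholds from your own canvas.

```python
from dataclasses import dataclass

# Example thresholds copied from the hints; replace with your canvas values.
AMOUNT_LIMIT = 25_000
MIN_CONFIDENCE = 0.80
RISK_FLAG_LIMIT = 3


@dataclass
class ProposedAction:
    """Hypothetical summary of something the agent wants to do."""
    amount: float
    confidence: float
    risk_flags: int


def escalation_reasons(action: ProposedAction) -> list[str]:
    """Return every reason this action must go to a human (empty = proceed)."""
    reasons = []
    if action.amount >= AMOUNT_LIMIT:
        reasons.append(f"amount {action.amount} is at or above {AMOUNT_LIMIT}")
    if action.confidence < MIN_CONFIDENCE:
        reasons.append(f"confidence {action.confidence} is below {MIN_CONFIDENCE}")
    if action.risk_flags >= RISK_FLAG_LIMIT:
        reasons.append(f"{action.risk_flags} risk flags (limit {RISK_FLAG_LIMIT})")
    return reasons
```

A vague rule such as "escalate large amounts" cannot be written this way, which is the point of asking for numbers.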

Section 5: Business Impact

Hints

  • Current state: How many hours/week does this take today? How many people?
  • With agent: Estimate time savings (be realistic — 60-80% reduction, not 100%)
  • Success metrics: 2-3 measurable KPIs (processing time, error rate, throughput)
  • First milestone: What's the smallest version that proves value in 2 weeks?
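
The time-savings estimate is simple arithmetic, sketched below in Python. The function name and the assumption that hours are counted per person are illustrative; any numbers you pass in are your own estimates, not workshop data.

```python
def weekly_hours_saved(hours_per_person: float, people: int,
                       reduction: float) -> float:
    """Estimate team hours saved per week.

    reduction is a fraction, e.g. 0.6 for a 60% reduction.
    """
    if not 0 <= reduction <= 1:
        raise ValueError("reduction must be between 0 and 1")
    return hours_per_person * people * reduction
```

Running it at both ends of the realistic range (0.6 and 0.8) gives you a low and a high estimate to quote in your canvas.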
✅ Checkpoint: Your canvas should be 300-500 words. Every section filled. Specific numbers, not vague language. If you can't quantify something, say "estimate: X" — that's better than leaving it blank.
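
The checkpoint can be run as a quick self-check before submitting. The Python sketch below assumes the canvas is held as a mapping from section name to text; the function name and that data shape are hypothetical, and it checks only the two mechanical rules (every section filled, 300-500 words), not the quality of what you wrote.

```python
# Section names as listed in the canvas template.
SECTIONS = (
    "Agent Mission",
    "Workflow Steps",
    "Data Requirements",
    "Guardrails & Escalation",
    "Business Impact",
)


def check_canvas(canvas: dict[str, str]) -> list[str]:
    """Return a list of checkpoint problems; an empty list means it passes."""
    problems = [
        f"missing section: {name}"
        for name in SECTIONS
        if not canvas.get(name, "").strip()
    ]
    words = sum(len(text.split()) for text in canvas.values())
    if not 300 <= words <= 500:
        problems.append(f"word count {words} is outside 300-500")
    return problems
```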
4 Submit & Score

Step 4: Submit for AI Judge Scoring

Paste your completed canvas below. The AI Judge (Claude on Amazon Bedrock) evaluates it against the 6 dimensions shown above and returns a score with specific feedback.

📤 Submit Your Canvas

Resubmitting with the same name replaces your previous entry — iterate and improve!

💡 How the AI Judge works (behind the scenes)

Your canvas is sent to Amazon Bedrock (Claude Sonnet). The model receives a structured rubric with the 6 scoring dimensions and evaluates your canvas against each one. It returns:

  • A score (1-5) per dimension with brief justification
  • Top strengths — what you did well
  • Specific improvements — what would raise your score

This is the same LLM-as-Judge technique from Day 1 — using AI to evaluate AI outputs. You can resubmit as many times as you want. Iterate based on the feedback.
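
For the curious, here is a rough Python sketch of the LLM-as-Judge idea: build a rubric prompt, then validate the model's reply. It is not the workshop's actual implementation. The prompt wording, the JSON reply shape, and both function names are assumptions, and the call to the model itself is deliberately left out.

```python
import json

# The six dimensions from the scoring table.
DIMENSIONS = [
    "Problem Clarity",
    "Pattern Fit",
    "Guardrails & Safety",
    "Implementation Readiness",
    "Business Impact",
    "AnyCompany Relevance",
]


def build_judge_prompt(canvas: str) -> str:
    """Assemble a rubric prompt (hypothetical wording) for the judge model."""
    rubric = "\n".join(
        f"- {name}: score 1-5 with a brief justification" for name in DIMENSIONS
    )
    return (
        "You are judging an Agent Design Canvas.\n"
        f"Score each dimension:\n{rubric}\n"
        "Also list top strengths and specific improvements.\n"
        "Reply as JSON with keys: scores, strengths, improvements.\n\n"
        f"Canvas:\n{canvas}"
    )


def parse_judge_reply(reply: str) -> dict:
    """Validate an assumed JSON reply and add the total score."""
    result = json.loads(reply)
    total = 0
    for name in DIMENSIONS:
        score = result["scores"][name]["score"]
        if not 1 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside 1-5")
        total += score
    result["total"] = total
    return result
```

Validating the reply matters because a model can return a malformed or out-of-range score; checking before ranking keeps the leaderboard trustworthy.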

5 Translate to Cowork

Step 5: From Canvas to Claude Cowork

Your canvas is the design. Here's how each section translates to something you can build in Claude Cowork today:

🏠 Guardrails → Project Instructions

Your "Must NOT" rules and escalation thresholds become the Custom Instructions in your Cowork project. These apply to every conversation.

Canvas section → Cowork: Project Settings → Custom Instructions

📋 Workflow Steps → Knowledge Files

Each workflow step becomes a Knowledge file uploaded to the project. Include the step instructions, expected input/output format, and quality criteria.

Canvas section → Cowork: Project → Add Knowledge

📊 Data Requirements → Uploaded Files

Reference documents (policies, thresholds, templates) get uploaded as Knowledge files. Live data gets pasted into conversations.

Canvas section → Cowork: Project → Add Knowledge + paste data in chat

⏰ Trigger → Scheduled Task

If your trigger is time-based ("every Monday"), set up a Scheduled Task in Cowork. If event-based ("when file arrives"), you'll need your tech team's help later.

Canvas section → Cowork: Scheduled Tasks (time-based) or manual trigger

🎯 Your homework: Set this up after the workshop

  • Create a new Claude Cowork project named after your agent
  • Paste your guardrails into Custom Instructions
  • Upload your reference documents as Knowledge files
  • Write a "kickoff prompt" that runs your workflow steps in sequence
  • Test with one real example from last week

That's your first working agent — steering + skills, no code required. Add a Scheduled Task when you're ready for automation.

🏆 Leaderboard

All submissions ranked by score. Click "Refresh" to see new entries.
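
As a rough illustration of the leaderboard rules described on this page (ranked by score, and resubmitting under the same name replaces the earlier entry), here is a small Python sketch. The function names and the alphabetical tie-break are assumptions, not the page's actual behavior.

```python
def submit(board: dict[str, int], name: str, score: int) -> None:
    """Record a score; a resubmission under the same name replaces the old one."""
    board[name] = score


def ranked(board: dict[str, int]) -> list[tuple[str, int]]:
    """Return entries by score, highest first (ties by name: an assumption)."""
    return sorted(board.items(), key=lambda entry: (-entry[1], entry[0]))
```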
