Every AI agency has case studies. Almost all of them are heavily curated. Many are misleading. Some are fabricated. The ability to read a case study critically — to separate what actually happened from what the agency wants you to believe happened — is one of the most valuable skills you can develop before signing a contract.
This is not a cynical take. Good agencies do real work and produce real results. The problem is that good case studies and bad case studies look almost identical on the surface. Here's how to tell them apart.
The Anatomy of a Vague Case Study
Most agency case studies follow a template:
*"[Client, often unnamed or referred to as 'a Fortune 500 retailer'] came to us with a challenge: [vague problem]. We applied our [proprietary methodology] and built [impressive-sounding system]. The result: [percentage improvement] in [metric]."*
The red flags are concentrated in three places: the problem description, the methodology description, and the result claim.
Vague problem description: "Manual processes were slowing down their operations" tells you nothing. What processes? Manual in what sense? How slow, and compared to what baseline? The specificity of the problem description is a proxy for the specificity of the agency's actual engagement. Agencies that understood the problem deeply can describe it precisely. Agencies that didn't can't.
Methodology black box: "We applied our AI platform and machine learning expertise" is content-free. A real case study describes the actual approach: what model architecture was used, what data it was trained on, how the output was integrated into existing workflows, what the evaluation criteria were. If you can't find any of this, the agency either doesn't want to disclose it (often legitimate) or didn't do anything particularly sophisticated (also common).
Result that floats free from context: "40% improvement in efficiency" means nothing without answering: efficiency of what, measured how, over what time period, compared to what baseline? The most inflated case study claims are usually technically true but contextually misleading — a 40% reduction in time to complete one specific task in a process that takes 5% of total work time is not a 40% efficiency improvement.
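A quick back-of-the-envelope check makes that gap concrete. The numbers below are hypothetical, but the arithmetic is the same check you should run against any headline percentage:

```python
# Hypothetical numbers: the automated task takes 5% of total work time
# and gets 40% faster. What does that do to overall efficiency?
task_share = 0.05        # fraction of total work time spent on the task
task_improvement = 0.40  # claimed time reduction for that task

overall_improvement = task_share * task_improvement
print(f"Overall time saved: {overall_improvement:.1%}")  # 2.0%, not 40%
```

If the case study doesn't give you enough information to do this multiplication, that omission is itself the finding.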
The Five Questions That Expose a Case Study
When you're evaluating an agency's case studies, ask these five questions. If you can't answer them from the case study, ask the agency directly. Their response (or evasion) is itself informative.
1. What was the baseline?
Results require a before. "Reduced processing time to 2 minutes" is meaningless without knowing it was previously 5 minutes vs. 10 hours. "Improved accuracy to 94%" needs the starting accuracy to mean anything. Good case studies always specify the baseline. When they don't, ask.
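The arithmetic is simple, which is exactly why its absence is telling. Using the hypothetical figures from the example above:

```python
# Same headline ("reduced processing time to 2 minutes"), two very
# different baselines. All figures are hypothetical.
def reduction(before_minutes: float, after_minutes: float) -> float:
    return (before_minutes - after_minutes) / before_minutes

print(f"From 5 minutes: {reduction(5, 2):.0%} reduction")        # 60%
print(f"From 10 hours:  {reduction(10 * 60, 2):.1%} reduction")  # 99.7%
```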
2. How was the metric measured?
"Accuracy" can mean precision, recall, F1 score, or task-completion accuracy — and these can differ dramatically. "Efficiency" can be measured in time, cost, error rate, or throughput. "ROI" can be calculated dozens of different ways. A legitimate case study uses a specific, defensible measurement methodology. Vague metric names are usually hiding something.
3. How long did it take to see the results?
AI projects often show strong early results that decay as the novelty effect wears off and the model encounters real-world conditions not represented in training data. A case study that shows 6-week results is a very different story from one that shows 12-month results. Agencies with long-term success track records will include timeline data. Agencies showcasing early wins won't.
4. Who at the client can confirm this?
Named contacts at the client company are the gold standard. Not just "John S., VP of Operations" — an actual person who can be reached and asked about the engagement. Unnamed testimonials, client logos without contacts, and case studies where the client is anonymized for "confidentiality" reasons range from normal (large enterprises frequently request confidentiality) to suspicious (the client might not know they're being cited this way).
5. What went wrong?
No legitimate AI project runs perfectly from kickoff to delivery. There are always data problems, unexpected edge cases, model failures, integration challenges, or scope changes. A case study that describes a smooth, triumphant journey from problem to solution is almost certainly missing the real story. Ask the agency: "What was the hardest part of this engagement? What would you do differently?"
What Real Results Look Like
Legitimate AI project outcomes tend to share certain characteristics:
They're specific and bounded: "Reduced false positive rate in fraud detection from 8.3% to 2.1% on a test set of 500,000 transactions over Q3 2024" vs. "dramatically improved fraud detection."
They acknowledge limitations: "The model performs well on text in English and Spanish but hasn't been tested on other languages we may encounter." Real teams know what their systems can't do.
They connect to business outcomes: Not just "model accuracy improved 12%" but "this accuracy improvement translated to $340,000 in recovered revenue in year one based on our average transaction value." A sketch of how that kind of number gets calculated appears at the end of this section.
They include caveats about data quality: "We required 3 weeks of additional data cleaning before modeling could begin, which extended the initial timeline" is a sign that the agency actually engaged with real data.
They mention ongoing work: AI systems require maintenance. Real case studies often mention post-deployment model monitoring, retraining schedules, or ongoing support engagements. A case study that ends at launch is describing a sprint, not a production system.
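When an agency quotes a dollar figure like the one above, ask to see the arithmetic behind it. Here is a minimal sketch of what that calculation typically looks like; every input is a hypothetical assumption the agency should be able to justify with real data:

```python
# Hypothetical translation from a model improvement to a dollar figure.
transactions_per_year = 200_000
avg_transaction_value = 85.0  # dollars
error_rate_before = 0.031     # fraction of transactions lost to model errors
error_rate_after = 0.019

recovered = (error_rate_before - error_rate_after) * transactions_per_year * avg_transaction_value
print(f"Claimed recovered revenue: ${recovered:,.0f} per year")  # $204,000
```

The point is not the formula, which is trivial, but whether the agency can show you where each input came from.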
Inflated Marketing: Common Patterns
The cherry-picked metric: The AI system improved in one dimension that was easy to optimize. Every other dimension is absent from the case study. If a recommendation engine improved click-through rate by 35% but also increased returns by 25% (suggesting users were clicking on irrelevant products), the click-through improvement is not a success.
The lab-to-production gap: The results shown were measured in a controlled test environment, not production. Model performance in controlled conditions is consistently better than in production, sometimes dramatically so. Look for case studies that specify "in production" or "live system" rather than "in our testing environment."
The attribution problem: "Revenue increased 22% after we deployed the AI system" does not mean the AI system caused a 22% revenue increase. Correlation without controls is a meaningless result. Good case studies acknowledge confounding factors and explain why the improvement is attributable to the AI system specifically. A sketch of what a controlled comparison looks like follows at the end of this section.
The definitional stretch: "AI-powered" is often applied to systems that use simple rule-based logic with a thin ML layer on top. "Machine learning" is sometimes applied to linear regression. "Deep learning" covers everything from a two-layer neural network to a frontier model. Ask what specifically the system does and how it works.
The internal testimonial: A quote from the agency's own project manager praising the project is not a testimonial. A quote from the client contact who signed the check is.
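On the attribution problem specifically, the fix is some kind of control. Here is a minimal, entirely hypothetical sketch of the comparison a credible case study should be able to describe: a holdout group that ran over the same period without the AI system.

```python
# Hypothetical revenue figures with a holdout group that never got the AI
# system. All numbers are made up; a real comparison needs matched groups
# and a long enough measurement window.
treated_before, treated_after = 1_000_000, 1_220_000  # +22% with the AI system
holdout_before, holdout_after = 1_000_000, 1_150_000  # +15% without it

treated_lift = treated_after / treated_before - 1
holdout_lift = holdout_after / holdout_before - 1
attributable = treated_lift - holdout_lift
print(f"Lift plausibly attributable to the AI system: ~{attributable:.0%}")  # ~7%, not 22%
```

If the agency can't describe anything resembling this comparison, the headline number is a before/after observation, not a result.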
How to Verify Case Study Claims
Request an introduction to the client contact: A confident agency will facilitate this. An agency that says "our clients prefer to remain anonymous in all cases" is giving you an answer you shouldn't accept. Some clients legitimately don't want public attribution, but the agency should be able to arrange a reference call even if the client doesn't want their name on the website.
Ask for raw data on outcomes: Not a polished PDF. The actual measurement logs, dashboards, or reports that show the results. If the agency "can't share client data," they should at least be able to describe what the measurement system looked like.
Cross-reference with the client's public filings: If the client is a public company, look at their investor communications, press releases, and earnings calls for the period when the AI project was supposedly transformative. If their annual report makes no mention of a major AI initiative, the project probably wasn't major.
Check LinkedIn for the claimed project team: If a case study says "our 12-person team spent 8 months on this," you can check the LinkedIn profiles of people who worked at the agency during that period to see if the team size and timeline check out. This is imperfect but can reveal obvious fabrications.
Ask what the current state of the system is: Is the AI system still running? Is the client still using it? What happened in year two? Agencies with genuinely successful deployments will be proud to tell you the system is still operating. Agencies that built something that was quietly deprecated won't bring it up.
The Reference Call Is More Important Than the Case Study
In the end, case studies are marketing and references are data. The reference call with a past or current client is where you get the real story: whether the agency communicated well, whether timelines and budgets were honored, whether the technical team was strong, and whether the client would hire them again.
Ask past clients specifically:
- Did the project deliver what was promised?
- Did the timeline and budget hold?
- How did the agency handle problems when they came up?
- Would you hire them again for a similar project?
- What should I know about working with them that isn't obvious from their marketing?
The last question often produces the most useful answers.
Browse case studies to build your initial shortlist, but treat the reference call as the actual evaluation. The aiagencymap.com directory surfaces agencies by specialty and geography — use it to find candidates, then use reference calls to make your decision.
Ready to Find the Right AI Agency?
Browse 700+ verified AI agencies. Filter by tech stack, industry, location, and client ratings.
Browse AI Agencies