Research Paper

Synthetic Consumers Are Not Fake Humans

We tested whether AI-generated synthetic consumers can predict what real people buy. 11,000 synthetic interviews. Six models. Three countries. Real grocery data from 2,500 US households. Here is what we found — and what it means for the research industry.

Adriana Rocha
Wisdom Beyond Technology / Wortya
ESOMAR Research World, May 2026

11,000+ Synthetic Interviews
6 AI Models Tested
3 Countries
2,500 Real Households
3 Validation Stages

Methodology

Three stages, progressively harder

Stage 1 — Personality Expression (Brazil, Peru, US): A 25-item personality inventory administered to 2,700 synthetic panelists. Mean correlation between intended and expressed personality: r = 0.83. Removing personality specification collapsed correlation to zero.
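The Stage 1 figure is a correlation between the trait level a synthetic panelist was assigned and the level scored from its inventory answers. A minimal sketch of that computation follows, assuming a plain Pearson correlation per trait; the source does not state the exact estimator, and the data and the name `pearson_r` are hypothetical illustrations.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one series never varies.
        return float("nan")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data (not from the study): the trait level specified
# for five panelists, and the level scored from their inventory answers.
intended = [1.0, 2.0, 3.0, 4.0, 5.0]
expressed = [1.4, 2.1, 2.8, 4.3, 4.6]

r = pearson_r(intended, expressed)
print(f"intended vs expressed: r = {r:.2f}")
```

Averaging such per-trait values across traits, models, and countries would give a single mean correlation of the kind reported for Stage 1.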

Stage 2 — Decision Anchoring (Brazil, Peru, US): 12 decision-making parameters — loss aversion, temporal discounting, exploration tendency — derived from a Nature-published dataset of 10.6 million real human decisions. Explicit decision parameters: r = 0.54. Personality alone: r = 0.04.

Stage 3 — Behavioral Curation (US): Real purchase data from the Dunnhumby "Complete Journey" dataset. 2,500 households, two years of grocery transactions, 39 behavioral attributes per household, 22 FMCG scenarios, five data configurations.

Key Findings

Three laws nobody expected

1. The right data depends on the right question
Personality profiles excel at identity-expressive tasks (r = 0.90). Decision parameters excel at deliberative choices (r = 0.64). Raw behavioral data excels at habitual purchases (r = 0.37). Each layer wins in its domain and fails outside it.

2. Mixing layers hurts
Adding personality profiles to raw behavioral data dropped accuracy by 20–105% across every model. The AI treats personality as a "character sheet" and role-plays accordingly, overriding the behavioral evidence.

3. Question design matters 25× more than model choice
Scenario framing accounts for ~76% of variance in prediction accuracy. Behavioral category: 15%. Model choice: only 3%. The industry conversation about which AI to use is the wrong conversation.
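The arithmetic behind findings 2 and 3 can be sketched as follows. The variance shares are the ones quoted above. The `relative_drop` helper is an assumption: the summary does not define how the accuracy drop was computed, so the sketch treats it as a decline relative to the behavioral-only correlation, and the example correlations are hypothetical. Under that reading, a drop above 100% means the correlation changed sign.

```python
def relative_drop(baseline_r, mixed_r):
    """Decline from baseline_r to mixed_r as a percentage of baseline_r.

    Assumed definition (not stated in the source); requires a positive baseline.
    """
    if baseline_r <= 0:
        raise ValueError("baseline correlation must be positive")
    return (baseline_r - mixed_r) / baseline_r * 100


# Variance shares reported in finding 3, in percent. They sum to 94;
# the summary does not itemise the remainder.
variance_share = {
    "scenario_framing": 76,
    "behavioral_category": 15,
    "model_choice": 3,
}

# Ratio behind the "25x" headline: 76 / 3 is roughly 25.3.
framing_vs_model = variance_share["scenario_framing"] / variance_share["model_choice"]
print(f"framing vs model choice: {framing_vs_model:.1f}x")

# Hypothetical correlations, chosen only to reproduce the two ends of the range.
mild = relative_drop(0.40, 0.32)     # correlation shrinks but stays positive
severe = relative_drop(0.40, -0.02)  # correlation flips sign, so the drop exceeds 100%
print(f"mild drop: {mild:.0f}%, severe drop: {severe:.0f}%")
```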

Core Discovery

The Identity-Operation Gradient

Synthetic populations replicate who someone is with high fidelity, but systematically fail at how they transact. This three-tier taxonomy maps the boundary between what AI consumers can and cannot simulate.

Identity-Explicit (price sensitivity, brand loyalty): 0.318
Identity-Adjacent (lifestyle, health-consciousness): 0.259
Operational-Behavioral (visit frequency, basket size): 0.121

This gradient was consistent across all six models and mirrored the pattern observed in Stage 2 with abstract decision parameters. There, identity-adjacent parameters such as moral sensitivity (r = 0.71) far outperformed operational parameters such as loss aversion (r = 0.16).

“The quality of the question you ask the AI matters twenty-five times more than which AI you ask.”

— Adriana Rocha

For Practitioners

What this means for concept testing

Concept testing, message evaluation, and brand positioning operate in identity territory — where synthetic populations showed their strongest performance (r = 0.25–0.50 for well-designed scenarios).

Purchase prediction operates in operational territory, where synthetic fidelity remains insufficient for quantitative forecasting.

The practical positioning: not replacement, but augmentation. 100 concepts tested synthetically, 10 survivors validated with real consumers, 3 winners launched.
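The augmentation funnel described above can be sketched as a two-stage screen: rank all concepts on a synthetic score, keep a shortlist, then rank only the shortlist on real-consumer results. This is an illustrative sketch, not the author's implementation. The function name, the scoring inputs, and the stand-in `fake_real_study` are hypothetical, and the default sizes come from the 100 / 10 / 3 example in the text.

```python
def screening_funnel(synthetic_scores, validate_with_real,
                     shortlist_size=10, launch_size=3):
    """Shortlist by synthetic score, then pick winners by real-consumer score.

    synthetic_scores: dict mapping concept name to a synthetic-panel score.
    validate_with_real: callable returning a real-consumer score for a concept;
        it is only called for shortlisted concepts.
    """
    shortlist = sorted(synthetic_scores, key=synthetic_scores.get,
                       reverse=True)[:shortlist_size]
    real_scores = {concept: validate_with_real(concept) for concept in shortlist}
    winners = sorted(real_scores, key=real_scores.get,
                     reverse=True)[:launch_size]
    return shortlist, winners


# Hypothetical inputs: 100 concepts with made-up, deterministic synthetic scores.
synthetic = {f"concept_{i:03d}": (i * 37) % 101 for i in range(100)}


def fake_real_study(concept):
    """Stand-in for fieldwork with real consumers (made-up scoring rule)."""
    return int(concept[-3:]) % 7


shortlist, winners = screening_funnel(synthetic, fake_real_study)
print(len(synthetic), "tested ->", len(shortlist), "validated ->",
      len(winners), "launched")
```

The design point is that the expensive real-consumer step runs on the shortlist only, while the final decision still rests on real data.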

Study Overview

Validation architecture

Stage | Domain | Instrument | Key r | Sample
1 | Personality expression | 25-item IPIP-FFM | 0.83 | 2,700 panelists × 3 models × 3 countries
2 | Decision anchoring | 24 vignettes, 12 parameters | 0.54 | 3,600 panelists × 4 models × 3 countries
3 | Behavioral curation | 22 FMCG scenarios | 0.37 | 100 real households × 6 models × 5 conditions

Models tested: Claude Opus, Claude Sonnet, Claude Haiku (Anthropic); GPT-5.4, GPT-4o-mini, GPT-4.1 (OpenAI); Gemini Flash (Google).

Request the full paper

Complete methodology, statistical tables, per-vignette correlation matrices, verbatim model responses across all models, and the Stage 2 and Stage 3 instruments in full.


About the Author

Adriana Rocha is the founder of Wisdom Beyond Technology and Wortya, and author of Refactoring the Firm: Building Intelligence-Native Organizations for the AI Age. The qualitative field study that motivated this research was conducted in collaboration with Datum International (Urpi Torrado) and presented at ESOMAR LATAM 2026.

Correspondence: adrianar@wortya.com