Methodology
Three stages, progressively harder
Stage 1 — Personality Expression (Brazil, Peru, US): A 25-item personality inventory was administered to 2,700 synthetic panelists. The mean correlation between intended and expressed personality was r = 0.83. Removing the personality specification collapsed the correlation to zero.
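As a rough illustration of what a "mean correlation between intended and expressed personality" could look like in practice, the sketch below computes a Pearson r per trait and averages across traits. This is an assumption about the aggregation, not the study's documented procedure. The trait names, the function names `pearson_r` and `mean_trait_correlation`, and all scores are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def mean_trait_correlation(intended, expressed):
    """Average the per-trait Pearson r.

    Both arguments map a trait name to a list of scores, one per panelist,
    in the same panelist order. Assumed aggregation: simple mean across traits.
    """
    rs = [pearson_r(intended[trait], expressed[trait]) for trait in intended]
    return sum(rs) / len(rs)


# Hypothetical scores for five panelists on two traits (illustrative only).
intended = {
    "extraversion": [1.0, 2.0, 3.0, 4.0, 5.0],
    "agreeableness": [2.0, 4.0, 1.0, 5.0, 3.0],
}
expressed = {
    "extraversion": [1.5, 2.5, 2.0, 4.5, 4.0],
    "agreeableness": [2.5, 3.5, 1.5, 4.5, 3.0],
}

mean_r = mean_trait_correlation(intended, expressed)
print(f"mean r across traits: {mean_r:.2f}")
```

Under this reading, the ablation in Stage 1 corresponds to recomputing the same statistic for panelists generated without a personality specification and checking that it falls to roughly zero.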
Stage 2 — Decision Anchoring (Brazil, Peru, US): 12 decision-making parameters (including loss aversion, temporal discounting, and exploration tendency) derived from a dataset of 10.6 million real human decisions published in Nature. With explicit decision parameters, r = 0.54; with personality alone, r = 0.04.
Stage 3 — Behavioral Curation (US): Real purchase data from the Dunnhumby "Complete Journey" dataset, which covers 2,500 households and two years of grocery transactions. The stage used 39 behavioral attributes per household, 22 FMCG scenarios, and five data configurations.
Key Findings
Three laws nobody expected
Core Discovery
The Identity-Operation Gradient
Synthetic populations replicate who someone is with high fidelity, but systematically fail at how they transact. This three-tier taxonomy maps the boundary between what AI consumers can and cannot simulate.
This gradient was consistent across all six models and mirrored the identical pattern observed in Stage 2 with abstract decision parameters. Identity-adjacent parameters like moral sensitivity (r = 0.71) dramatically outperformed operational parameters like loss aversion (r = 0.16).
“The quality of the question you ask the AI matters twenty-five times more than which AI you ask.”
— Adriana Rocha
For Practitioners
What this means for concept testing
Concept testing, message evaluation, and brand positioning operate in identity territory — where synthetic populations showed their strongest performance (r = 0.25–0.50 for well-designed scenarios).
Purchase prediction operates in operational territory, where synthetic fidelity remains insufficient for quantitative forecasting.
The practical positioning: not replacement, but augmentation. 100 concepts tested synthetically, 10 survivors validated with real consumers, 3 winners launched.
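The 100 → 10 → 3 funnel can be sketched as two successive top-k selections: one on synthetic scores, one on scores collected from real consumers. This is a minimal sketch under stated assumptions, not the author's tooling. The names `top_k`, `screening_funnel`, and `collect_real_score` are hypothetical, and all scores are made-up placeholders.

```python
def top_k(scores, k):
    """Return the k ids with the highest scores; ties are broken by id."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [concept_id for concept_id, _ in ranked[:k]]


def screening_funnel(synthetic_scores, collect_real_score,
                     shortlist_size=10, launch_size=3):
    """Screen concepts synthetically, then validate only the survivors.

    synthetic_scores: dict of concept id -> score from the synthetic panel.
    collect_real_score: callable returning a real-consumer score for one
        concept id (a hypothetical stand-in for actual fieldwork).
    """
    shortlist = top_k(synthetic_scores, shortlist_size)
    # Real-consumer research is only commissioned for the shortlist.
    real_scores = {cid: collect_real_score(cid) for cid in shortlist}
    winners = top_k(real_scores, launch_size)
    return shortlist, winners


# Placeholder inputs: 100 concepts with distinct made-up synthetic scores.
synthetic_scores = {f"concept_{i:03d}": (i * 37 % 100) / 100 for i in range(100)}


def fake_real_score(concept_id):
    """Deterministic placeholder for a real-consumer validation score."""
    return int(concept_id[-3:]) % 7


shortlist, winners = screening_funnel(synthetic_scores, fake_real_score)
print(len(synthetic_scores), "->", len(shortlist), "->", len(winners))
```

The design point is that the final choice rests on real-consumer scores; the synthetic panel only decides which concepts reach that step.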
Study Overview
Validation architecture
| Stage | Domain | Instrument | Key r | Sample |
|---|---|---|---|---|
| 1 | Personality expression | 25-item IPIP-FFM | 0.83 | 2,700 panelists × 3 models × 3 countries |
| 2 | Decision anchoring | 24 vignettes, 12 parameters | 0.54 | 3,600 panelists × 4 models × 3 countries |
| 3 | Behavioral curation | 22 FMCG scenarios | 0.37 | 100 real households × 6 models × 5 conditions |
Models tested: Claude Opus, Claude Sonnet, Claude Haiku (Anthropic); GPT-5.4, GPT-4o-mini, GPT-4.1 (OpenAI); Gemini Flash (Google).
Request the full paper
Complete methodology, statistical tables, per-vignette correlation matrices, verbatim model responses across all models, and the Stage 2 and Stage 3 instruments in full.
About the Author
Adriana Rocha is the founder of Wisdom Beyond Technology and Wortya, and author of Refactoring the Firm: Building Intelligence-Native Organizations for the AI Age. The qualitative field study that motivated this research was conducted in collaboration with Datum International (Urpi Torrado) and presented at ESOMAR LATAM 2026.
Correspondence: adrianar@wortya.com