Your Guide to Synthetic Respondents in Market Research
What are synthetic respondents and what do they mean for market research projects? Learn more in our latest guide.
Introduction
One of the biggest challenges in market research is finding the right respondents, or even enough of them, for a given study or survey. But what if you could complete part of a survey, or even an entire project, without reaching out to real people?
This is where synthetic respondents come in, helping to solve the problem of recruiting enough real human participants. Synthetic respondents are custom profiles that can respond to market research surveys in the same way a comparable human respondent might.
However, as with all new artificial intelligence (AI) technology, there are a lot of questions about trust. Throughout this guide, we’re going to talk more about what synthetic respondents are, their benefits and risks, and how you can use them.
What Are Synthetic Respondents?
Synthetic respondents are created using a combination of advanced data analytics and machine learning algorithms. The process begins by collecting large datasets from various sources, such as surveys, social media, and transaction records.
This data is then analyzed to identify patterns and behaviors typical of real respondents. Using these insights, algorithms generate synthetic profiles that mimic the demographic, psychographic, and behavioral attributes of real individuals.
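As a rough illustration of that last step, the sketch below generates synthetic profiles by sampling each attribute from how often its values appear in a set of real respondent records. The field names and the `sample_profiles` helper are hypothetical. Sampling each attribute independently is a deliberate simplification: it keeps the overall mix of each attribute but drops the correlations between them (for example, between age and online shopping), which real tools try to capture with more sophisticated machine learning.

```python
import random
from collections import Counter


def sample_profiles(real_records, n, seed=0):
    """Draw n synthetic profiles, sampling each attribute independently
    from its observed (empirical) distribution in real_records."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    attributes = list(real_records[0].keys())
    # Count how often each value appears for every attribute.
    freqs = {a: Counter(r[a] for r in real_records) for a in attributes}
    profiles = []
    for _ in range(n):
        profile = {}
        for a in attributes:
            values = list(freqs[a].keys())
            weights = list(freqs[a].values())
            profile[a] = rng.choices(values, weights=weights, k=1)[0]
        profiles.append(profile)
    return profiles


# Hypothetical real respondent records.
real = [
    {"age_band": "18-34", "region": "Northeast", "buys_online": "yes"},
    {"age_band": "35-54", "region": "South", "buys_online": "yes"},
    {"age_band": "35-54", "region": "Midwest", "buys_online": "no"},
    {"age_band": "55+", "region": "South", "buys_online": "no"},
]

for p in sample_profiles(real, 3):
    print(p)
```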
Though there’s still much debate about what these respondents should be called, with some market research experts preferring transparent terms like “autoresponders” and others opting for something more neutral like “subjects,” the truth is that they’re artificially created profiles meant to stand in for human respondents.
These subjects can then answer market research surveys in addition to real respondents to further flesh out surveys and findings.
How Do Synthetic Respondents Work?
Synthetic respondents can be adjusted and scaled to represent specific target audiences or scenarios, allowing researchers to run simulations and predict outcomes without relying solely on real-world data collection. The goal is to create highly accurate, reliable models that provide valuable insights while maintaining privacy and ethical standards.
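One simple way to scale synthetic respondents to a target audience is to fix the audience mix first and then decide how many synthetic profiles to generate for each segment. The sketch below uses the largest-remainder method to turn target shares into whole-number counts that add up exactly to the requested sample size. The `allocate_quotas` name, the segments, and the shares are illustrative assumptions, not figures from any study.

```python
import math


def allocate_quotas(total, shares):
    """Split `total` respondents across segments in proportion to `shares`,
    using the largest-remainder method so the counts sum exactly to total."""
    share_sum = sum(shares.values())
    exact = {seg: total * s / share_sum for seg, s in shares.items()}
    counts = {seg: math.floor(x) for seg, x in exact.items()}
    leftover = total - sum(counts.values())
    # Give the remaining units to the segments with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda seg: exact[seg] - counts[seg], reverse=True)
    for seg in by_remainder[:leftover]:
        counts[seg] += 1
    return counts


# Hypothetical target mix for a 250-person synthetic sample.
target_shares = {"18-34": 0.35, "35-54": 0.40, "55+": 0.25}
print(allocate_quotas(250, target_shares))
```

Rounding each segment independently can leave the total one or two respondents off; the largest-remainder step avoids that.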
Here’s an example that survey tool Conjointly shared:
Pros and Cons of Using Synthetic Respondents
As with any new technology, you need to weigh the benefits and risks, or pros and cons, to see if it makes sense for your business or your market research project. Let’s cover a few of the main pros and cons.
Pros
There are a few benefits to incorporating synthetic data into your market research projects.
Increased study participation
First, bringing synthetic respondents into your project can help you increase overall participation, especially if you’re struggling to reach a good human sample size. You can take the demographics of the people who have already responded, or the target demographics you defined to recruit your initial human respondents, and use them to build the profile for your synthetic respondents.
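A minimal sketch of that top-up step, assuming you already have a target number of completes per demographic segment and know the segment of each human complete: count the human respondents in each segment, then generate synthetic respondents only for the shortfall. The `synthetic_top_up` name and the segment labels are hypothetical.

```python
from collections import Counter


def synthetic_top_up(targets, human_segments):
    """Return how many synthetic respondents each segment still needs.

    targets: dict mapping segment -> target number of completes
    human_segments: list with the segment label of each human complete
    """
    human_counts = Counter(human_segments)  # missing segments count as 0
    # Never ask for a negative number of synthetic respondents.
    return {seg: max(0, n - human_counts[seg]) for seg, n in targets.items()}


# Hypothetical targets and the segments of the human completes collected so far.
targets = {"18-34": 100, "35-54": 100, "55+": 50}
human = ["18-34"] * 40 + ["35-54"] * 95 + ["55+"] * 60
print(synthetic_top_up(targets, human))  # {'18-34': 60, '35-54': 5, '55+': 0}
```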
Faster and cheaper
When you’re using real humans to fill out your surveys and participate in your studies, you’re paying for each individual complete. That can add up, causing major market research projects to cost thousands, or even tens of thousands, of dollars.
However, bringing advanced statistical modeling into the fold can help you generate the right mix of human and synthetic results for a fraction of the cost. Plus, it can help your project move along much more quickly.
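To make the cost argument concrete, the arithmetic below compares an all-human sample with a blended one. Every number here (sample size, cost per human complete, and cost per synthetic response) is a hypothetical placeholder, so substitute your own vendor pricing.

```python
def blended_cost(total_completes, human_share, cost_per_human, cost_per_synthetic):
    """Total field cost when `human_share` of completes come from real people
    and the rest are synthetic."""
    human_n = round(total_completes * human_share)
    synthetic_n = total_completes - human_n
    return human_n * cost_per_human + synthetic_n * cost_per_synthetic


# Hypothetical inputs: 1,000 completes, $25 per human complete,
# $1 per synthetic response.
all_human = blended_cost(1000, 1.0, 25.0, 1.0)
blended = blended_cost(1000, 0.6, 25.0, 1.0)
print(f"All human: ${all_human:,.0f}  Blended (60% human): ${blended:,.0f}")
```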
Data privacy
Because you’re not using real people, you don’t have quite as many data privacy concerns. However, this can still be a bit complicated.
To build out a comprehensive synthetic customer profile, you’re often still leveraging PII, or personally identifiable information. You need to ensure that data doesn’t fall into the wrong hands while you create your synthetic data.
On the other hand, if you’re instead working from a descriptive profile like the Conjointly example shared earlier, your data privacy concerns are much lower. However, if you’re using existing customer data to create a duplicate audience, you still need to be careful.
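If you do build profiles from customer records, one common precaution is to drop direct identifiers and coarsen details such as exact age before the data reaches the modeling step. The sketch below assumes hypothetical column names (`name`, `email`, `phone`, `age`). It is a starting point rather than a complete anonymization procedure, since combinations of the remaining fields can still identify people.

```python
DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # hypothetical column names


def age_band(age):
    """Replace an exact age with a coarse band."""
    if age < 35:
        return "under 35"
    if age < 55:
        return "35-54"
    return "55+"


def deidentify(record):
    """Return a copy of `record` without direct identifiers and with age banded."""
    clean = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in clean:
        clean["age_band"] = age_band(clean.pop("age"))
    return clean


customer = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100",
            "age": 42, "region": "South", "purchases_last_year": 7}
print(deidentify(customer))
```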
Cons
Now, let’s look at the other side. Just because there are several benefits doesn’t mean synthetic data is without its downsides.
Questions around trust
First, we have to face questions around trust. Can we trust this synthetic data? Given how new AI tools are, can we really trust the outputs they give us?
The answer lies in what you do with your data once you receive it. We’ll cover this in more detail later, but validating your synthetic data against real data is key to building trust.
Data quality
Next, we have to talk about data quality. The data output by advanced modeling may be very similar to data generated by real humans, but many companies testing this approach have still run into a few issues.
For one, AI respondents tend to have a positivity bias. This means that when given the choice, AI tends to lean towards a more positive response.
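One way to check a synthetic sample for positivity bias is to compare its top-2-box share (the fraction of answers in the two most favorable points of a scale) and its mean score with real responses to the same question. The 5-point scale and the response lists below are made-up illustrations, not data from any study.

```python
from statistics import mean


def top2_box(responses, scale_max=5):
    """Share of answers in the two most favorable scale points."""
    return sum(r >= scale_max - 1 for r in responses) / len(responses)


def positivity_gap(real, synthetic, scale_max=5):
    """Positive values mean the synthetic answers skew more favorable."""
    return {
        "mean_diff": mean(synthetic) - mean(real),
        "top2_diff": top2_box(synthetic, scale_max) - top2_box(real, scale_max),
    }


# Hypothetical 1-5 agreement ratings for the same question.
real_answers = [2, 3, 3, 4, 2, 5, 3, 4, 1, 3]
synthetic_answers = [4, 4, 5, 4, 3, 5, 4, 5, 4, 4]
print(positivity_gap(real_answers, synthetic_answers))
```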
Emporia Research ran a test comparing real survey responses with those from two different synthetic audiences. In both cases, the synthetic respondents gave much more favorable answers than the real respondents did.
Here’s one graph showcasing the differences in responses:
Best Practices for Using Synthetic Respondents
Considering the use of synthetic data within your next market research project? Let’s walk through a few best practices to keep in mind that can help you balance both the benefits and risks.
Create Comprehensive Personas
The persona you ask the AI to emulate is the most important piece of the puzzle. Without an accurate, comprehensive profile, your AI output won’t produce data that works for your needs.
Start by gathering demographic data and key customer information from your existing customers and market research results. Then, build out a profile that includes these attributes, alongside key behaviors and interests that you know your audience has.
Look back at the example we shared from Conjointly’s experiment. It included basic demographic information as well as some behavioral data to help paint a picture of the exact respondent they were hoping to reach.
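In practice, the persona usually ends up as text that you pass to the model. The sketch below shows one hypothetical way to structure a persona and render it into a prompt. The `Persona` fields, the wording, and the `to_prompt` method are illustrative assumptions rather than any vendor’s format; the resulting string would be sent to whichever LLM you choose.

```python
from dataclasses import dataclass, field


@dataclass
class Persona:
    """Hypothetical persona fields; adapt them to your own audience data."""
    age_band: str
    gender: str
    region: str
    household_income: str
    behaviors: list = field(default_factory=list)
    interests: list = field(default_factory=list)

    def to_prompt(self, question):
        """Render the persona plus one survey question as a single prompt string."""
        lines = [
            "You are answering a market research survey as the person described below.",
            f"Age: {self.age_band}; gender: {self.gender}; region: {self.region}; "
            f"household income: {self.household_income}.",
            "Behaviors: " + ("; ".join(self.behaviors) or "not specified"),
            "Interests: " + ("; ".join(self.interests) or "not specified"),
            "Answer the way this person would.",
            f"Question: {question}",
        ]
        return "\n".join(lines)


persona = Persona(
    age_band="35-54",
    gender="female",
    region="Midwest",
    household_income="$75k-$100k",
    behaviors=["shops online weekly", "uses a grocery delivery app"],
    interests=["home cooking", "budget travel"],
)
print(persona.to_prompt("How likely are you to try a meal-kit subscription? (1-5)"))
```

Keeping the persona as structured fields, rather than free text, makes it easier to generate many variants from your demographic data and to log exactly which profile produced each synthetic response.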
Ensure Transparency
While AI is still so new, you need to consider any ethical implications that may arise. Be transparent about your use of AI and synthetic respondents so that anyone referencing your data is aware of it. Transparency will remain key until AI is more thoroughly adopted.
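One practical way to stay transparent is to keep a source label on every response and report the human/synthetic mix alongside your results. The `source` field, the `tag` and `sample_disclosure` helpers, and the report wording below are hypothetical conventions.

```python
from collections import Counter


def tag(responses, source):
    """Attach a source label ('human' or 'synthetic') to each response record."""
    return [dict(r, source=source) for r in responses]


def sample_disclosure(responses):
    """Summarize how many responses came from each source."""
    counts = Counter(r["source"] for r in responses)
    total = sum(counts.values())
    parts = [f"{src}: {n} ({n / total:.0%})" for src, n in sorted(counts.items())]
    return f"n = {total}; " + ", ".join(parts)


human = tag([{"q1": 4}, {"q1": 2}, {"q1": 3}], "human")
synthetic = tag([{"q1": 5}], "synthetic")
print(sample_disclosure(human + synthetic))  # n = 4; human: 3 (75%), synthetic: 1 (25%)
```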
Use the Right Tools
Choosing the right AI tools for creating your synthetic data is important. Each of the following tools provides some, all, or specialized components for generating synthetic data.
Some of the top LLMs include:
- BERT
- Claude
- ERNIE
- Falcon
- Gemini
- GPT-3, GPT-3.5, GPT-4, and GPT-4o
- LaMDA
- Mistral
- Orca
Do your research to make sure the tool fits your needs, then start feeding it your existing data so it can generate accurate synthetic respondents.
Validate Your Data
We find ourselves at a time of debate over how to reach parity between real-world data and synthetic models, and over the best way to generate subjects. But there may be greater concerns than parity at play here:
Each time you create a set of synthetic data, you need to validate it by comparing it against your real data. This gives you an idea of how closely the synthetic results match your actual results. If the differences are minuscule, you can use the artificial data alongside your real results with more confidence. However, if there’s an obvious difference between the two, you may need to revisit your initial customer profile.
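For a single closed-ended question, comparing the two data sets can be as simple as comparing their answer distributions. The sketch below uses total variation distance (half the sum of the absolute differences between the two sets of answer shares), where 0 means identical distributions and 1 means no overlap at all. The 0.10 cut-off is an arbitrary illustrative threshold, not an industry standard, and the response lists are made up.

```python
from collections import Counter


def answer_shares(responses, options):
    """Share of responses choosing each option (options with no answers get 0)."""
    counts = Counter(responses)
    return {o: counts[o] / len(responses) for o in options}


def total_variation(real, synthetic, options):
    """Total variation distance between two answer distributions (0 to 1)."""
    p = answer_shares(real, options)
    q = answer_shares(synthetic, options)
    return 0.5 * sum(abs(p[o] - q[o]) for o in options)


THRESHOLD = 0.10  # hypothetical tolerance; set your own for each study

options = [1, 2, 3, 4, 5]
real = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
synthetic = [2, 3, 3, 4, 4, 4, 4, 5, 5, 5]
tvd = total_variation(real, synthetic, options)
verdict = "close enough to blend" if tvd <= THRESHOLD else "revisit the customer profile"
print(f"TVD = {tvd:.2f}: {verdict}")
```

Running the same check question by question shows where the synthetic sample drifts, which is more useful for fixing the profile than a single overall score.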
There are two main ways to do this: manually, for small data sets, or by using a GAN (generative adversarial network) model to automate the process for much larger data sets.
If you go the manual route, you can simply compare the output with your data yourself. A GAN model can also do this well by pitting two neural networks against each other: a generator and a discriminator.
The generator creates synthetic responses based on the initial prompts and profile provided. The discriminator compares those synthetic responses with the real responses and judges whether each one looks real or synthetic. Its feedback also tells the generator how to improve, so that ideally, even the discriminator can no longer find differences between real and synthetic data.
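Training a full GAN is beyond a short example, but the discriminator’s core question, whether a model can tell real responses from synthetic ones, can be sketched on its own. The code below swaps in a much simpler discriminator than a neural network: a leave-one-out 1-nearest-neighbor classifier, a form of classifier two-sample test. Accuracy near 0.5 suggests the two sets are hard to tell apart, while accuracy near 1.0 means they are easily distinguished. The function names and response vectors are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two answer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def discriminator_accuracy(real, synthetic):
    """Leave-one-out 1-NN accuracy at labeling responses as real or synthetic.

    Around 0.5: the sets are hard to tell apart; near 1.0: easily distinguished.
    """
    data = [(v, "real") for v in real] + [(v, "synthetic") for v in synthetic]
    correct = 0
    for i, (vec, label) in enumerate(data):
        # Predict with the label of the closest *other* response.
        # Distance ties are broken alphabetically by label ("real" first).
        _, nearest_label = min(
            (sq_dist(vec, other), other_label)
            for j, (other, other_label) in enumerate(data)
            if j != i
        )
        correct += nearest_label == label
    return correct / len(data)


# Hypothetical answers to three 1-5 rating questions per respondent.
real = [(2, 3, 3), (3, 2, 4), (1, 3, 2), (3, 3, 3)]
synthetic = [(5, 5, 4), (4, 5, 5), (5, 4, 5), (4, 4, 5)]
print(discriminator_accuracy(real, synthetic))  # 1.0: these made-up sets are easy to separate
```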
Focus on Basic Data Sets
Start by targeting basic audiences rather than complex, niche ones. Because LLMs learn from what they find online, niche audiences that are already hard to reach may be difficult to replicate, giving you inaccurate, low-quality results.
Consider Synthetic Respondents for Your Market Research Projects
Is synthetic data the future? While we strongly doubt artificial survey respondents will replace real ones, we can see a future where the two work side by side.
As AI becomes increasingly prevalent in market research, find out what our new AI tool can help your business accomplish. Contact one of our sales reps to learn more.
OvationMR