Your Guide to Synthetic Respondents in Market Research

 

What are synthetic respondents and what do they mean for market research projects? Learn more in our latest guide.

 


Introduction

 

Sometimes, a big challenge in market research is finding the right respondents or participants, or even enough of them, for any given study or survey. But what if you could partially complete a survey or an entire project without actually having to reach out to real people?

This is where synthetic respondents come in, helping solve the issue of finding real human participants. Synthetic respondents are custom profiles that can respond to market research surveys in the same way a similar human profile might.

 

However, as with all new artificial intelligence (AI) technology, there are a lot of questions about trust. Throughout this guide, we’ll cover what synthetic respondents are, their benefits and risks, and how you can use them.

What Are Synthetic Respondents?

 

Synthetic respondents are created using a combination of advanced data analytics and machine learning algorithms. The process begins by collecting large datasets from various sources, such as surveys, social media, and transaction records.

This data is then analyzed to identify patterns and behaviors typical of real respondents. Using these insights, algorithms generate synthetic profiles that mimic the demographic, psychographic, and behavioral attributes of real individuals. 

Though there’s still much debate on what these respondents should be called—with some market research experts preferring transparent terms like “autoresponders” and others opting for something more neutral like “subjects”—the truth is that they’re artificially created profiles meant to act as replacements for human respondents.

These subjects can then answer market research surveys in addition to real respondents to further flesh out surveys and findings.

 

How Do Synthetic Respondents Work?

Synthetic respondents can be adjusted and scaled to represent specific target audiences or scenarios, allowing researchers to conduct simulations and predict outcomes without relying solely on real-world data collection. The goal is to create highly accurate and reliable models that provide valuable insights while maintaining privacy and ethical standards.

Here’s an example that survey tool Conjointly shared:

 

Prompts input into an LLM to create a synthetic response

They input the following as a customer profile:

“You are a 45 year old mother of three children. You eat bread for breakfast, you eat spaghetti for lunch and you like to vary your meals for dinner. Your budget for everything is limited because you are on government allowance. Next please answer several questions asked of you. In giving answers you will strictly follow the required format and will not add any additional words.”

Then, they asked questions like:

  • How many loaves of bread do you buy per week?
    (Response format: Numeric)
  • What types of bread do you normally buy?
    1. White bread
    2. Wholemeal bread
    3. Rye bread
    4. Other type of bread
    (Response format: Integer corresponding to the selected option)
  • Describe the taste of the bread that you like the most.
    (Response format: Open-ended text response up to 255 characters)

This is an example of a synthetic respondent. However, this can also be done at scale, generating a report full of synthetic responses for a market research survey.
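
As a rough illustration of how this could work at scale, here is a minimal Python sketch that assembles a prompt like the one above and checks each answer against its required response format. The names `Question`, `build_prompt`, and `parse_answer` are our own, hypothetical choices, and no real LLM is called: you would send the prompt to whichever model you use and pass its reply to `parse_answer`.

```python
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    kind: str            # "numeric", "choice", or "open"
    options: tuple = ()  # answer options, only used for "choice" questions


# Response-format wording taken from the example questions above.
FORMATS = {
    "numeric": "Numeric",
    "choice": "Integer corresponding to the selected option",
    "open": "Open-ended text response up to 255 characters",
}


def build_prompt(persona: str, question: Question) -> str:
    """Combine the persona and one question into a single prompt string."""
    lines = [persona, "", question.text]
    for number, option in enumerate(question.options, start=1):
        lines.append(f"{number}. {option}")
    lines.append(f"(Response format: {FORMATS[question.kind]})")
    return "\n".join(lines)


def parse_answer(question: Question, raw: str):
    """Return the answer in its expected type, or None if it breaks the format."""
    raw = raw.strip()
    if question.kind == "numeric":
        try:
            return float(raw)
        except ValueError:
            return None
    if question.kind == "choice":
        try:
            choice = int(raw)
        except ValueError:
            return None
        return choice if 1 <= choice <= len(question.options) else None
    # Open-ended: reject empty text and anything over the 255-character limit.
    return raw if 0 < len(raw) <= 255 else None
```

Answers that come back as `None` did not follow the format and can be re-asked or discarded before they reach your report.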

Pros and Cons of Using Synthetic Respondents

 

As with any new technology, you need to weigh the benefits and risks, or pros and cons, to see if it makes sense for your business or your market research project. Let’s cover a few of the main pros and cons.

 

Pros

There are a few benefits to incorporating synthetic data into your market research projects.

Increased study participation

First, bringing synthetic respondents into your project can help you increase overall participation, especially if you’re struggling to get a good human sample size. You can take the demographics of the people who have already responded—or that you created to find the initial human respondents—and use that to create the profile for your synthetic respondents.

Faster and cheaper

When you’re using real humans to fill out your surveys and participate in your studies, you’re paying for each individual complete. That can add up, causing major market research projects to cost thousands, or even tens of thousands, of dollars.

However, bringing advanced statistical modeling into the fold can help you generate the right mixture of human and synthetic results for a fraction of the cost. Plus, it can help your project move along much more quickly.

Data privacy

Because you’re not using real people, you don’t have quite as many data privacy concerns. However, this can still be a bit complicated.

To build out a comprehensive synthetic customer profile, you’re often still leveraging PII, or personally identifiable information. You need to ensure that data doesn’t fall into the wrong hands while you create your synthetic data.

On the other hand, if you’re instead using a detailed profile like the example in our last section, your data privacy concerns are much lower. However, if you’re using existing customer data in order to create a duplicate audience, you still need to be careful.

 

Cons

Now, let’s look at the other side. Just because there are several benefits doesn’t mean synthetic data is without its downsides.

Questions around trust

First, we have to face questions around trust. Can we trust this synthetic data? With how new AI tools are, can we really trust these outputs that they’re giving us?

The answer to this lies in what you do with your data once you receive it. We’ll cover this more a bit later, but validating your synthetic data against real data is key to building trust.

Data quality

Next, we have to talk about data quality. The data output by advanced modeling may be incredibly similar to data generated by real humans, but there are still a few issues that many companies testing this out have discovered.

For one, AI respondents tend to have a positivity bias. This means that when given the choice, AI tends to lean towards a more positive response.

Emporia Research ran a test that compared real survey responses with those of two different synthetic audiences. In both instances, the synthetic respondents gave much more favorable answers than the real ones.

Here’s one graph showcasing the differences in responses:

A graph comparing real versus synthetic respondents

When questioned about satisfaction in their current role, 47% of real respondents answered “Somewhat Satisfied,” while 69% of synthetic respondents selected that as their answer. This obviously poses a potential problem with data quality.

Another study found that synthetic respondents “struggled to capture sub-group trends effectively,” with responses veering away from the sub-groups’ actual preferences, and that responses seemed to lack variety, especially for more qualitative questions.

The main things we can do here are to use validation techniques (more on that later) and to keep training AI to provide better outputs.

Ensuring diverse audiences

Finally, diversity, equity, and inclusion (DEI) are a major concern for well-rounded data. However, many LLMs tend to lean towards the culture in which they were created, and don’t have the ability to represent a large, diverse population.

Instead, you end up with skewed results that may not properly represent the audience you’re targeting. One example of this is the use of Common Crawl, a repository of data from across the internet that’s become a major training ground for AI.

Because 45% of its data is in English, AI tools trained on it will likely understand and generate content in English better than in other languages.

Best Practices for Using Synthetic Respondents

 

Considering the use of synthetic data within your next market research project? Let’s walk through a few best practices to keep in mind that can help you balance both the benefits and risks.

Create Comprehensive Personas

The persona you ask the AI to emulate is the most important piece of the puzzle. Without a fully accurate, comprehensive profile, your AI output isn’t going to consist of data that will work for your needs.

Start by gathering demographic data and key customer information from your existing customers and market research results. Then, build out a profile that includes these attributes, alongside key behaviors and interests that you know your audience has.

Look back on the example we shared from Conjointly’s experiment. It included basic demographic information as well as some behavioral data to help draw the picture of the exact respondent they’re hoping to reach.
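
To make the step from customer data to persona more concrete, here is a small sketch under stated assumptions. The field names (`age`, `household`, `store_type`, `budget`), the helper functions, and the three customer records are all made up for illustration. The idea is to reduce existing records to one aggregate profile, then phrase it in the second person like the Conjointly prompt.

```python
from collections import Counter
from statistics import median


def summarize_customers(records, numeric_fields=("age",)):
    """Reduce customer records to one profile: the median for numeric fields,
    the most common value for everything else."""
    profile = {}
    for key in records[0]:
        values = [record[key] for record in records]
        if key in numeric_fields:
            profile[key] = median(values)
        else:
            profile[key] = Counter(values).most_common(1)[0][0]
    return profile


def profile_to_persona(profile):
    """Render the aggregate profile as a second-person persona statement."""
    return (
        f"You are a {profile['age']} year old {profile['household']}. "
        f"You usually shop at a {profile['store_type']}. "
        f"Your grocery budget is {profile['budget']}."
    )


# Made-up records for illustration only.
customers = [
    {"age": 44, "household": "parent of three",
     "store_type": "discount supermarket", "budget": "limited"},
    {"age": 45, "household": "parent of three",
     "store_type": "discount supermarket", "budget": "limited"},
    {"age": 52, "household": "parent of two",
     "store_type": "local bakery", "budget": "limited"},
]

persona = profile_to_persona(summarize_customers(customers))
print(persona)
```

Only aggregated values reach the prompt, so no individual customer's details are passed to the AI. One caveat: taking the most common value field by field can produce a combination that no real customer actually has, so it is worth sanity-checking the resulting persona against your segments.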

Ensure Transparency

While AI is still so new, you always need to consider any ethical implications that can arise. Remain transparent around your use of AI and synthetic respondents so that anyone referencing your data is aware. Transparency is going to be key until AI is more thoroughly adopted.

Use the Right Tools

It’s important to choose the right AI tools for creating your synthetic data. Each of the following tools provides some, all, or specialized components for generating it.

Some of the top LLMs include:

  • BERT
  • Claude
  • ERNIE
  • Falcon
  • Gemini
  • GPT-3, GPT-3.5, GPT-4, and GPT-4o
  • LaMDA
  • Mistral
  • Orca

Do your research to make sure the tool can fit your needs, then start feeding it your existing data to train it to create accurate synthetic respondents.


Validate Your Data

 

We find ourselves at a time when there is debate about, and a search for, parity between real-world data and synthetic models, and about the best way to generate subjects. But there may be greater concerns than parity at play here:

“If we do not have a good understanding of the human processes that underpin the way a question is assimilated, and an answer given, then we are in danger of assuming synthetic respondents are always equivalent to human respondents.”
Colin Strong, Ipsos

 

Each time you create a set of synthetic data, you need to validate it by running it against your real data. By comparing the two data sets, you can get an idea of how similar the synthetic results are to your actual results. If the differences are minuscule, you can easily use the artificial data alongside your real results. However, if there’s an obvious difference between the two, you may need to revisit your initial customer profile.
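
One simple way to put a number on that comparison, sketched here as an assumption rather than a prescribed method, is the total variation distance between the two sets of answer shares: half the sum of the absolute differences in the share of respondents choosing each option. In the data below, only the two “Somewhat Satisfied” shares (47% and 69%) come from the example earlier in this guide. The other counts are invented so that each group adds up to 100 respondents.

```python
from collections import Counter


def answer_shares(responses):
    """Share of respondents choosing each answer option."""
    counts = Counter(responses)
    total = len(responses)
    return {option: count / total for option, count in counts.items()}


def total_variation_distance(real, synthetic):
    """Half the sum of absolute share differences.

    0 means the two answer distributions are identical,
    1 means they have no answers in common.
    """
    real_shares = answer_shares(real)
    synthetic_shares = answer_shares(synthetic)
    options = set(real_shares) | set(synthetic_shares)
    return 0.5 * sum(
        abs(real_shares.get(option, 0.0) - synthetic_shares.get(option, 0.0))
        for option in options
    )


# Illustrative data: 100 respondents per group, mostly invented (see above).
real = (["Somewhat Satisfied"] * 47 + ["Very Satisfied"] * 20
        + ["Dissatisfied"] * 33)
synthetic = (["Somewhat Satisfied"] * 69 + ["Very Satisfied"] * 21
             + ["Dissatisfied"] * 10)

distance = total_variation_distance(real, synthetic)
print(round(distance, 2))  # 0.23
```

How small the distance needs to be before you treat the two data sets as interchangeable is a judgment call for your project. The metric only tells you how far apart they are.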

There are two main ways to do this: manually, for small data sets, or by using a GAN (generative adversarial network) model to automate the process for much larger data sets.

If you choose the manual route, you can simply compare the output to your real data yourself. A GAN model can also do a great job of this by using two neural networks: a generator and a discriminator.

The generator works to create synthetic responses based on the initial prompts and profile provided. The discriminator compares those synthetic responses to the real responses, accepting ones that are similar and rejecting those that aren’t. The discriminator’s decisions also give the generator feedback on how to improve its responses, so that in the end, even the discriminator can’t find differences between real and synthetic data.
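
To illustrate just the discriminator’s role, here is a deliberately simplified sketch. It is not a GAN: there are no neural networks and no generator being trained. It only asks how well the best single-threshold rule can tell real scores from synthetic ones. The function name and the 1-to-5 satisfaction scores are made up for illustration.

```python
def best_threshold_accuracy(real_scores, synthetic_scores):
    """Accuracy of the best single-threshold rule for telling the two sets apart.

    With equally sized groups, a result near 0.5 means the rule cannot
    separate them, and a result near 1.0 means synthetic answers are easy to spot.
    """
    labeled = [(score, False) for score in real_scores]
    labeled += [(score, True) for score in synthetic_scores]
    total = len(labeled)
    best = 0.0
    for threshold in sorted({score for score, _ in labeled}):
        # Rule: guess "synthetic" when the score is at or above the threshold.
        correct = sum(
            (score >= threshold) == is_synthetic
            for score, is_synthetic in labeled
        )
        # The mirrored rule (guess "real" at or above the threshold)
        # is right exactly where this rule is wrong.
        best = max(best, correct / total, (total - correct) / total)
    return best


# Made-up 1-to-5 satisfaction scores; the synthetic set skews positive.
real = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
synthetic = [3, 4, 4, 4, 4, 5, 5, 5, 5, 5]

print(best_threshold_accuracy(real, synthetic))  # 0.75
print(best_threshold_accuracy(real, real))       # 0.5
```

Because the rule is tuned and scored on the same responses, the accuracy is optimistic. A real discriminator would be evaluated on held-out data, and in a full GAN its feedback would be used to retrain the generator.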

Focus on Basic Data Sets

Start by targeting basic audiences, rather than complex, niche ones. Because LLMs learn from what they find online, incredibly niche audiences that are already difficult to reach may be impossible to replicate, giving you inaccurate and low-quality results.

Consider Synthetic Respondents for Your Market Research Projects

 

Is synthetic data the future? While we highly doubt artificial survey respondents will replace real ones, we can see a future where the two co-mingle.

As AI becomes increasingly prevalent in market research, find out what our new AI tool can help your business accomplish. Contact one of our sales reps to learn more.

 
