👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts, and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 14,135, +185 since last week
Brought to you by VESTBERRY - The Portfolio Intelligence Software for Data-driven VCs

Watch our 7-minute product demo, showcasing the platform's powerful features and intuitive interface. Gain valuable insights and make data-driven decisions with unparalleled ease.
Piece 2 of 2
Welcome back to another episode about biases in the world of venture capital. While last week’s episode centered on cognitive biases and how they impact human decision-making across the VC investment process, today’s episode is all about AI and data biases and how they tend to mirror the past into the future.

Image generated with DALL·E 3
How Can Data-driven Approaches Help Investors Overcome Cognitive Biases?
Let’s look back at my very first episode, “Why VC is Broken and Where to Start Fixing It,” from more than a year ago:
… the VC investment/decision-making process is manual, inefficient, non-inclusive, subjective and biased which leads not only to a huge waste of resources but more importantly to sub-optimal outcomes and missed opportunities.
Double-clicking on the “subjective and biased” part, we find that cognitive biases are the key driver pushing investors toward pattern matching and, oftentimes, suboptimal outcomes. Pattern matching on the basis of a limited sample size is dangerous, and unfortunately no investor has had the opportunity to experience all of the world’s success cases firsthand.
Data-driven approaches may balance these shortcomings if done right. By assembling a comprehensive time-series dataset about all companies out there, we can analyze how different features of successful vs unsuccessful companies have evolved over time, just as described in this paper.
Following the extraction of “success patterns” (check out this related piece on “Patterns of Successful Startups”), investors can not only translate them into an algorithmic selection of new investment opportunities but also leverage these findings to challenge their cognitive biases.
Firsthand experience with a limited sample of successful vs unsuccessful companies shapes subjective cognitive biases. Creating awareness of these biases and balancing them with more objective feature patterns, identified from a significantly more comprehensive data sample, merges the best of both worlds: subjective/human + objective/data.
What Is Data Bias?
While humans are prone to cognitive biases, data-driven approaches and machine-learning models are prone to data bias. Data bias occurs when an information set is inaccurate and fails to represent the entire population.
For example, when looking at successful vs unsuccessful startups, one might over-index a specific industry or geography in the training data. As a result, extracted “success patterns” might only partially apply to the full universe of opportunities out there, limiting your ability to spot all success candidates.
Data bias is a significant concern as it can lead to biased responses and skewed outcomes, resulting in inequality and ineffectiveness in the screening/investment selection process.
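To make this concrete, here is a minimal sketch of how such over-indexing could be surfaced. It assumes a pandas DataFrame for the labeled training set and one for the full universe of companies, each with an “industry” column; the column name and the threshold are illustrative assumptions, not details from this piece.

```python
import pandas as pd

def representation_report(train: pd.DataFrame, universe: pd.DataFrame, col: str = "industry") -> pd.DataFrame:
    """Compare the segment mix of the training data against the full universe."""
    train_share = train[col].value_counts(normalize=True).rename("train_share")
    universe_share = universe[col].value_counts(normalize=True).rename("universe_share")
    report = pd.concat([train_share, universe_share], axis=1).fillna(0.0)
    # Ratio > 1 means the segment is over-represented (over-indexed) in the training data.
    report["over_index_ratio"] = report["train_share"] / report["universe_share"].replace(0, pd.NA)
    return report.sort_values("over_index_ratio", ascending=False)

# Hypothetical usage: flag segments that are more than 1.5x over-represented.
# flagged = representation_report(train_df, universe_df)
# print(flagged[flagged["over_index_ratio"] > 1.5])
```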
How to Mitigate Data Bias?
As with cognitive biases, the first step is to create awareness of data biases. Only then can we leverage techniques like stratified sampling, oversampling and undersampling, or moderator variables. The latter has proven extremely valuable in the context of startup screening, so let’s dive into it in a bit more detail.
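Before the deep dive into moderators, here is a rough sketch of the first two techniques, assuming a hypothetical `companies` DataFrame with an “industry” column and a binary “success” label; the names and the scikit-learn setup are illustrative assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# 1) Stratified sampling: keep the industry mix identical in train and test sets.
train, test = train_test_split(
    companies, test_size=0.2, stratify=companies["industry"], random_state=42
)

# 2) Naive oversampling: upsample the minority class (successes) until classes are balanced.
majority = train[train["success"] == 0]
minority = train[train["success"] == 1]
minority_upsampled = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
balanced_train = pd.concat([majority, minority_upsampled])
```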
Understanding Moderator Variables: In the context of research and statistics, a moderator variable affects the strength or direction of the relationship between an independent variable (=cause) and a dependent variable (=effect). In simpler terms, it can change how one factor affects another.
Addressing Data Bias with Moderators:
Clarifying Relationships: By examining moderator variables, we can better understand under which conditions certain relationships hold or don't hold. For instance, if we're studying the relationship between startup success (dependent variable) and investment received (independent variable), a moderator like "region of operation" might reveal that the relationship is stronger in urban areas compared to rural areas.
Identifying Hidden Biases: Sometimes, biases aren't evident until you introduce a moderator. For example, a dataset might show that a tech bootcamp improves job placement rates for all participants. But when the moderator "gender" is introduced, it could reveal a significant discrepancy in placement rates between men and women, indicating a potential bias.
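To illustrate what such a moderator analysis could look like in practice, here is a minimal sketch using an interaction term in a logistic regression (statsmodels formula API). The `df`, “success”, “funding”, and “region” names are hypothetical stand-ins for the startup example above.

```python
import statsmodels.formula.api as smf

# The interaction term funding:region tests whether the funding -> success
# relationship differs by region, i.e. whether "region" moderates the effect.
model = smf.logit("success ~ funding * C(region)", data=df).fit()
print(model.summary())

# A significant interaction coefficient suggests the funding/success relationship
# is not uniform across regions -- a signal worth checking before trusting
# patterns learned from a geographically skewed sample.
```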
Limitations:
Doesn't Eliminate Bias: Introducing moderator variables can help reveal and understand biases, but it doesn't inherently eliminate them. Eliminating them requires additional steps, such as the sampling techniques mentioned above.
Requires Thoughtful Selection: Not all variables serve effectively as moderators. Researchers must have a theoretical or empirical reason to believe that a certain variable can act as a moderator.
Subscribe to DDVC to read the rest.
Join the Data Driven VC community to get access to this post and other subscriber-only content.