👋 Hi, I’m Andre and welcome to my newsletter Data Driven VC which is all about becoming a better investor with Data & AI. ICYMI, check out some of our most read episodes:
Brought to you by Foresight - Unify and talk to your data
Private market data has long been disconnected, which made getting a single view of a company, a fund, or a portfolio challenging. Foresight was the first to use AI to connect your data - fund accounting, cap tables, CRM, KPIs and 3rd party - revolutionizing sourcing, diligence, and portfolio management.
Now we’re thrilled to unveil yet another first: letting you talk with your unified data through AI. Learn what you can do here:
Welcome back to another “Essays” episode where we dive into the tools & playbooks of modern investors. The big question for every investor is how to predict startup success. While we’ve tackled this topic from various angles, mostly algorithmic with data science and AI, I’m excited to share a new, very simple and manual approach with you today.
TL;DR: Disagreement in your IC seems to be a great predictor of success ;)
Every investor knows the moment. You walk out of an IC where a deal has just split the room. One partner is convinced it’s a category-defining company. Another is equally convinced it’s nonsense. The default reaction in most firms: “Too controversial, let’s pass.”
A new 2024 paper from MIT Sloan by Luca Gius suggests that this instinct may be exactly backwards. Using data from 67 startup competitions and 2,650 startups, he finds a striking pattern: the more judges disagree about a startup, the more likely that startup is to succeed in the future. Even more interesting: this holds even after controlling for how “good” the startup looked on average at the time of evaluation.
So let’s dive in, go through a brief summary of the paper, and condense it into actionable takeaways for investors.
From Airbnb to a general rule
The paper starts from a now-classic story: Airbnb. Fred Wilson famously passed on the deal because his team “couldn’t wrap our heads around air mattresses on the living room floors as the next hotel room.” Paul Graham, on the other hand, loved it. The important point isn’t just that someone believed; it’s that reasonable experts violently disagreed.
Gius argues that Airbnb is not an isolated anecdote but an instance of a broader regularity: valuable, unique startup ideas tend to be polarizing. Uncontroversial ideas are either obvious and easy to copy, or mediocre and correctly ignored. In contrast, genuinely novel strategies clash with existing mental models. They are unfamiliar, hard to evaluate, and therefore spark disagreement.
The key claim is simple: if an idea has real potential for sustained competitive advantage, it probably cannot be universally comfortable at the moment of evaluation. Common opinion cannot be a source of edge.
What the dataset looks like
To move from theory to evidence, the paper relies on a rich data set. It covers 67 venture competitions and 118 funding rounds, most of them using highly standardized rubrics. Each startup is evaluated by at least three judges, typically five, and each judge assigns scores between 1 and 7 on up to 24 dimensions such as team, product, market, business model and so on. Judges are mostly former founders, angels, VCs, and industry experts.
These competition records are then matched to Crunchbase and PitchBook data to follow outcomes through 2023: how much funding the startups raised, whether they reached at least 1 million USD in annual revenue, how they rank in Crunchbase’s prominence measures, whether they exited, and whether they shut down. The result is a panel that connects early, structured expert assessments to long-run performance.
Two numbers already stand out before looking at outcomes. First, around 62% of the variation in scores comes from judges disagreeing about the same startup rather than differences between startups. Second, a standard inter-rater reliability measure, Cohen’s kappa, is just 0.13, a level that in medical research would be considered extremely low agreement. In plain English: experts looking at the same pitch very often see very different things.
Measuring disagreement and showing it matters
The central metric Gius uses is simple and intuitive. For each startup in each competition, he computes two things:
The average score across judges = a standard measure of perceived quality.
The standard deviation of those scores = a measure of how much the judges disagreed.
Imagine two startups with the same average score of 4.0 out of 7. One receives scores that are all 4s. The other receives a 2, a 3, a 4, a 5 and a 6. On paper they are equally attractive if you only look at the average. Yet the second is clearly more polarizing.
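Both measures take only a few lines to compute. The scores below are the illustrative ones from the example above, not data from the paper:

```python
from statistics import mean, pstdev

# Illustrative judge scores (1-7) for two startups with the same average
consensus_startup = [4, 4, 4, 4, 4]
polarizing_startup = [2, 3, 4, 5, 6]

for name, scores in [("consensus", consensus_startup),
                     ("polarizing", polarizing_startup)]:
    # Average score = perceived quality; std dev = disagreement
    print(f"{name}: quality={mean(scores):.2f}, disagreement={pstdev(scores):.2f}")
```

Both startups show a quality of 4.00, but the consensus startup has a disagreement of 0.00 while the polarizing one sits around 1.41, which is exactly the spread a simple average would hide.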
Across the dataset, a 20% increase in this disagreement measure is associated with roughly one-third more future funding, a higher probability of achieving at least one million in annual revenue, a higher forecasted probability of exit, and a better rank in Crunchbase. This relationship remains strong after controlling for average score, competition and industry fixed effects, founding year, and whether the startup already held a patent.
The nuance that matters most for practitioners: disagreement predicts upside without predicting more downside. Polarizing startups are not more likely to fail or close down. The pattern is asymmetric: more disagreement means a better chance of unusually positive outcomes, but not a higher chance of catastrophe.
This is fundamentally different from a simple “risk” story where high dispersion just means high variance.
The uniqueness paradox and why distinctiveness matters
The paper embeds these findings in what strategy scholars call the “uniqueness paradox.” Managers and founders face a basic trade-off. Familiar strategies are easy to understand and finance, but easy for others to copy. Unique strategies may be powerful, but they are harder to explain, analyze, and underwrite. Investors, faced with uncertainty and career risk, tend to discount unfamiliar ideas.
If we believe that real, durable edge comes from unique ways of combining and deploying resources, then truly valuable entrepreneurial “theories” of the world must initially be held by a minority. They should have both champions and detractors. That is exactly what the disagreement measure is picking up.
To test whether this mechanism is really about uniqueness rather than something else, Gius constructs a text-based distinctiveness score. Using modern embedding models, he measures how dissimilar each startup’s short application description is from all other descriptions in the dataset. A higher score means a more unique value proposition compared with the competition set.
Two important facts follow from this:
First, more distinct startups attract more disagreement. A one standard deviation increase in distinctiveness leads to a four to seven percent increase in score dispersion.
Second, the predictive power of disagreement essentially disappears among the half of startups with the least distinct descriptions. For these “normal” ideas, disagreement looks mostly like noise. For the more unique half, disagreement becomes a strong indicator of future success.
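The distinctiveness idea can be sketched with a toy implementation. The paper uses modern embedding models; the `embed` function below substitutes a simple bag-of-words vector so the sketch is self-contained, and the descriptions are made up for illustration:

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a plain bag-of-words vector
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def distinctiveness(description, corpus):
    # Higher = more dissimilar from the other descriptions in the set
    vec = embed(description)
    sims = [cosine(vec, embed(other)) for other in corpus if other != description]
    return 1 - sum(sims) / len(sims)

corpus = [
    "SaaS CRM platform for sales teams",
    "SaaS marketing platform for sales teams",
    "orbital debris removal using robotic tethers",
]
scores = {d: round(distinctiveness(d, corpus), 2) for d in corpus}
```

With real embeddings the mechanics are the same: embed every application description, then score each startup by how far it sits from the rest of the set. In the toy corpus, the space-debris startup scores highest because it shares no vocabulary with the two SaaS descriptions.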
For investors, the implication is clear. It is not enough to “love contrarian deals” in the abstract. The deals where disagreement is most informative are those where the underlying proposition is genuinely different from the rest of your pipeline, not just divisive for random reasons.
Where judges (dis)agree most
Because the competitions use structured rubrics, the paper can also ask what people disagree about.
When scores are broken down by dimension, judges tend to agree most on team quality. They are far more aligned on whether this looks like a strong founding team than on other aspects of the business. By contrast, they disagree most on business model and related strategic questions: scalability, pricing, potential to create downstream value, and the power of incumbents.
That finding resonates with everyday venture practice. Evaluating people has a strong intuitive and experiential component; investors have internalized similar heuristics for what an “A-team” looks like. But evaluating a novel way of making money – especially when it is intertwined with a new technology or market structure – forces people to lean heavily on their own mental models and priors. Those priors differ, so scores diverge.
An additional pattern underscores this point. After a startup is granted its first patent, disagreement about it tends to increase, even when controlling for other factors. Patents here are a proxy for technological novelty. They appear to widen the range of interpretations about commercial potential rather than shrinking it.
Who are the contrarians?
The judges themselves are heterogeneous too. By matching judges to their LinkedIn profiles, the paper examines which backgrounds correlate with more frequent dissent.
The strongest predictor is having been a founder. Former entrepreneurs are significantly more likely to deviate from the consensus score on a given startup. Their grades are, on average, seven percent further from the other judges’ average than those of non-founders. Traditional educational markers such as MBAs or PhDs do not show similarly strong patterns.
This is intuitively appealing. People who have self-selected into starting companies in the past are more willing to back a view of the world that others don’t share. They are used to acting on non-consensus beliefs. In a selection process, that means they are more likely to see something in a startup that the rest of the panel does not, for better or worse.
For investors designing evaluation processes, this suggests something practical: ex-founders are valuable not only as operators or scouts but also as structured sources of disagreement. The question then becomes how to harness that contrarian signal without letting noise dominate.

Takeaways
So what should a VC, accelerator, or competition organizer change in practice?
The first step is simply to start measuring votes after the pitch. At Earlybird, we introduced a process in 2018: everyone who joins an IC meeting (partner, principal, associate, analyst) receives an automated survey right after the meeting ends. Each attendee votes on a scale from 0 (bad) to 5 (strong) across market, product, business model, traction, team, defensibility, round structure, etc., and, most importantly, answers "would you make the investment?" with 1=strong no, 2=no, 3=weak no, 4=weak yes, 5=yes, 6=strong yes.
That's the first action after every meeting, collecting unbiased perspectives before the discussion starts. With the votes at hand, you can compute a disagreement score per dimension and then aggregate it per startup - providing the foundation for your subsequent discussion.
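Turning those survey votes into a disagreement score is straightforward. The votes below are hypothetical, and averaging the per-dimension standard deviations is one simple aggregation choice, not the only one:

```python
from statistics import mean, pstdev

# Hypothetical post-IC survey votes (0-5), one entry per attendee
votes = {
    "market":         [4, 2, 5, 3],
    "product":        [3, 3, 4, 3],
    "business_model": [1, 5, 2, 5],
    "team":           [4, 4, 5, 4],
}

# Disagreement per dimension = std dev of the votes on that dimension
disagreement = {dim: round(pstdev(v), 2) for dim, v in votes.items()}

# Collective disagreement per startup = average across dimensions
overall = round(mean(disagreement.values()), 2)
print(disagreement, overall)
```

Note how the pattern from the paper would show up directly in such numbers: in this made-up example the panel is aligned on team (0.43) but splits hard on business model (1.79), which is exactly the kind of breakdown worth putting on the table before the discussion starts.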
Once you have these metrics, there are a few clear implications:
You should be especially cautious about rigid consensus rules, particularly for deals that are clearly distinct from your usual pattern. If a startup is both textually or conceptually unique and polarizing, your prior should not be “this is too controversial to touch” but “this might be exactly where our edge lies, let’s understand why we disagree.”
At the same time, you should resist the temptation to adopt a naive champion rule where any polarizing deal can be pushed through if one person loves it. The paper shows that disagreement is informative primarily for more unique ideas. For generic SaaS company xyz, disagreement is just as likely to be noise as signal.
You should consciously include ex-founders and other “constructive contrarians” in your evaluation process and track over time which dissenters are actually predictive. A world-class investment process does not only ask “Did we agree?” but also “When we didn’t agree and we went ahead anyway, how did those bets perform?”
Finally, there is a psychological and cultural shift implied here. In many firms, harmony in IC is seen as a sign of quality: we did our homework, debated, converged. This work suggests a different framing. If you never have high-disagreement deals in your portfolio, you may have optimized away exactly the type of opportunities that create outlier returns.
A simple question you can ask yourself as a founder or investor, inspired by Peter Thiel’s famous line, is not just “What do you believe that others don’t?” but “Is this idea polarizing among reasonable experts?” If the answer is yes, and the idea is genuinely distinct, the data now suggests you should lean in rather than shy away.
Hope you enjoyed this new research paper as much as I did. Food for thought, I guess.
Stay driven,
Andre
PS: Check out Foresight to unify and talk to all your data here