Brought to you by FORESIGHT - The End-to-End Data Platform for Investors

You had the best intentions when you bought all those data sources like Crunchbase, Dealroom, Pitchbook, and Harmonic. You thought everything would get logged in that powerful CRM called Affinity. You wanted to connect it all to ChatGPT and start asking questions.

But then you realized that you’re missing the middle layer between your data and your LLM: the one that connects the data silos, eliminates the duplicates, and normalizes the data swamp.

This is why you need an “entity resolution” data infrastructure, aka the glue. But do you build it or buy it?


Happy Thanksgiving y’all! To celebrate this special day, I’m offering a limited 20% discount across all paid plans for Data Driven VC until Sunday night.

Since many of you are off for the weekend and might have some time to read, I’m happy to share an exciting new paper with you: Generative AI-powered venture screening: Can large language models help venture capitalists? by researchers from TU Munich and University of Bergamo, among others.

The Problem

… is clear: VCs see thousands of opportunities but can only invest in very few of them, knowing that even among the ones they end up investing in, only a small subset will drive the majority of the returns. The beauty of power-law distributions.

To find the needle in the haystack, most firms still filter deals with some combination of network, inbound noise and analyst time, topped with partner “gut feel.” Obviously, this is neither efficient nor (in most cases) effective. Reason to innovate.

I published a paper 5 years ago, “Human Versus Computer: Benchmarking Venture Capitalists and Machine Learning Algorithms for Investment Screening,” and have been tracking the research field ever since. So you can only imagine my excitement when I found this new paper 🤩

From raw dealflow to LLM-readable structure

The study leverages a real dataset of 61,814 startups from German early-stage VC Freigeist Capital, representing their full dealflow from 2018 to 2023 - not just funded companies, but everything that ever made it into their pipeline. For each venture, the researchers pull website content and feed it through GPT-3.5 to generate a 5–7 sentence description in natural language that captures what the company does, in which market, with what technology and at what maturity.

These descriptions are then embedded into a 1,536-dimensional vector space where semantically similar companies (say, modular robotics startups for SMEs) naturally cluster near one another, while traditional offline SMEs sit somewhere else. The authors even show that when you compress those 1,536 dimensions down to just two principal components for visualization, deep-tech ventures and traditional SMEs still separate cleanly.

In other words: even short, LLM-generated descriptions contain enough signal to meaningfully structure dealflow.
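The paper’s observation that deep-tech ventures and traditional SMEs still separate after compressing 1,536 dimensions to two principal components can be sketched with a toy example. The code below is a minimal illustration, not the authors’ pipeline: it uses synthetic stand-in vectors for the GPT embeddings and a plain SVD-based PCA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the paper's 1,536-dim GPT embeddings:
# two synthetic groups ("deep tech" vs "traditional SME") drawn
# around different centroids in a 1,536-dimensional space.
dim = 1536
deep_tech = rng.normal(loc=0.5, scale=0.3, size=(50, dim))
trad_sme = rng.normal(loc=-0.5, scale=0.3, size=(50, dim))
X = np.vstack([deep_tech, trad_sme])

# PCA via SVD: center, then project onto the first two principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_2d = Xc @ Vt[:2].T  # shape (100, 2)

# The dominant variance direction is the between-group axis,
# so the two groups land on opposite sides of PC1.
pc1 = coords_2d[:, 0]
print(pc1[:50].mean(), pc1[50:].mean())  # opposite signs
```

With well-separated centroids, the first principal component captures the between-group direction, which is why even a 2-D projection of such high-dimensional embeddings can keep the clusters visually distinct.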

On top of this vector space, the authors build an LLM agent. It’s not just a chat interface, but a system that can plan, call tools, and query databases. The agent takes a human-defined investment hypothesis (for example, “bootstrapped B2B software with early proof of product-market fit” or “modular robotics for industrial SMEs”), breaks it down into steps, queries venture databases and other APIs, uses embedding search over the 61k companies, and returns a shortlist or clusters that match the thesis.

Importantly, the agent is not inventing companies or synthetic data. It is “just” structuring and filtering Freigeist’s real dealflow.
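The embedding-search step at the heart of the agent is conceptually simple: embed the thesis, then rank companies by cosine similarity. The sketch below uses made-up 4-dimensional vectors and names in place of the real 1,536-dim embeddings and Freigeist’s dealflow; it illustrates the ranking mechanic only, not the authors’ implementation.

```python
import numpy as np

def shortlist(thesis_vec, company_vecs, names, k=3):
    """Rank companies by cosine similarity to a thesis embedding."""
    t = thesis_vec / np.linalg.norm(thesis_vec)
    C = company_vecs / np.linalg.norm(company_vecs, axis=1, keepdims=True)
    scores = C @ t
    order = np.argsort(scores)[::-1][:k]
    return [(names[i], float(scores[i])) for i in order]

# Toy 4-dim "embeddings" with hypothetical company names.
names = ["RoboFab", "BakeryPOS", "ModuBot", "ShopLocal"]
vecs = np.array([
    [0.9, 0.1, 0.0, 0.0],  # robotics-flavored
    [0.0, 0.0, 0.9, 0.2],  # offline-SME-flavored
    [0.8, 0.2, 0.1, 0.0],  # robotics-flavored
    [0.1, 0.0, 0.8, 0.3],  # offline-SME-flavored
])
thesis = np.array([1.0, 0.2, 0.0, 0.0])  # "modular robotics for SMEs"
print(shortlist(thesis, vecs, names, k=2))
```

The two robotics-flavored vectors come out on top, which is all a thesis-led shortlist needs: similarity to the hypothesis, not some generic notion of “good startup.”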


Efficiency: 537× faster and roughly 1,000× cheaper

The first comparison is purely about efficiency. How long would a human analyst need to search for companies that fit a given thesis, do quick first-pass evaluations, and create a ranked shortlist?

Based on prior research and industry practice, the paper estimates that such a hypothesis-driven search would consume about two hours of a VC analyst’s time. That includes understanding the thesis, scanning a large universe, and iterating on a list of promising ventures.

The LLM agent performs the same conceptual task in about 13.4 seconds on average across 67 real queries. That implies a speed-up of roughly:

7,200 seconds (2 hours) / 13.4 seconds ≈ 537×

The cost difference is just as striking. A German VC analyst’s time is priced at roughly $35 per hour, so a two-hour search costs about $70 in human time. The LLM agent, using GPT-3.5/4 priced at current API rates, costs about $0.069 per request.

That’s a ~1,000× reduction in marginal cost for each hypothesis-driven screening run.
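The arithmetic behind both headline numbers, using the figures from the paper as quoted above:

```python
# Speed-up: two analyst hours vs. the agent's mean runtime over 67 queries.
human_seconds = 2 * 60 * 60   # 7,200 s
agent_seconds = 13.4
speedup = human_seconds / agent_seconds
print(round(speedup))         # 537

# Cost ratio: $35/hour * 2 hours vs. ~$0.069 per agent request.
human_cost = 2 * 35.0
agent_cost = 0.069
print(round(human_cost / agent_cost))  # 1014
```

So the “~1,000×” is slightly conservative: at these numbers, the marginal cost ratio is closer to 1,014×.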

If your firm deals with thousands of startups per year, the economic logic is obvious: you either use agents to pre-structure dealflow, or you accept that you’re systematically leaving opportunities uninspected or over-indexing on whatever happens to be in your network and inbox.

Effectiveness #1: Does it actually cluster ventures as well as humans?

Of course, none of this matters if the LLM simply produces useless groupings. So the paper rigorously tests categorization quality.

Because startup “success” is hard to measure and highly endogenous once VCs invest, the authors avoid direct outcome-based metrics at this stage. Instead, they evaluate the quality of clusters produced by the LLM and by human analysts using two standard unsupervised learning metrics:

  • The Silhouette score, which captures how similar companies are within the same cluster versus across clusters. Higher is better.

  • The Calinski–Harabasz Index, which compares between-cluster variance to within-cluster variance. Again, higher means clearer, better-separated structure.

As a sanity check, they first create a random clustering: companies are randomly assigned to six buckets. The result, unsurprisingly, is terrible – a slightly negative Silhouette score and a tiny Calinski–Harabasz value. There is essentially no interpretable structure.
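The Calinski–Harabasz index, one of the two metrics above, is straightforward to compute by hand: between-cluster variance divided by within-cluster variance, each normalized by its degrees of freedom. The sketch below implements that standard definition on synthetic 2-D data (not the paper’s 1,536-dim embeddings) and repeats the paper’s sanity check that random labels score far worse than real structure.

```python
import numpy as np

def calinski_harabasz(X, labels):
    """Calinski-Harabasz index: between-cluster variance over
    within-cluster variance (higher = better-separated clusters)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n, k = len(X), len(np.unique(labels))
    overall = X.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        between += len(members) * np.sum((centroid - overall) ** 2)
        within += np.sum((members - centroid) ** 2)
    return (between / (k - 1)) / (within / (n - k))

rng = np.random.default_rng(1)
# Two well-separated blobs; score the true labels vs. random labels.
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(3, 0.2, (30, 2))])
good = np.array([0] * 30 + [1] * 30)
bad = rng.integers(0, 2, 60)
print(calinski_harabasz(X, good) > calinski_harabasz(X, bad))  # True
```

The same `calinski_harabasz_score` (along with `silhouette_score`) ships in scikit-learn’s `sklearn.metrics` if you’d rather not roll your own.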

Human VC analysts, by contrast, do very well. When they are asked to cluster ventures based on similarity, they achieve a Silhouette score around 0.37, which shows that they are good at forming coherent, distinct groups. However, their Calinski–Harabasz score is only about 8.43. Given the size and complexity of the dataset, it is hard for a human to maintain a global view of the 61k companies while also forming fine-grained clusters.

The LLM agent is then evaluated on the same task, using the full dataset and tuned to a similar number of clusters to allow a direct comparison. The result:

  • The Silhouette score is 0.35, essentially as good as the human analysts’ 0.37.

  • The Calinski–Harabasz Index jumps to 14.32, around 70% higher than the human benchmark.

This suggests a nuanced picture.

Humans are excellent at spotting local patterns and grouping similar companies once they’ve seen them. But they are overwhelmed by the global structure of a very large dataset.

The LLM agent, armed with embeddings and full-dataset access, is capable of both: it forms clusters that are about as internally coherent as those created by humans, and it organizes the entire venture landscape into globally well-separated, compact groups.

This is precisely what you want in a modern VC data stack: globally consistent, thesis-aligned structure at scale, which humans can then interrogate, refine, and eventually override where necessary.

Effectiveness #2: Do LLM-picked companies actually perform better?

To move beyond abstract clustering metrics, the authors also run a validation exercise on a subset of 9,911 ventures that Freigeist screened during October–November 2023.

For this subsample, they track two ex post outcomes up to April 2025:

  • Survival: whether the company is still active, proxied by website activity (HTTP 200 responses).

  • Funding: whether the company raised a new round after the screening.
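The survival proxy from the first bullet (a live website returning HTTP 200) is easy to operationalize. A minimal sketch, with hypothetical function names and the standard-library `urllib` standing in for whatever the authors actually used:

```python
from typing import Optional
from urllib.request import Request, urlopen

def looks_alive(status: Optional[int]) -> bool:
    """Treat an HTTP 200 response as a (rough) sign the company
    is still active -- the same proxy the paper uses."""
    return status == 200

def check_website(url: str, timeout: float = 5.0) -> bool:
    """Fetch the startup's website and apply the survival proxy.
    Network errors and non-2xx responses count as 'not alive'."""
    try:
        req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
        with urlopen(req, timeout=timeout) as resp:
            return looks_alive(resp.status)
    except OSError:  # covers URLError, HTTPError, timeouts
        return False

print(looks_alive(200), looks_alive(404), looks_alive(None))
```

It is a noisy proxy, of course: parked domains return 200, and live companies sometimes let their sites lapse, which is presumably why the authors pair it with the funding outcome.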

They then check which ventures were:

  • selected by at least one human analyst,

  • selected by the LLM agent,

  • selected by both.

The result is reassuring for anyone betting on this technology. Companies selected by human analysts are, as expected, more likely to raise follow-on funding. But companies selected by the LLM agent are:

  • significantly more likely to survive,

  • and also more likely to raise funding than the baseline.

Interestingly, ventures picked by both humans and the LLM tend to show the strongest performance. This suggests that human and machine do not simply replicate each other’s judgment. They pick up on partially different quality signals, and the intersection of those signals may be particularly powerful.

For a data driven VC, this is an important design principle: you don’t necessarily want an agent that mimics your analysts; you want one that sees the world differently but in a way that is still grounded in your thesis - so that both can complement each other.


How this maps into a VC workflow

The practical implications for VC operations are fairly clear: Augmented VC is the future.

First, LLM agents are best deployed at the very top of the funnel. They excel at turning a large, messy universe of startups into thesis-aligned clusters and ranked lists. This is exactly the part of the process that used to be offloaded to interns and junior analysts. Freigeist explicitly reports that after deploying their system, they no longer needed interns to pre-filter dealflow; partners and senior investors can now work directly with a structured view generated by the agent and cut out about 70% of non-fit startups automatically.

Second, humans remain central in the later stages. During deep due diligence, the same models are still used, but more as elastic research tools – summarizing documents, exploring technical topics, or checking competitive landscapes – rather than as the main decision engine. Team quality, market dynamics, product nuance and deal-making are still fundamentally human activities.

Third, the paper underlines that hypothesis-driven, thesis-led use is crucial. The LLM agent performs best when it is asked to “look at the world through the lens of our thesis” rather than to optimize for some generic notion of a “good startup.” That means the real edge is less in the model itself and more in:

  • the clarity of the firm’s investment theses,

  • the quality and breadth of the underlying data, and

  • the design of the agent (tools, prompts, routing logic).

Finally, there is a note of caution. If many funds rely on similar models and similar underlying data, “mechanized convergence” becomes a risk: everyone’s screening outputs start to look the same. Over time, this can compress diversity of thought and make it harder to spot the truly contrarian opportunities. To avoid this, firms will need proprietary data sources, differentiated theses, and deliberate human challenge to the agent’s suggestions.

The emerging equilibrium: LLM-native vs. legacy VC

Smaller, LLM-native funds with a strong data stack can now scan and structure markets at a scale that previously required a big brand, a big team or both. This levels part of the playing field and erodes the advantage of pure headcount. At the same time, founders outside core ecosystems gain at least a chance to be surfaced by a systematic, thesis-led agent rather than being filtered out simply because they are not in the right network.

The paper does not claim that LLMs can pick unicorns better than a top-tier partner. What it shows is more modest, but also more actionable:

  • LLM agents are orders of magnitude faster and cheaper at hypothesis-driven screening.

  • They match human analysts on clustering quality and beat them on global structure.

  • Their selections correlate with better survival and funding outcomes.

  • They work best in combination with experienced investors, not as a replacement.

For anyone building the next generation of DDVC infrastructure, this is a pretty clear blueprint: put an LLM agent on top of your venture data stack, let it own the early structuring and screening, and let humans focus their scarce time and judgment where it really matters: conviction, calibration and winning the right deals.

Stay driven,
Andre

PS: Get a limited 20% discount across all our paid plans until Sunday night to access all our premium content, community, and more
