Predicting Startup Success With Company Descriptions and a Fused LLM

👋 Hi, I’m Andre, and welcome to my newsletter, Data-Driven VC, which is all about becoming a better investor with Data & AI. Join 27,710 thought leaders from VCs like a16z, Accel, Index, Sequoia, and more to understand how startup investing is becoming more data-driven, why it matters, and what it means for you.


Brought to you by Deckmatch - Agentic Workflows and APIs for Data-Driven VCs

Connect your top-of-funnel to Deckmatch and transform pitch decks and URLs into structured and insightful data. Get detailed firmographic and people data, in-depth competitive and market analysis, and personalized investment memos without lifting a finger. The cherry on the cake? It's all seamlessly synced to your preferred tools like Affinity through our API integrations.

Never miss a deal, ditch the donkey work, and build meaningful relationships faster.


“VC is a Finding and Picking the Winners Game”

This statement might feel repetitive to the long-term readers among you, as I pointed it out not only in the very first episode but on many other occasions too. Yet it’s crucial to keep this in mind when critically rethinking the VC investment process. You need to start somewhere, and focus is key.


Morten Sorensen (2007, “How Smart Is Smart Money?”) found that about 2/3 of VC value is created in the sourcing and screening stages of the investment process.

Following this value-oriented approach, the majority of my early DDVC episodes were focused on the sourcing and the subsequent data processing stages.


Equally important, however, are the screening and initial decision-making stages. I dedicated one of my PhD papers to this topic and summarized the most important insights in the “How to automate startup screening” episode.

While I’ve noticed an increasing number of innovative sourcing and data collection approaches, I’ve seen little progress in the screening and decision-making stages.

That changed recently, when I came across a paper from our neighbours at LMU Munich, “A Fused Large Language Model for Predicting Startup Success”, whose authors found that such models can predict startup success from textual company descriptions in databases such as Crunchbase.

In light of the importance of this paper for the overall VC investment process, I decided to spotlight the study and share the most relevant insights with you today!

Why Does It Matter?

The authors trained and evaluated a fused ML model that combines structured data (e.g., founder details, funding history) with unstructured textual descriptions from commercial startup data providers to predict startup success. The study finds that incorporating textual self-descriptions significantly enhances the predictive power of the model, providing a more accurate decision-making tool for investors.
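This excerpt does not describe the study's exact architecture, so the following is only a minimal Python sketch of the general idea of "fusing" structured and textual data, under loudly stated assumptions. A hashed bag-of-words stands in for the LLM's text representation, a plain logistic regression stands in for the prediction head, and the structured features (number of founders, funding, serial founder flag), the helper names, and the toy records are all hypothetical, invented purely for illustration. It shows early fusion: both feature sets are concatenated into one vector before a single classifier is trained.

```python
import math
import zlib
from typing import List, Sequence, Tuple

N_TEXT_BUCKETS = 32  # hypothetical size of the hashed text-feature space


def text_features(description: str, n_buckets: int = N_TEXT_BUCKETS) -> List[float]:
    """Hashed bag-of-words, used here as a simple stand-in for an LLM embedding."""
    vec = [0.0] * n_buckets
    tokens = description.lower().split()
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's salted hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % n_buckets] += 1.0
    total = len(tokens) or 1  # avoid division by zero for empty descriptions
    return [v / total for v in vec]


def fuse(structured: Sequence[float], description: str) -> List[float]:
    """Early fusion: concatenate structured features with text features."""
    return list(structured) + text_features(description)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Estimated probability of 'success' for one fused feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic regression fitted with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy records for illustration only (NOT data from the study):
# ([founders / 5, funding in $M / 10, serial_founder], description, success label)
TOY_DATA = [
    ([0.6, 0.8, 1.0], "AI platform automating enterprise workflows", 1),
    ([0.4, 0.5, 1.0], "developer API for payments infrastructure", 1),
    ([0.2, 0.1, 0.0], "local bakery and coffee shop", 0),
    ([0.2, 0.0, 0.0], "handmade jewelry online store", 0),
]

if __name__ == "__main__":
    X = [fuse(s, d) for s, d, _ in TOY_DATA]
    y = [label for _, _, label in TOY_DATA]
    w, b = train_logreg(X, y)
    new_company = fuse([0.4, 0.3, 1.0], "AI copilot for venture capital screening")
    print(f"estimated success probability: {predict_proba(w, b, new_company):.2f}")
```

In a real pipeline, the hashed bag-of-words would be replaced by embeddings from a language model and the toy records by labeled data from a provider such as Crunchbase; simple concatenation is only one of several possible ways to fuse the two data types.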

