👋 Hi, I’m Andre and welcome to my newsletter Data Driven VC which is all about becoming a better investor with Data & AI. Join 34,930 thought leaders from VCs like a16z, Accel, Index, Sequoia, and more to understand how startup investing becomes more data-driven, why it matters, and what it means for you.
Brought to you by Kruncher - The Most Comprehensive AI Analyst for VCs
Imagine a partner who remembers every deal your firm has ever seen and tracks every company you care about. Kruncher is that partner, without the headcount.
By combining years of your firm knowledge with premium external data, our AI analyst delivers instant insights, investment memos, LP reporting, and continuous monitoring.
I’ve been working on the transformation of VC with AI & automation for more than 8 years. It started with my “ML for VC” research in Cambridge, UK and my subsequent PhD on the same topic at TU Munich, but only became reality when I joined Earlybird VC full-time as an investor in early 2018 and got my hands dirty actually building stuff.
Throughout my journey, a lot has changed. And I’ve learned tons of lessons. Oftentimes the hard way.
In this post, I’d like to share my view on the evolution of “tech for VC” and why it’s the best time to start your own journey now.
Let’s jump in!
5 Phases of the “Tech for VC” Evolution
Zooming out and looking at the evolution of tech for VC, I’d segment the last 80 years into 5 distinct phases:
1. The “Old World” (1950s - 2010ish)
In this period, everything was manual, inefficient and exclusive. Your network was your net worth. Data was inaccessible. The only innovation was the move from pen & paper to mouse & keyboard about 3 decades ago.
Sourcing? Via network. No warm connection? Too bad.
Screening & Due Diligence? Via personal experience or experts in your close network. Objectivity? Dream on. Gut feeling, instincts, personal experiences. That’s the magic.
Portfolio value creation? Manual, 1 hour at a time. The true definition of a service business. Scaling advice was only possible with huge portfolio value creation teams and operating partners.
2. The “Big Data” Era (2010 - 2017ish)
At the beginning of the last decade, “big data” was THE thing.

What did we mean by “Big Data”? Well, at least in my mind it was all about turning offline information into online information at scale. Besides digitizing company registrations in public registers or publishing funding news online instead of in newspapers, one of the most important developments for investors was the digitization of personal networks via LinkedIn, Twitter, Facebook, Instagram, and more. Suddenly, you could identify and search people at scale while monitoring headcount or job postings of corporate accounts.
I started tracking relevant sources for investors in 2017 or so and stopped sometime in 2024, as the list had grown explosively to more than 500 entries. A manual list was just not the right approach to keep track anymore.
Accompanied by the digitization of offline information and the resulting rise of mass online data in the 2010s, new services evolved to enable targeted data collection and processing: the web scrapers. I’ve written before about how to scrape alternative data sources here.
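To make this a bit more concrete, here is a minimal sketch of what such a scraper does under the hood: it takes the HTML of a page, pulls out the structured bits, and turns them into a signal such as open roles per team. Everything in it is an assumption for illustration only: the `SAMPLE_HTML` markup, the `job`, `title` and `team` class names, and the helper names `parse_jobs` and `openings_by_team` are hypothetical. A real careers page would be fetched over HTTP and would look different.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical careers-page markup (assumption). A real scraper would fetch
# the page over HTTP, and the tags and class names would differ per site.
SAMPLE_HTML = """
<ul id="jobs">
  <li class="job"><span class="title">Backend Engineer</span><span class="team">Engineering</span></li>
  <li class="job"><span class="title">Account Executive</span><span class="team">Sales</span></li>
  <li class="job"><span class="title">Data Scientist</span><span class="team">Engineering</span></li>
</ul>
"""


class JobParser(HTMLParser):
    """Collects one dict per <li class="job"> with its title and team."""

    def __init__(self):
        super().__init__()
        self.jobs = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "job":
            self.jobs.append({})  # start a new job record
        elif tag == "span" and css_class in ("title", "team") and self.jobs:
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside a title/team span.
        if self._field:
            self.jobs[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def parse_jobs(html):
    """Return a list of {'title': ..., 'team': ...} dicts found in the HTML."""
    parser = JobParser()
    parser.feed(html)
    return parser.jobs


def openings_by_team(jobs):
    """Count open roles per team, a rough hiring signal."""
    return dict(Counter(job.get("team", "unknown") for job in jobs))


if __name__ == "__main__":
    jobs = parse_jobs(SAMPLE_HTML)
    print(openings_by_team(jobs))
```

Run on a schedule and stored with a timestamp, counts like these are what let you monitor hiring activity over time. The sketch uses only the Python standard library to stay self-contained. It also hints at why adoption stayed low: every source needs its own parser, and each one breaks whenever the page layout changes.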

In retrospect, very few investors adopted web scraping at scale, likely due to the relatively high technical entry barriers for non-developers. To bridge this gap, we saw intermediaries evolve - the so-called commercial database providers. These are companies whose mission is to collect first-party data, match and verify it, and make it accessible to the relevant audiences - in this case investors. Crunchbase, Pitchbook, CB Insights, Dealroom - just to name a few that evolved during that period.
For the first time ever, investors could source startups and experts beyond their naturally limited human networks. The problem? Their human time was still limited and far from enough to sift through the masses of information. Not only did they face the unknown unknowns, the startups that existed but that they had never seen before, but they also got overwhelmed by data, losing direction and becoming inefficient.