Data-driven VC #29: Beyond sourcing and screening - how data-driven approaches create alpha in the long run
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 7,183+, +170 since last week
“VC is a finding and picking the winners game”
This statement might feel repetitive to the long-term readers among you, as I pointed it out not only in the very first episode but on many other occasions too. Yet it’s crucial to keep it in mind when critically rethinking the VC investment process. You need to start somewhere, and focus is key.
Morten Sorensen (2007, “How smart is smart money?”) found in his study that about two-thirds of VC value is created in the sourcing and screening stages of the investment process.
Following this value-oriented approach, the majority of my newsletter has so far focused on the sourcing and screening stages:
Difference between human-centric and data-centric sourcing approaches
Make versus buy and the results of my startup database benchmarking
A list of alternative data sources and how to scrape them at scale
Feature engineering, missing data and how to make sense of it all
…
Getting out of the sourcing and screening rabbit hole (although there’s certainly more to explore here in the future), I want to dedicate today’s post to answering two closely related and frequently asked questions:
“Will data-driven approaches go beyond sourcing and screening?”
“Can data-driven approaches actually create alpha in the long run?”
My short answer to both: For sure! Once a scalable scraping infrastructure, reliable data pipelines, inter- and intra-entity matching, creative feature engineering, deterministic and ML-based screening approaches, and an intuitive UI/UX are set up, we can take the next step and rethink the due diligence (DD) and portfolio value creation (PVC) stages of the VC investment process.
Due diligence as you know it, just without the pain
Market sizing, competitor benchmarking, extensive data requests, slow responses, back-and-forth clarifications, and endless follow-up questions. You know it, I know it, we all know it: DD is painful and nobody likes it. Nevertheless, it’s an important part of the investment process.
Looking to solve the well-known pain points with data-driven approaches, I don’t want to reinvent the wheel but automate the pain away. Some examples:
Market sizing: While bottom-up calculations tend to follow a deterministic logic and require more assumptions on price and quantity (as recently described by my friend in his exceptional “TAM Masterclass”), top-down approaches that draw on a variety of market studies are more of a data collection job. Clearly, the latter can be done by LLMs to serve as a starting point for further analyses.
Competitive landscapes: Who has a similar product offering? Are there differences? How much funding did they raise? From which investors? These and many more questions need to be answered as part of competitive benchmarking. Days and weeks of manual data collection work can now be done with a simple prompt to an LLM. Even non-fine-tuned models like GPT-4 provide a great starting point, not to mention our internal experiments with more advanced vector search, fine-tuned models, and tons of structured and unstructured startup data.
Traction and KPI analysis: Excel sheets come in all shapes and sizes. Thankfully, today’s parsing algorithms allow us to extract data from every document, feed it into a standardized table, and leverage it for large-scale benchmarking. An enterprise company with a top-down sales motion selling infrastructure software with an ARR between 0 and 20M? Easy. Once the data is stored in a structured form, benchmarks like the a16z growth guide can be codified and automatically leveraged to assess new investment opportunities at scale.
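To make the benchmarking idea a bit more tangible, here is a minimal Python sketch of what "codifying" a benchmark could look like once KPIs sit in a standardized table. Everything in it is an assumption for illustration: the Company fields, the peer_group and growth_percentile helpers, and the sample numbers are hypothetical. They are not taken from the a16z growth guide or from any real dataset or tech stack.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Company:
    """One row of a hypothetical standardized KPI table."""
    name: str
    customer_segment: str  # e.g. "enterprise"
    sales_motion: str      # e.g. "top-down"
    category: str          # e.g. "infrastructure software"
    arr_m: float           # annual recurring revenue, in millions
    yoy_growth: float      # year-over-year ARR growth, 1.0 == 100%


def peer_group(companies: List[Company], *, customer_segment: str,
               sales_motion: str, category: str,
               arr_min: float, arr_max: float) -> List[Company]:
    """Return all companies matching the segment filters and the ARR band."""
    return [
        c for c in companies
        if c.customer_segment == customer_segment
        and c.sales_motion == sales_motion
        and c.category == category
        and arr_min <= c.arr_m <= arr_max
    ]


def growth_percentile(candidate_growth: float,
                      peers: List[Company]) -> Optional[float]:
    """Share of peers (0-100) growing at or below the candidate's rate."""
    if not peers:
        return None  # no comparable companies, so no benchmark
    at_or_below = sum(1 for p in peers if p.yoy_growth <= candidate_growth)
    return 100.0 * at_or_below / len(peers)


# Made-up sample rows, purely illustrative.
SAMPLE = [
    Company("Alpha", "enterprise", "top-down", "infrastructure software", 4.0, 0.5),
    Company("Beta", "enterprise", "top-down", "infrastructure software", 8.0, 1.0),
    Company("Gamma", "enterprise", "top-down", "infrastructure software", 15.0, 1.5),
    Company("Delta", "enterprise", "top-down", "infrastructure software", 19.0, 2.0),
    Company("Epsilon", "smb", "bottom-up", "infrastructure software", 5.0, 3.0),
    Company("Zeta", "enterprise", "top-down", "infrastructure software", 45.0, 0.4),
]

if __name__ == "__main__":
    peers = peer_group(
        SAMPLE,
        customer_segment="enterprise",
        sales_motion="top-down",
        category="infrastructure software",
        arr_min=0.0,
        arr_max=20.0,
    )
    # Benchmark a new opportunity growing 120% year over year.
    print(growth_percentile(1.2, peers))
```

The design choice here is to keep the filter (who counts as a peer) separate from the metric (how the candidate compares), so the same peer group can be reused for other KPIs such as retention or burn.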