Data-Driven VC

Data-driven VC #29: Beyond sourcing and screening - how data-driven approaches create alpha in the long run

Where venture capital and data intersect. Every week.

Andre Retterath
Mar 30, 2023

👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.

Current subscribers: 7,183+, +170 since last week


Brought to you by Affinity - Find, manage, and close more deals with Affinity

Affinity Campfire brings together industry-leading dealmakers to explore what it means to be a part of the leading relationship intelligence ecosystem. Dive deeper into the importance of data-driven sourcing and what dealmaking will look like in 2023 as we navigate a changing global landscape.

Watch on-demand now


“VC is a finding and picking the winners game”

This statement might feel repetitive to the long-term readers among you, as I pointed it out not only in the very first episode but on many other occasions too. Yet, it’s crucial to keep it in mind when critically rethinking the VC investment process. You need to start somewhere, and focus is key.

Morten Sorensen (2007, “How smart is smart money”) found in his study that about 2/3 of the VC value is created in the sourcing and screening stages of the investment process.

Following this value-oriented approach, the majority of my newsletter thus far has focused on the sourcing and screening stages:

  • Difference between human-centric and data-centric sourcing approaches

  • Make versus buy and the results of my startup database benchmarking

  • A list of alternative data sources and how to scrape them at scale

  • Entity matching and how to create a single source of truth

  • Feature engineering, missing data and how to make sense of it all

  • ML-based startup screening at scale

  • Knowledge graphs and social networks of founders

  • …

Getting out of the sourcing and screening rabbit hole (although there’s certainly more to explore here in the future), I want to dedicate today’s post to answering two closely related and frequently asked questions:

  1. “Will data-driven approaches go beyond sourcing and screening?”

  2. “Can data-driven approaches actually create alpha in the long run?”

My short answer to both: For sure! Once a scalable scraping infrastructure, reliable data pipelines, inter- and intra-entity matching, creative feature engineering, deterministic and ML-based screening approaches, as well as an intuitive UI/UX are set up, we can take the next step and rethink the due diligence (DD) and portfolio value creation (PVC) stages of the VC investment process.

Due diligence as you know it, just without the pain

Market sizing, competitor benchmarking, extensive data requests, slow responses, back-and-forth clarifications, and endless follow-up questions. You know it, I know it, we all know it: DD is painful and nobody likes it. Nevertheless, it’s an important part of the investment process.

Looking to solve the well-known pain points with data-driven approaches, I don’t want to reinvent the wheel but automate the pain away. Some examples:

  • Market sizing: While bottom-up calculations tend to follow a deterministic logic and require more assumptions on price and quantity (as recently described by my friend CJ Gustafson from Mostly metrics in his exceptional “TAM Masterclass”), top-down approaches that draw on a variety of market studies are more of a data collection job. Clearly, the latter can be done by LLMs to serve as a starting point for further analyses.

  • Competitive landscapes: Who has a similar product offering? Are there differences? How much funding did they raise? From which investors? These and many more questions need to be answered as part of competitive benchmarking. Days and weeks of manual data collection work can now be done with a simple prompt and LLMs. Even non-fine-tuned models like GPT-4 provide a great starting point, not to mention our internal experiments with more advanced vector search, fine-tuned models, and tons of structured and unstructured startup data.

  • Traction and KPI analysis: Excel sheets come in all shapes and sizes. Thankfully, today’s parsing algorithms allow us to extract data from almost any document, feed it into a standardized table and leverage it for large-scale benchmarking. An enterprise company with a top-down sales motion selling infrastructure software with an ARR between 0 and 20M? Easy. Once the data is stored in a structured form, benchmarks like the a16z growth guide can be codified and automatically leveraged to assess new investment opportunities at scale.
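
To make the idea of a "codified benchmark" from the last bullet more tangible, here is a minimal sketch in Python. It assumes the parsed KPI data already sits in a structured form (here simply last year's and this year's ARR) and that a benchmark is a lookup table of growth thresholds per ARR band. The names `Benchmark`, `BENCHMARKS` and `rate_company` are hypothetical, and the threshold values are placeholders for illustration only; they are not taken from the a16z growth guide or any other published source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """One row of a codified benchmark table (all values are placeholders)."""
    arr_min: float       # inclusive lower ARR bound, in USD
    arr_max: float       # exclusive upper ARR bound, in USD
    good_growth: float   # YoY growth rate rated "good" (1.0 = +100%)
    great_growth: float  # YoY growth rate rated "great"


# Hypothetical thresholds for illustration only; NOT real benchmark figures.
BENCHMARKS = [
    Benchmark(0, 1_000_000, good_growth=2.0, great_growth=3.0),
    Benchmark(1_000_000, 5_000_000, good_growth=1.5, great_growth=2.5),
    Benchmark(5_000_000, 20_000_000, good_growth=1.0, great_growth=2.0),
]


def yoy_growth(arr_prev: float, arr_now: float) -> float:
    """Year-over-year ARR growth as a fraction (1.0 means ARR doubled)."""
    if arr_prev <= 0:
        raise ValueError("previous ARR must be positive")
    return arr_now / arr_prev - 1.0


def rate_company(arr_prev: float, arr_now: float) -> str:
    """Rate a company's growth against the band its current ARR falls into."""
    growth = yoy_growth(arr_prev, arr_now)
    for band in BENCHMARKS:
        if band.arr_min <= arr_now < band.arr_max:
            if growth >= band.great_growth:
                return "great"
            if growth >= band.good_growth:
                return "good"
            return "below benchmark"
    return "no benchmark"  # current ARR is outside every codified band


if __name__ == "__main__":
    # ARR went from 1M to 3M: +200% growth, rated within the 1M-5M band.
    print(rate_company(1_000_000, 3_000_000))
```

In a real setup the table would likely carry more dimensions (sales motion, sector, stage), but the principle stays the same: once the thresholds live in data rather than in someone's head, every new opportunity can be rated automatically.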
