Data-driven VC #20: The VC digitization journey
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 5,238, +325 since last week
I started this newsletter in September last year with a clear thesis in mind: VC is fundamentally broken and technology can help us fix it. Initially, I shared a variety of learnings from my PhD research as well as my own journey at Earlybird, including an overview of different manual and data-driven sourcing approaches, the “make versus buy” question, the most valuable commercial database providers, a step-by-step guide to scraping alternative data sources, the different approaches for de-duplication and entity matching, the importance of feature engineering for proper success scoring… and a lot more.
While diving deeper into this magic rabbit hole at the intersection of VC and data, I occasionally shared some technically rather high-level but still related experiments, such as my 10x productivity guide with ChatGPT or a process overview for augmented VCs, that attracted a surprising amount of attention. Reading your diverse feedback in DMs and comments (thank you so much!! It always helps me improve my content), I took a step back and tried to understand why these IMO high-level posts resonated so incredibly well whereas some other IMO super valuable deep dives performed comparatively poorly.
Seeking product-market fit
Assuming that post performance correlates with the value perceived by the readers (ceteris paribus), there must be a disconnect between the depth of the many deep dive posts I shared earlier last year and the level of technical sophistication of most VCs out there. Said differently, the greater the distance between the reader’s level of sophistication and the content’s level of sophistication, the less valuable the content becomes and the worse it performs.
An analogy: If a professor is supposed to learn from a primary school book, she’ll get bored right away. Little value. The same goes the other way around. If a primary school student is supposed to read academic papers, he’ll probably understand very little and at most take away some inspiration. Again, little value. In both scenarios, the level of sophistication of the content is too far away from the level of sophistication of the reader.
Yes, this insight seems obvious and is not rocket science, but visualizing it helped me make sense of the varying performance of different content pieces, i.e. why my technically less sophisticated posts performed better than my technical deep dives. With this rationale in mind, I tried to draw some conclusions about the VC audience and, more specifically, the current level of technical sophistication of VCs.
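The rationale above can be sketched as a toy model. This is a minimal illustration, assuming sophistication can be placed on a single 0 to 10 scale and that perceived value falls off linearly with the gap between reader and content. The scale, the linear decay and the function name `perceived_value` are all hypothetical choices made for illustration, not something measured from the newsletter data.

```python
def perceived_value(reader_level: float, content_level: float, max_gap: float = 10.0) -> float:
    """Toy model (assumption): value is 1.0 when reader and content match
    and declines linearly to 0.0 once the gap reaches max_gap."""
    gap = abs(reader_level - content_level)
    return max(0.0, 1.0 - gap / max_gap)


# Hypothetical levels on a 0-10 scale, chosen only to mirror the analogy.
professor, pupil = 9.0, 1.0
school_book, academic_paper = 1.0, 9.0

print("professor + school book:", round(perceived_value(professor, school_book), 2))
print("pupil + academic paper: ", round(perceived_value(pupil, academic_paper), 2))
print("professor + paper:      ", round(perceived_value(professor, academic_paper), 2))
```

The model is symmetric on purpose: content that is too basic and content that is too advanced are penalized alike, which is the point of the professor and primary school student analogy.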
3 stages of the VC digitization journey
The majority of VCs still live in a world with manual workflows and a simple tool stack: a generic CRM system like Salesforce, basic email and note-taking tools like Apple Mail and Notes, Slack and/or WhatsApp for ad-hoc communication, Teams or Zoom for calls, G-Suite or MS-Suite for document storage, and that’s more or less it. This is the “Old-school” extreme on the left side of the figure above, where seemingly the majority of VCs are still stuck.
Next, we see the “Productivity VCs” in the middle of the figure above. These firms were either set up with a modern off-the-shelf tool stack from day one or, and this seems to be the majority of this group, have been pushing for productivity and migrated from the old world. They successfully took the first leap and upgraded their “Old-school” stack with modern VC-focused CRM systems like Affinity or Attio, automated their workflows with Zapier, leveraged Notion for knowledge sharing and are hungry to explore further automation potential via tools such as ChatGPT.
On the right side of the figure above and at the end (?) of the journey, there are comparatively few technically sophisticated “Data-driven VCs” developing their own solutions with scalable web crawlers and proxy servers such as Phantombuster or ScraperAPI, pipeline scheduling tools like Airflow, databases like PostgreSQL or Neo4j, proprietary back- and frontends, and a lot more. While I consider true “Data-driven VCs” to be the ones building proprietary solutions and transforming the core of their business in-house, the leap from a “Productivity VC” to a “Data-driven VC” can also be taken via external solutions such as Specter, Harmonic, Gravity, SourceScrub or any other provider out there.
The “Number of VCs” on the y-axis across the three phases above is inversely related to the degree of digital sophistication. This trend is not only intuitive but has also been confirmed by the data collected as part of the “Data-driven VC Landscape 2023” (stay tuned, I’m crunching the data to publish the report soon). Please answer the anonymous poll below honestly, as it helps me to better calibrate my content.
While my technical deep dive posts are probably most useful to a smaller group of sophisticated “Data-driven VCs” as well as some innovative “Productivity VCs” that are willing to take the next step, the rather high-level posts seem to resonate well with a larger audience of “Old-school” and newbie “Productivity VCs” that are still earlier in their digitization journey.
Mapping my content performance data to the different phases of the VC digitization journey, I couldn’t have framed my conclusion better than my friend Pietro from EQT, who commented on my 5k milestone post earlier this week: