Hi, I'm Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 6,360+, +250 since last week
Following your great feedback on last week's guest post covering Hustle Fund's data-driven journey, I'm happy to take a completely different angle and have Dries Faems contribute today's episode. Dries is a Professor of Entrepreneurship, Innovation and Technological Transformation at the WHU Otto Beisheim School of Management, one of the leading entrepreneurial universities in Europe, which is lucky to count the founders of Zalando, Rocket Internet, Forto, Flixbus, HelloFresh and many more unicorns among its alumni.

I'm particularly excited about this episode as it perfectly exemplifies how data-driven approaches can be leveraged outside of VC, for example in academic research, M&A or corporate innovation scouting. Thank you, Dries, for sharing your innovative work with us and providing a blueprint in your guest post below.
At the Chair of Entrepreneurship, Innovation and Technological Transformation of WHU, we have started building the WHU Founder Database, a data infrastructure that allows us to address exactly these kinds of research questions. In this guest contribution, I want to provide a blueprint that will allow any data enthusiast to build a similar data infrastructure for their own organization. I will describe the following steps:
(i) Step 1: Identifying founders
(ii) Step 2: Collecting company data
(iii) Step 3: Collecting investor data
(iv) Step 4: Merging founder, company and investor data
(v) Step 5: Developing use cases for your data infrastructure
Step 1: Identifying founders
A valuable data source for collecting founder data is LinkedIn. Searching LinkedIn Sales Navigator or LinkedIn Recruiter for the terms "Founder" and "Co-Founder" in the category "Job Title", with your organization in the category "Company" or "School", will give you a good overview of all the founders in your ecosystem.
Not every hit is relevant, however, because some people are quite proactive in claiming a founder role. As an organization, for instance, you might not really be interested in people who founded the local synchronized swimming club in their village (yes, this is a real example…). Another issue is that employees of corporates might claim "founder" roles for specific activities within the company (e.g., "I am the founder of the feminist book club at Google…"). Careful cleaning is therefore required to make sure that only relevant founders are identified.
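To make the cleaning step more concrete, here is a minimal Python sketch of a rule-based first pass. It assumes the founder list has already been exported by permitted means into simple records with a "title" and a "company" field. The field names, the keyword list and the list of known corporates are all hypothetical and would need to be tuned to your own ecosystem. Treat the result as a shortlist for manual review, not as a final verdict.

```python
import re

# Hypothetical keywords that hint at clubs and hobby groups rather than ventures.
NON_VENTURE_KEYWORDS = {"club", "society", "association", "choir"}

# Matches "Founder", "Co-Founder", "Co Founder" and "Cofounder" (case-insensitive).
FOUNDER_TITLE = re.compile(r"\b(?:co[-\s]?)?founder\b", re.IGNORECASE)


def is_relevant_founder(record, known_corporates):
    """Return True if a record looks like a founder of an actual venture.

    record: dict with the assumed keys "title" and "company".
    known_corporates: set of lower-cased names of established corporates,
    where a "founder" title usually refers to an internal activity.
    """
    title, company = record["title"], record["company"]
    if not FOUNDER_TITLE.search(title):
        return False
    # Compare whole words so that "Clubhouse" is not mistaken for "club".
    words = set(re.findall(r"\w+", f"{title} {company}".lower()))
    if words & NON_VENTURE_KEYWORDS:
        return False
    if company.strip().lower() in known_corporates:
        return False
    return True


if __name__ == "__main__":
    corporates = {"google"}  # assumption: maintained by hand
    profiles = [
        {"title": "Co-Founder & CEO", "company": "Acme Robotics GmbH"},
        {"title": "Founder", "company": "Synchronized Swimming Club"},
        {"title": "Founder, Feminist Book Club", "company": "Google"},
    ]
    for profile in profiles:
        print(profile["company"], is_relevant_founder(profile, corporates))
```

Simple rules like these will not catch every edge case, so it is worth sampling both the kept and the rejected records by hand before trusting the filter.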
While LinkedIn is a valuable tool for identifying founders, unauthorized scraping of founder profiles is not permitted. LinkedIn defines unauthorized scraping as "the use of code and automated collection methods to make (up to) thousands of queries per second and evade technical blocks in order to take data without permission." Andre has provided more info on the do's and don'ts of web scraping in this newsletter post.
Step 2: Collecting company data
Once you have identified the founders in your ecosystem, the next step is to retrieve more information on the companies they founded. Today, quite a few data providers offer access to legal information on formally founded companies. NorthData is one provider that you can consider.
NorthData: This data provider gives access to structured company information, allowing you to collect information on topics such as when the company was founded, who its legal owners are, when legal changes in the company occurred, etc. The good news is that NorthData has an API service that allows you to automate the search for all the companies that you have identified. The API also supports name matching.
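To illustrate what name matching involves, here is a small, provider-independent sketch in Python. It does not use or reproduce the NorthData API. Instead it uses the standard library's difflib to match a company name taken from a founder profile against a list of registered names. The list of legal-form suffixes and the similarity threshold of 0.85 are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed list of legal-form tokens to ignore when comparing names.
LEGAL_FORMS = {"gmbh", "ag", "ug", "se", "kg", "co", "ltd", "inc", "llc"}


def normalize(name):
    """Lower-case a company name, drop punctuation and legal-form tokens."""
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


def best_match(query, candidates, threshold=0.85):
    """Return the candidate most similar to query, or None if none is close enough."""
    target = normalize(query)
    best_name, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, normalize(candidate)).ratio()
        if score > best_score:
            best_name, best_score = candidate, score
    return best_name if best_score >= threshold else None


if __name__ == "__main__":
    registered = ["ACME Robotics GmbH", "Acme Logistics AG"]
    print(best_match("Acme Robotics", registered))
    print(best_match("Zeta Labs", registered))
```

Matches close to the threshold are best checked by hand, since different companies can carry very similar names.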
Another option is to go directly to the ultimate data source yourself. In Germany, for instance, you can go to the website of the Unternehmensregister, where you can collect information on companies for free. However, this website lives up to Germany's reputation as a country where the fax machine is still a relevant communication tool.

I have also seen the first experiments in leveraging ChatGPT to collect company information. My experience, however, is that ChatGPT can very convincingly "create" nonsense information about companies. It is important to realize that Large Language Models predict information rather than retrieve it and that, in their current state, their predictions on company information are quite inaccurate.
Step 3: Collecting investor data

