Data-driven VC #25: A blueprint to map the entrepreneurial footprint of organizations
🔥Inside WHU's data-driven journey
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 6,360+, +250 since last week
Following your great feedback on last week’s guest post covering Hustle Fund’s data-driven journey, I’m happy to take a completely different angle and have Dries Faems contribute today’s episode. Dries is a Professor of Entrepreneurship, Innovation and Technological Transformation at the WHU Otto Beisheim School of Management, one of the leading entrepreneurial universities in Europe, which is lucky to count the founders of Zalando, Rocket Internet, Forto, Flixbus, HelloFresh and many more unicorns among its alumni.
I’m particularly excited about this episode as it perfectly exemplifies how data-driven approaches can be leveraged outside of VC, for example in academic research, M&A or corporate innovation scouting. Thank you, Dries, for sharing your innovative work with us and providing a blueprint in your guest post below 🙏🏻
At the Chair of Entrepreneurship, Innovation and Technological Transformation of WHU, we have started building the WHU Founder Database, a data infrastructure that allows us to address exactly these kinds of research questions. In this guest contribution, I want to provide a blueprint that will allow any data enthusiast to build a similar data infrastructure for their own organization, covering the following steps:
Step 1: Identifying founders
Step 2: Collecting company data
Step 3: Collecting investor data
Step 4: Merging founder, company and investor data
Step 5: Developing use cases for your data infrastructure
Step 1: Identifying founders
A valuable data source for collecting founder data is LinkedIn. A search in LinkedIn Sales Navigator or LinkedIn Recruiter for the terms ‘Founder’ and ‘Co-Founder’ in the ‘Job Title’ category, combined with your organization in the ‘Company’ or ‘School’ category, will give you a good overview of the founders in your ecosystem.
Some people are quite proactive in claiming a founder role. As an organization, for instance, you might not be really interested in people who have founded the local synchronized swimming club in their village (yes, this is a real example…). Another issue is that corporate employees might claim ‘founder’ roles for specific activities within the company (e.g., “I am the founder of the feminist book club at Google…”). This requires careful cleaning to make sure that only relevant founders are identified, as illustrated in the sketch below.
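For illustration, here is a minimal Python sketch of such a cleaning step. It assumes you have exported your search results to a CSV file with hypothetical column names (“name”, “job_title”, “company”); adjust the columns and the keyword list to your own data.

```python
import csv

# Hypothetical keyword list for hobby or intra-company "founder" roles --
# extend it based on what you actually find in your own search results.
IRRELEVANT_KEYWORDS = ["club", "community", "meetup", "society"]

def is_relevant_founder(row: dict) -> bool:
    """Keep only rows whose title looks like a genuine company founder role."""
    title = row["job_title"].lower()
    if "founder" not in title:
        return False
    # Drop hobby or intra-company "founder" roles (e.g. the book club at Google).
    return not any(keyword in title for keyword in IRRELEVANT_KEYWORDS)

# "founder_search_results.csv" is a placeholder for your own export.
with open("founder_search_results.csv", newline="", encoding="utf-8") as f:
    founders = [row for row in csv.DictReader(f) if is_relevant_founder(row)]

print(f"Kept {len(founders)} plausible founders for manual review")
```

A keyword filter like this only pre-sorts the list; a final manual review of the remaining profiles is still advisable.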
While LinkedIn is a valuable tool for identifying founders, it cannot be used for unauthorized scraping of founder profiles. LinkedIn defines unauthorized scraping as ‘the use of code and automated collection methods to make (up to) thousands of queries per second and evade technical blocks in order to take data without permission.’ Andre has provided more info on the do’s and don’ts of web scraping in this newsletter post.
Step 2: Collecting company data
When you have identified the founders in your ecosystem, the next step is to retrieve more information on the founded companies. Today, several data providers offer access to legal information on formally registered companies. NorthData is one provider that you can consider.
NorthData: This data provider gives access to structured company information, allowing you to collect data such as when the company was founded, who its legal owners are, and when legal changes occurred. The good news is that NorthData has an API service that allows you to automate the search for all the companies you have identified; the API also supports name matching. A minimal sketch of such an automated lookup follows below.
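As an illustration, here is a minimal Python sketch of an automated company lookup. The endpoint, parameters and response fields below are placeholders, not NorthData’s actual API; consult your provider’s API documentation for the real URL, authentication and response schema.

```python
import requests

API_KEY = "YOUR_API_KEY"  # issued by your data provider
# Placeholder endpoint and field names -- this is NOT NorthData's actual API;
# check the provider's documentation for the real URL, auth header and schema.
SEARCH_URL = "https://api.example-company-data.com/company/search"

def lookup_company(name: str) -> dict | None:
    """Query the provider's name search and return the best match, if any."""
    response = requests.get(
        SEARCH_URL,
        params={"name": name},
        headers={"X-Api-Key": API_KEY},
        timeout=30,
    )
    response.raise_for_status()
    results = response.json().get("results", [])
    return results[0] if results else None

# Loop over the companies of the founders identified in Step 1.
for company_name in ["Example Ventures GmbH", "Example Tech UG"]:
    print(company_name, "->", lookup_company(company_name))
```

Because founders rarely write company names exactly as they appear in the legal register, it pays off to rely on the provider’s name-matching feature and to log lookups that return no match for manual follow-up.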
Another option is to go directly to the ultimate data source yourself. In Germany, for instance, you can go to the website of the Unternehmensregister, where you can collect information on the companies for free. However, this website lives up to Germany’s reputation as a country where the fax machine is still a relevant communication tool 😉
I have also seen the first experiments in leveraging ChatGPT to collect company information. My experience, however, is that ChatGPT can very convincingly ‘create’ nonsense information about companies. It is important to realize that Large Language Models predict information instead of retrieving it and that, in their current state, their predictions about company information are quite inaccurate.
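If you do experiment with LLM-generated company data, treat each value as a hypothesis to be verified against an authoritative source rather than as a fact. A minimal sketch, assuming hypothetical record structures on both sides:

```python
# Hypothetical record structure: both records are dicts that share a field name --
# adjust to your own data model and to the fields you actually care about.
def llm_value_is_confirmed(llm_record: dict, registry_record: dict, field: str) -> bool:
    """Trust an LLM-suggested value only if the authoritative registry record agrees."""
    return llm_record.get(field) == registry_record.get(field)

llm_record = {"company": "Example GmbH", "founding_year": 2015}       # generated, unverified
registry_record = {"company": "Example GmbH", "founding_year": 2017}  # authoritative source

if not llm_value_is_confirmed(llm_record, registry_record, "founding_year"):
    print("Mismatch: discard the LLM value and keep the registry value.")
```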