Automate Startup Scouting With AI: Identify, Enrich, Classify & Reach Out
Full Step-by-Step Guide
👋 Hi, I’m Andre and welcome to my newsletter Data-driven VC which is all about becoming a better investor with data & AI. Every Tuesday, I publish “Insights” to digest the most relevant startup research & reports. Every Thursday, I publish “Essays” that cover hands-on insights about data-driven innovation & AI in VC, and every Sunday, I publish “Picks” to spotlight the hottest Stealth, Early, and Growth Startups. Follow along to understand how startup investing becomes more data-driven, why it matters, and what it means for you.
Current subscribers: 26,385, +130 since last week
Brought to you by Affinity – Webinar with OpenAI & Data-Driven VC
AI tools from OpenAI, Google, Affinity, and others are changing how dealmakers work—especially in their CRM—but there are both challenges and opportunities ahead as adoption grows.
Join Affinity’s co-Founder Ray Zhou alongside Adam Perelman, Engineering Manager at OpenAI, and myself 👋🏻 to unpack the best ways of incorporating AI in private capital workflows. Tune in on September 5.
Vlastimil Vodicka is the Founder & CEO of Leadspicker and ranked in the top 20 thought leaders list of the Data-Driven VC Landscape 2024. I’ve been closely following his content and recently came across one of his LinkedIn posts where Vlasti demonstrated how he managed to extract, classify, and reach out to over 6,000 AI startup founders from New York using web scraping and AI.
Today, I’m excited to have him share the full workflow as a step-by-step guide with us in the guest post below. At the end of this article you can find instructions on how to claim your free access to scrape thousands of startup founders. Thank you Vlasti!
If you're a fan of the Data-Driven VC newsletter, you've likely experimented with various tools to scrape LinkedIn and employed GPT for classification and categorization. In my previous guest post, I detailed the process of integrating multiple tools and custom Google Sheets app script to connect the GPT API together with prompts, enabling startup classification within a Google Sheet.
After many months and numerous meetings with VCs eager to be more data-driven, we realized that our low-code experiment connecting OpenAI's GPT to Google Sheets is still quite complicated. While creating a workflow that connects 5-7 different tools in low-code might seem straightforward for tech-savvy individuals, it still requires a considerable amount of technical skill and knowledge. To complicate matters further, the tools and AI models are constantly evolving, rendering much of what I previously wrote outdated.
Given my background as a VC before becoming a startup founder, I decided to transform a complex AI-enabled startup scouting workflow—originally reliant on multiple complicated tools—into a simple, pre-built workflow that still allows for customization. Today, I will share this with you, including the exact prompts used and the technology behind it.
By reading this blog post, you will learn:
How we scrape startup founders from LinkedIn Sales Navigator
How to scrape more than 2500 people
The exact GPT prompts used to evaluate the fit of each company.
How we gather additional data from websites and LinkedIn.
Multiple GPT prompts for crafting hyper-personalized outreach messages.
How to use the tool for free
For context, our company, Leadspicker, has evolved from a lead generation agency supporting over 100 startup accelerators with scouting campaigns into an AI-powered platform that streamlines sales and marketing workflows by automating customer research, data enrichment, and personalized messaging.
This work, which previously required tens of people to complete manually, has been replaced by our automation and AI solutions. Leadspicker integrates various scrapers and data sources, using AI as a decision engine to create personalized, targeted outreach.
Extracting Data from LinkedIn Sales Navigator
In the first step of my video, I extracted data from LinkedIn Sales Navigator by focusing on individuals with job titles such as founders and co-founders of companies with fewer than 50 employees in New York state over the past few years.
To achieve this, we utilized basic search filters in Sales Navigator and employed a boolean query “AI” OR “Artificial Intelligence” to further narrow down the search for our specific purpose.
If you want to scrape data from Linkedin Sales Navigator there are few things you need to keep in mind:
Your account might be at risk unless...
Anyone who has used automation tools like PhantomBuster, Dux-Soup, Browse.ai, or similar tools that execute activities from your account will eventually realize that LinkedIn can identify these scraping activities.
Depending on the strength of your account and whether you pay for the premium version of LinkedIn, you might get blocked. Most people can perform only a few hundred visits before LinkedIn detects the activity and takes action.
This risk is mitigated when using tools like Leadspicker or third-party data providers such as Proxycurl. The reason is that these tools employ a proprietary method of scraping data without the need to use your profile for making any visits, ensuring your account remains safe.
Sales Navigator doesn’t let you scrape more than 2,500 profiles, unless..
Sales Navigator is a powerful tool for finding leads on LinkedIn, offering advanced search filters to target ideal prospects. However, a significant limitation is its paging feature, which caps search results at 100 pages or 2,500 results per search, making it challenging to gather a large number of leads.
To work around this limitation, you can split your search into multiple searches by adding an additional filter. One effective method is to add a geographic filter and create separate searches for each state or region. This process can be time-consuming if done manually.
That's why I recommend this simple Google Spreadsheet template by my friend and ProductHunt enthusiast Fabian Maume. It allows you to split any search results into smaller batches efficiently.
With this spreadsheet, you can easily split any founders by criteria such as country or tenure in their current job or company. Each filter results in fewer than 2,500 entries, making the data easily scrapable.
All you need to do is copy and paste the link to the Sales Navigator filter into our robot. The data will be scraped automatically when you click the start button.
GPT Prompting: How to Determine If a Company Is a Startup or Not
Not every individual LinkedIn identifies as a founder is actually leading a startup. To accurately determine if a company is a startup, we implemented the GPT-4o API directly into our platform to analyze both scraped LinkedIn data and website summaries.