First steps as a data-driven VC without coding skills: AI-powered Google Sheet to track LinkedIn profiles
DDVC #32: Where venture capital and data intersect. Every week.
Hi, I'm Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 7,747+, +162 since last week
Brought to you by VESTBERRY - the future of portfolio management.
Harness real-time data to leapfrog in the investment game and uncover hidden opportunities. Make data-driven decisions with VESTBERRY's intuitive platform.
Incentives to become more data-driven are obvious, yet many firms are stuck in a buy versus build trade-off and end up doing nothing.
"What are the first steps on the journey from productivity VC to data-driven VC?" Today, I'm incredibly excited to have Vlastimil Vodička, CEO and Founder of Leadspicker, share his step-by-step guide to start leveraging AI as a VC - without coding skills - in his guest post below.
If you're a fan of Andre's newsletter, chances are you're already intrigued by using LinkedIn for startup sourcing and taking advantage of GPT's powerful classification and categorization capabilities, even if you lack coding skills.
As part of a little no-code experiment, we've put together a comprehensive guide on how to connect OpenAI's GPT to your Google Spreadsheet. This guide will show you how to evaluate companies scraped from LinkedIn and determine if they're a startup or not, and how to categorize them into predefined categories directly in your spreadsheet using GPT.
By reading this blog post, you will learn:
How to Add OpenAI's GPT to Your Google Spreadsheet
Challenges We Faced When Scraping Data from LinkedIn
How to Classify and Categorize Startups with GPT via Your Google Spreadsheet
Outcomes of Our Little No-Code Experiment
Extracting Data from LinkedIn Sales Navigator
For this experiment, we extracted data from LinkedIn Sales Navigator, focusing on people in the Central and Eastern Europe (CEE) region who became founders within the last two years. To achieve this, we used advanced search filters in Sales Navigator, targeting specific job titles. We recommend using a boolean query such as "founder" OR "co-founder" OR "CEO" OR "CTO" instead of the filter options that LinkedIn offers, as it can provide more accurate and comprehensive results.
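The boolean query above can also be assembled from a list of job titles, which makes it easier to extend the list later. This is a minimal sketch in JavaScript (it also runs in Apps Script); `buildTitleQuery` is a hypothetical helper name and not part of any LinkedIn tooling:

```javascript
// Hypothetical helper: joins job titles into a boolean OR query
// that can be pasted into the Sales Navigator search field.
function buildTitleQuery(titles) {
  return titles
    .map(function (title) { return '"' + title.trim() + '"'; })
    .join(" OR ");
}

const query = buildTitleQuery(["founder", "co-founder", "CEO", "CTO"]);
console.log(query); // prints: "founder" OR "co-founder" OR "CEO" OR "CTO"
```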
We chose to focus on the CEE region, but you can select any region according to your specific interests.
To extract data from Sales Navigator, we recommend using no-code tools such as PhantomBuster, Dux-Soup or Apify.
Challenges we faced:
Result limits: LinkedIn doesn't display the exact number of people in its database who match your search criteria. Splitting your search region into smaller samples of at most 2,000 contacts per batch helps ensure that you extract all relevant data.
False positives: LinkedIn can also display completely irrelevant profiles that don't match your search criteria, and it's unclear why this happens. This makes it difficult to accurately categorize and analyze the extracted data, so it's important to carefully clean and filter the data before using it for further analysis. While this can be time-consuming, the step is crucial to ensure the accuracy and reliability of your data.
Data cleaning: it's important to deduplicate and clean the data by removing obviously irrelevant profiles. This step is necessary to ensure that the final data set is accurate.
Data enrichment: this may be necessary depending on the tool you used for extracting data. Make sure that you have scraped the information from LinkedIn company profiles, such as the company description, headquarters location, and the number of employees. PhantomBuster and Dux-Soup can do the trick.
Profile visit limits: keep in mind LinkedIn's profile visit limits to avoid getting your LinkedIn account blocked, as already described in his article.
Outcome: We were able to export a total of 29,763 profiles. After some basic deduplication and data cleaning, we ended up with 21,311 unique firms in the dataset.
Now, let's find out whether they really are new startups and which industries they belong to.
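The deduplication step can be sketched roughly as follows. This is an illustration under assumptions rather than the exact cleaning described above: each scraped row is assumed to be an object with a `companyUrl` field (a hypothetical column name; use whatever your export tool provides), two rows count as duplicates when their normalized company URLs match, and rows without a company URL are dropped.

```javascript
// Normalize a LinkedIn company URL so trivial variations compare equal.
function normalizeUrl(url) {
  return String(url || "")
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\/(www\.)?/, "") // drop protocol and "www."
    .replace(/[?#].*$/, "")              // drop query string and fragment
    .replace(/\/+$/, "");                // drop trailing slashes
}

// Keep the first row for each distinct company URL; skip rows without one.
function dedupeByCompany(rows) {
  const seen = new Set();
  const unique = [];
  rows.forEach(function (row) {
    const key = normalizeUrl(row.companyUrl);
    if (key && !seen.has(key)) {
      seen.add(key);
      unique.push(row);
    }
  });
  return unique;
}

// Made-up sample rows, for illustration only.
const sample = [
  { name: "Alice", companyUrl: "https://www.linkedin.com/company/acme/" },
  { name: "Bob", companyUrl: "http://linkedin.com/company/ACME?trk=x" },
  { name: "Carol", companyUrl: "https://www.linkedin.com/company/globex" },
  { name: "Dave", companyUrl: "" }
];
console.log(dedupeByCompany(sample).length); // prints: 2
```

Keeping the first occurrence is a simple default; if some rows are more complete than others, you may prefer to keep the row with the most filled-in fields instead.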
How to add GPT-3.5 to Google Spreadsheet
Open the Google Sheet where you want to add GPT-3.5 -> go to Extensions -> Apps Script
Copy and paste the attached code into your Google Apps Script Project
/**
 * Generates text using OpenAI's GPT-3.5 chat model
 * @param {string} prompt The prompt to feed to the model
 * @param {string} cell The cell content to append to the prompt
 * @param {number} [maxWords] The maximum number of words to generate (defaults to a limit of 100 tokens)
 * @return {string} The generated text
 * @customfunction
 */
function runOpenAI(prompt, cell, maxWords) {
  const API_KEY = "YourAPIkey";
  // One token is roughly 0.75 words, so convert the word limit
  // into a whole number of tokens (the API expects an integer)
  const maxTokens = maxWords ? Math.ceil(maxWords / 0.75) : 100;
  const model = "gpt-3.5-turbo";
  const temperature = 0;
  const fullPrompt = prompt + cell + ":";

  // Set up the request body with the given parameters
  const requestBody = {
    "model": model,
    "messages": [
      {"role": "system", "content": "You are a helpful assistant that answers questions."},
      {"role": "user", "content": fullPrompt}
    ],
    "temperature": temperature,
    "max_tokens": maxTokens
  };
  console.log(requestBody);

  // Set up the request options with the required headers
  const requestOptions = {
    "method": "POST",
    "headers": {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + API_KEY
    },
    "payload": JSON.stringify(requestBody)
  };

  // Send the request to the chat completions endpoint
  const response = UrlFetchApp.fetch("https://api.openai.com/v1/chat/completions", requestOptions);
  console.log(response.getContentText());

  // Parse the response body as a JSON object
  const responseBody = JSON.parse(response.getContentText());

  // Return the generated text from the first choice
  return responseBody.choices[0]["message"]["content"];
}
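When a request fails (for example because of an invalid key or a rate limit), the OpenAI API returns a JSON body with an `error` object instead of `choices`, so reading `choices[0]` would throw. Below is a minimal sketch of a more defensive parser. `extractAnswer` is a hypothetical helper name, and the sketch assumes you add `"muteHttpExceptions": true` to the request options so that `UrlFetchApp.fetch` returns the error body rather than throwing:

```javascript
// Hypothetical helper: turns the raw API response text into a cell value.
function extractAnswer(responseText) {
  let body;
  try {
    body = JSON.parse(responseText);
  } catch (e) {
    return "ERROR: response was not valid JSON";
  }
  if (!body || typeof body !== "object") {
    return "ERROR: unexpected response";
  }
  // Failed requests carry an "error" object instead of "choices".
  if (body.error) {
    return "ERROR: " + (body.error.message || "unknown API error");
  }
  const choice = body.choices && body.choices[0];
  if (!choice || !choice.message || typeof choice.message.content !== "string") {
    return "ERROR: no answer returned";
  }
  return choice.message.content.trim();
}

// Made-up response bodies, for illustration only.
const ok = '{"choices":[{"message":{"role":"assistant","content":" Yes "}}]}';
const failed = '{"error":{"message":"Rate limit reached"}}';
console.log(extractAnswer(ok));     // prints: Yes
console.log(extractAnswer(failed)); // prints: ERROR: Rate limit reached
```

In `runOpenAI`, the final parse-and-return lines could then be replaced with `return extractAnswer(response.getContentText());`, so a failed call shows a readable message in the cell instead of a script error.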