Data-driven VC #15: What social graphs of founders and VCs tell us
Where venture capital and data intersect. Every week.
👋 Hi, I’m Andre and welcome to my weekly newsletter, Data-driven VC. Every Thursday I cover hands-on insights into data-driven innovation in venture capital and connect the dots between the latest research, reviews of novel tools and datasets, deep dives into various VC tech stacks, interviews with experts and the implications for all stakeholders. Follow along to understand how data-driven approaches change the game, why it matters, and what it means for you.
Current subscribers: 3,340, +220 since last week
Today’s episode is a dedicated deep dive on social network analysis of founders and their investors, and its ability to reveal otherwise hidden ecosystem insights. Let’s start with the basics.
Constructing a social graph
A social network can be defined as a social structure made up of a set of actors (people, groups and organizations) and the ties and interactions between them. A social graph is a representation borrowed from graph theory that models a social network with two basic components:
Nodes represent entities in the network (such as “people” like founders or individual investors, or “companies” like startups and VC firms). They can hold their own properties (such as weight, size, color or any other attribute) and network-based properties (such as degree, the number of neighbors, or cluster, the connected component the node belongs to).
Edges represent the connections between the nodes (like money invested or following each other on social media) and can also hold properties (such as a weight representing the amount invested, or the direction of a relationship).
These two basic elements can describe many phenomena, such as physical electricity networks, road networks, social connections or, most important for us, a startup/VC ecosystem. An example graph on the entity level might contain VC firms and startups as two different node types, connected via unidirectional edges (from investor to startup) whose weights represent the investment round (#1 = pre-seed, #2 = seed, #3 = Series A etc.).
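As a minimal sketch of this entity-level graph, the snippet below builds a directed NetworkX graph with two node types and round numbers as edge weights. The firm and startup names, the `node_type` attribute and the `investments` list are all invented for illustration.

```python
import networkx as nx

# Hypothetical investments: (VC firm, startup, round number).
# Round encoding follows the text: 1 = pre-seed, 2 = seed, 3 = Series A.
investments = [
    ("VC Alpha", "Startup X", 1),
    ("VC Alpha", "Startup Y", 2),
    ("VC Beta", "Startup X", 3),
]

G = nx.DiGraph()  # directed: edges point from the investor to the startup
for vc, startup, round_no in investments:
    G.add_node(vc, node_type="vc_firm")
    G.add_node(startup, node_type="startup")
    G.add_edge(vc, startup, weight=round_no)

# Select nodes by type and read back an edge property
vc_firms = [n for n, d in G.nodes(data=True) if d["node_type"] == "vc_firm"]
beta_round = G["VC Beta"]["Startup X"]["weight"]
investors_in_x = G.in_degree("Startup X")  # number of distinct investors

print(vc_firms, beta_round, investors_in_x)
```

One design caveat: a plain `DiGraph` stores at most one edge per investor-startup pair, so a firm that participates in several rounds of the same startup would probably need an `nx.MultiDiGraph` or a list-valued edge attribute instead.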
Another example graph might be on the people level, where individual VC investors and founders are two different node types, node size represents each person's Twitter following, and uni- or bidirectional edges represent who follows whom on Twitter. Although one could theoretically add new node types, edges and weights without limitation to create a single comprehensive graph, I personally prefer to keep things well-organized with individual graphs for different purposes.
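The people-level graph can be sketched in the same way. The handles, follower counts and follow relationships below are made up, and the attribute names `node_type` and `followers` are assumptions chosen for this example.

```python
import networkx as nx

# Hypothetical people with their follower counts
people = {
    "investor_a": {"node_type": "investor", "followers": 12000},
    "investor_b": {"node_type": "investor", "followers": 3000},
    "founder_x": {"node_type": "founder", "followers": 800},
}
# Hypothetical (follower, followed) pairs
follows = [
    ("investor_a", "founder_x"),
    ("founder_x", "investor_a"),  # together with the line above: a mutual follow
    ("founder_x", "investor_b"),  # one-way follow
]

P = nx.DiGraph()
for name, attrs in people.items():
    P.add_node(name, **attrs)
P.add_edges_from(follows)

# A bidirectional relationship is a pair of edges pointing in both directions
mutual = sorted({tuple(sorted((u, v))) for u, v in P.edges if P.has_edge(v, u)})

# Node sizes for plotting, scaled down from the follower counts
node_sizes = [P.nodes[n]["followers"] / 100 for n in P]

print(mutual, node_sizes)
```

Keeping the follow direction on the edge makes it possible to separate one-way attention from mutual relationships later on, which an undirected graph would blur.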
Below is some basic but broadly applicable code that I wrote as part of my research in Cambridge in 2017 to create a social graph based on a CSV file where, following the examples above, every row represents either an investment from a VC into a startup or an individual VC following a founder on Twitter. Node size represents the number of connections. Node color is red for investors and blue for startups, just for plotting and visualization purposes.
import csv
import networkx as nx
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels
import statsmodels.api as sm

# Read the CSV file with the VC-startup relationships into a pandas data frame;
# decimal="," parses decimal commas as decimal points
df1 = pd.read_csv("FileName.csv", delimiter=";", decimal=",", encoding="ISO-8859-1")

# Build a graph from the data frame, one edge per row
# (from_pandas_dataframe was removed in NetworkX 2.x; from_pandas_edgelist replaces it)
G1 = nx.from_pandas_edgelist(df1, source="Company name", target="SH Name")

# Node size scales with the number of connections (investors are scaled up by a factor of 8)
# Node color is red for investors and blue for companies
# Assumes that investor names in the CSV start with "INV"
node_sizes = []
node_colors = []
for knoten in G1:  # "Knoten" is German for "node"
    if knoten.startswith("INV"):
        node_colors.append("r")
        node_sizes.append(8 * len(G1.edges(knoten)))
    else:
        node_colors.append("b")
        node_sizes.append(1 * len(G1.edges(knoten)))

# DRAW NETWORK
nx.draw(G1, node_size=node_sizes, node_color=node_colors, with_labels=False)
What can we do with a Startup/VC social graph?
Before we start with the analysis, let’s distinguish between a snapshot at a specific point in time (static) and a time series of snapshots across time (dynamic). This will be important later on. Probably the most essential tools for social network analysis, whether static or dynamic, are centrality measures, which address the question: "Who is the most important or central person in this network?"
Depending on what we mean by “importance”, there are many different answers to this question. Although Freeman claimed in 1979 that "There is certainly no unanimity on exactly what centrality is or on its conceptual foundations", things have converged in the meantime, and researchers seem aligned that four centrality measures (BC, CC, DC, EC) are the most suitable for analyzing social networks. Each is calculated for every individual node:
Betweenness Centrality (BC): the number of times a node lies on the shortest path between two other nodes
When to use it: For finding the nodes that influence the flow around a system
Closeness Centrality (CC): how close a node is to all other nodes, based on its shortest-path distances to them
When to use it: For finding the nodes that are best placed to influence the entire network most quickly
Degree Centrality (DC): the number of direct neighbors of a node
When to use it: For finding very connected nodes that are likely to hold most information or that can quickly connect with the wider network
Eigenvector Centrality (EC): like degree centrality, but also taking into account how well connected a node's neighbors are, and their neighbors in turn
When to use it: For identifying nodes with influence over the whole network
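The four measures above can be computed with NetworkX's built-in functions. The sketch below uses an invented toy network (node names such as `a1` and `bridge` are placeholders, not real data): two dense clusters that are connected only through a single bridge node.

```python
import networkx as nx

# Hypothetical toy network: a fully connected cluster (a1..a4), a slightly
# sparser cluster (b1..b4), and one "bridge" node linking a1 and b1.
G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a1", "a3"), ("a1", "a4"),
    ("a2", "a3"), ("a2", "a4"), ("a3", "a4"),
    ("b1", "b2"), ("b1", "b3"), ("b2", "b3"),
    ("b2", "b4"), ("b3", "b4"),
    ("a1", "bridge"), ("bridge", "b1"),
])

# Each function returns a dict mapping every node to its (normalized) score
centrality = {
    "BC": nx.betweenness_centrality(G),
    "CC": nx.closeness_centrality(G),
    "DC": nx.degree_centrality(G),
    "EC": nx.eigenvector_centrality(G, max_iter=1000),
}

# Highest-scoring node per measure
top = {name: max(scores, key=scores.get) for name, scores in centrality.items()}
for name, node in top.items():
    print(name, node, round(centrality[name][node], 3))
```

In this toy graph the bridge node ranks first on betweenness and closeness even though it has only two neighbors, while `a1` ranks first on degree and eigenvector centrality. That is the practical point of the list above: which node counts as "most important" depends on the measure chosen.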
