My last post showed areas of startup activity across the US delimited by zip code. I chose zip codes because they are relatively small, standardized areas. I also decided to give the graph heat-map-style coloring.
In this post, I will re-examine the data at the state level, looking specifically at the last 18 months, April 2008 through September 2009. For mapping purposes, the data has undergone a few transformations to make the underlying patterns easier to see.
It is no surprise that California has more startups than any other state. To give the other states any credit on the map, a large transformation had to be applied.
But, as Matt commented on the last post, there are other factors to consider. One major factor is that CrunchBase is a moderated wiki, and many of its readers and contributors are from certain states, California among them. More importantly, California has an enormous population.
For the first time I ventured outside the friendly confines of CrunchBase to get some state population data, which I read straight into R and cleaned up a bit.
I hesitate to use regression analysis because it can be extremely misleading, but I also tend to use it as a quick check for relevance, right after my common sense check. So, a quick regression analysis shows that population could account for ~74% of the variation in the number of startups across states. If we assume that the relationship is linear, we can account for that influence by standardizing our metric to startups per capita.
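For readers who want to try this kind of quick check themselves, here is a minimal sketch in Python (not the R code I actually ran). The state labels and numbers are made up for illustration and are not the CrunchBase or population figures; the helper names `r_squared` and `per_100k` are my own. It computes the share of variation explained by a simple linear fit and then converts raw counts to a per-capita rate.

```python
# Illustrative numbers only: NOT the CrunchBase or population data from the post.
# Each entry is (population, number of startups) for a hypothetical state.
states = {
    "AA": (1_000_000, 12),
    "BB": (5_000_000, 40),
    "CC": (9_000_000, 95),
    "DD": (20_000_000, 150),
    "EE": (36_000_000, 900),  # a large outlier, playing the role of California
}


def r_squared(xs, ys):
    """R^2 of a simple least-squares line y = a + b*x.

    For a one-predictor regression this equals the squared Pearson correlation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


def per_100k(count, population):
    """Standardize a raw count to a rate per 100,000 residents."""
    return count * 100_000 / population


populations = [pop for pop, _ in states.values()]
startups = [count for _, count in states.values()]

r2 = r_squared(populations, startups)
print(f"Share of variation explained by population: {r2:.2f}")

for name, (pop, count) in states.items():
    print(name, round(per_100k(count, pop), 2), "startups per 100k residents")
```

Dividing by population only removes the population effect if the relationship really is roughly linear through the origin, which is the assumption stated above.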
Even per capita, California has a healthy lead over other states. States like Florida and Texas are affected most by the adjustment. There are other things of interest besides population, of course. One issue that has gotten a lot of attention recently is the importance of immigration reform to the startup world and Newt Gingrich.
To find out what CrunchBase thought about the issue, I once again ventured out into the world to find some immigration data. I chose to work with legal permanent resident (LPR) data from 2008.
And now per capita:
There is a clear positive correlation between the number of startups and the number of LPRs per capita in each state. A logarithmic transformation was necessary to deal with California as an outlier. Each state is plotted with its state abbreviation on the plot below. The green line is the plot’s regression line.
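The log transformation and regression line can be sketched roughly as follows, again in Python rather than R and with hypothetical per-capita values. I am assuming here that both axes are logged (base 10); the function name `fit_line` is just for illustration.

```python
import math

# Hypothetical per-capita figures (per 100,000 residents), for illustration only.
# Tuples are (LPRs per 100k, startups per 100k); none of these come from the post.
per_capita = {
    "AA": (75.0, 0.4),
    "BB": (120.0, 0.8),
    "CC": (300.0, 1.5),
    "DD": (900.0, 6.0),  # an outlier, playing the role of California
}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumption: both measures are log-transformed so the outlier no longer
# dominates the fit. Values must be strictly positive for log10 to be defined.
log_lpr = [math.log10(lpr) for lpr, _ in per_capita.values()]
log_startups = [math.log10(s) for _, s in per_capita.values()]

slope, intercept = fit_line(log_lpr, log_startups)
print(f"log10(startups) = {intercept:.2f} + {slope:.2f} * log10(LPRs)")
```

A positive slope on the logged data corresponds to the upward-sloping regression line in the plot; a state with zero startups or zero LPRs would have to be dropped or offset before taking logs.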
It is clear that, when controlling for population, there is a strong positive correlation at the state level between the number of LPRs and the number of startups. Immigration reform that opens America’s borders more would likely reflect the above data and add to America’s illustrious entrepreneurial history. Immigration reform that specifically targets founders, engineers, or PhDs would be even more effective.