Methodology for Assigning Gender to Profiles

To cast a better light on diversity in the startup community, CrunchBase published two reports on gender this summer.  The first focused on the growth of female-founded startups since 2009; the follow-up looked at the top universities graduating the most female founders.  Both analyses were powered by the CrunchBase Dataset.

Determining gender for hundreds of thousands of profiles was no small undertaking. Some of our users, including Nicole Yeary (@nicoleyeary), have asked what went into the analysis. Here’s an overview of our methodology:

Step 1: Auto-assign gender based on first name.  We started by automatically assigning gender using a database of 92,000 first names that are predominantly associated with a specific gender.  This database allowed us to set gender for about 94% of the people profiles on CrunchBase.
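
As a rough illustration of Step 1, here is a minimal Python sketch of a first-name lookup. The table contents and the names NAME_TO_GENDER and assign_gender are hypothetical and are not taken from our actual name database; names that are missing from the table (or not predominantly associated with one gender) are simply left unassigned.

```python
from typing import Optional

# Hypothetical stand-in for the name database described above.
# Only names predominantly associated with one gender are listed.
NAME_TO_GENDER = {
    "maria": "female",
    "susan": "female",
    "james": "male",
    "robert": "male",
}


def assign_gender(full_name: str) -> Optional[str]:
    """Return the gender associated with the first name, or None if unknown."""
    parts = full_name.strip().split()
    if not parts:
        return None  # empty or whitespace-only name
    first_name = parts[0].lower()
    return NAME_TO_GENDER.get(first_name)


if __name__ == "__main__":
    for name in ["Maria Lopez", "James Chen", "Jordan Smith"]:
        print(name, "->", assign_gender(name))
```

Returning None rather than guessing keeps unassigned profiles visible, so they can be picked up later by manual review.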

Step 2: Hand-review gender for a subset of our dataset.  Because our published research focused on investors and founders of funded startups, we chose to further review those profiles manually.  Our team looked at profile images and at gender-descriptive pronouns and titles (e.g. him, her, she, he, Mr, Mrs, and Miss) appearing in bios for about 54,000 profiles.  When we identified mis-assignments, we adjusted our name database accordingly.

Step 3: Review inconsistencies between gender-descriptive pronouns and the gender assigned to profiles.  Next, we searched for profiles where the assigned gender conflicted with gender-specific pronouns appearing in the CrunchBase profile.  We manually reviewed these profiles and again updated the name database accordingly.
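
The conflict search in Step 3 could be sketched along the following lines. This is an assumption-laden illustration, not our production code: the term lists contain only the example words mentioned in this post, and the function names gender_from_bio and needs_review are hypothetical.

```python
import re
from typing import Optional

# Example terms from this post only; a real list would likely be longer.
FEMALE_TERMS = {"she", "her", "mrs", "miss"}
MALE_TERMS = {"he", "him", "mr"}


def gender_from_bio(bio: str) -> Optional[str]:
    """Infer gender from pronouns and titles in a bio; None if unclear."""
    # Split into whole lowercase words so "the" is never counted as "he".
    words = re.findall(r"[a-z]+", bio.lower())
    female = sum(word in FEMALE_TERMS for word in words)
    male = sum(word in MALE_TERMS for word in words)
    if female > male:
        return "female"
    if male > female:
        return "male"
    return None  # no gendered terms, or an even split


def needs_review(assigned: Optional[str], bio: str) -> bool:
    """Flag a profile when the bio clearly disagrees with the assigned gender."""
    inferred = gender_from_bio(bio)
    return assigned is not None and inferred is not None and inferred != assigned


if __name__ == "__main__":
    bio = "She founded the company in 2010. Her team now spans three offices."
    print(needs_review("male", bio))
    print(needs_review("female", bio))
```

A check like this only flags candidates. As described above, flagged profiles were still reviewed by a person before anything was changed.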

Was the analysis error-free?  No.  When executing data projects at this scale, errors are bound to occur in both the automated and human-driven processes.  In this case, we now observe an error rate of 0.6% (+/- 0.3%) in assigning gender.
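
For readers curious how a figure like "0.6% (+/- 0.3%)" can arise, one common approach is to audit a random sample of profiles and compute a normal-approximation confidence interval for the error proportion. The sketch below assumes that approach, which this post does not confirm, and the sample size and error count are hypothetical numbers chosen purely for illustration.

```python
import math
from typing import Tuple


def error_rate_with_margin(errors: int, sample_size: int, z: float = 1.96) -> Tuple[float, float]:
    """Return (error rate, margin of error) using the normal approximation.

    z = 1.96 corresponds to a 95% confidence level.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    rate = errors / sample_size
    standard_error = math.sqrt(rate * (1.0 - rate) / sample_size)
    return rate, z * standard_error


if __name__ == "__main__":
    # Hypothetical audit: 15 mis-assignments found in 2,500 sampled profiles.
    rate, margin = error_rate_with_margin(errors=15, sample_size=2500)
    print(f"error rate: {rate:.1%} (+/- {margin:.1%})")
```

The normal approximation becomes less reliable when very few errors are observed; a Wilson interval is a common alternative in that situation.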

The CrunchBase Dataset is constantly expanding through contributions from our community of users, investment firms, and network of global partners. As for gender data, since publishing our first report in May 2015, CrunchBase users have been filling in the gaps and, in a handful of situations, pointing out errors that we’ve been quick to fix.   Anyone can edit profiles on CrunchBase. If you see information that is inaccurate, please correct the error or send us an email.

  • Originally published August 24, 2015