Measuring the success of open source projects – a case study around MongoDB
Given my job running 10gen, the company behind MongoDB, I spend a lot of time thinking about the right measures of success. I want to share some of how I measure ours; I'll use MongoDB data, where it is public, to illustrate some of the nuances of the different metrics.
Like most private companies we don't share revenue numbers publicly, so I am limited in terms of examples. As far as principles of measuring the revenue side, Bessemer Venture Partners has a great white paper on running cloud and SaaS software companies. Because open source companies are also often subscription based, much of what they discuss is quite relevant.
I believe committed monthly recurring revenue (CMRR) is an important one to watch, especially once you have a solid year of selling under your belt. I still believe bookings are important as a driver of cash flow, which matters whether your software is open source or closed, and whether it is sold as licenses or as a service. Just be sure (as with any software business) that you define bookings in a way that links closely to cash receipts. Two other metrics I watch carefully are sales rep ramp time and sales productivity. Neither is specific to open source, but both are very important in a high-growth environment.
Lots has been written about managing the sales funnel for open source; I won't repeat it here, but you might check out this post by David Skok.
Now, the less well-charted territory: measuring the size and vibrancy of the community. While I'd much rather have a million downloads than a thousand, I don't believe downloads are a good primary measure, for two reasons:
- You can’t get reliable competitive data
- They can be badly skewed by release frequency and contents (lots of critical patches aren't necessarily a good sign, but they generate more downloads) and by the availability of cloud-based services (lots of cloud hosters are a good sign, but they generate fewer downloads)
The primary things I look at are search interest, discussion traffic, and job postings. Why? Because they’re available, they’re organic, and they measure meaningful user activity.
I use Google Insights for Search. You need to be a bit careful about naming and domain; I will include links to the queries I use as examples of what to watch out for. I believe you should compare against:
- Your most direct head-to-head open source competitors; for us that’s this chart.
- If you're leading your direct competitors (as we are, by about 3:1), you also need to look at successful technologies in similar spaces as a point of reference (for us, Hadoop and Lucene/Solr make an interesting comparison). If you're behind your direct competitors, don't look at much else until you've either caught them or given up and decided to re-target.
You can see that some alternative terms are not useful to search on; for example, "couch", "pig", and "hive" all have relevant meanings, but there is way too much noise. If you're worried that the data will be polluted by unrelated searches, check by searching over a longer time period.
On MarkMail, you can see how much discussion there is around a technology. Be careful to look at user activity; comparing dev activity can be apples and oranges, depending on whether MarkMail indexes the development mailing lists or just the user mailing lists for a given technology. At this level it is important to include all the related pieces; for example, to get a full picture of Hadoop for this metric you should include Hadoop core, HBase, Pig, and Hive.
One great indicator of adoption is job postings. We use indeed.com (coincidentally a MongoDB shop, and more relevantly a good aggregator of job postings) to measure this.
Be careful when similar (competing) technologies are listed as alternatives. For example, compare MongoDB and CouchDB on Indeed. At the time I ran these queries, MongoDB was ahead 1,244 jobs to 389 (similar to the roughly 3:1 ratio on Google Insights). However, many of those are general document database or NoSQL jobs: 240 of them when I ran this report. Factoring those out, 1,004 jobs mentioned MongoDB without CouchDB (or any of its incarnations/merged products), whereas only 149 mentioned CouchDB (and related projects) without MongoDB. So among job postings where a document database has already been selected, the preference for MongoDB is over 6:1.
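The arithmetic behind that comparison can be sketched in a few lines of Python. The counts are the ones quoted above; the helper function name is my own, not anything from Indeed's site.

```python
def exclusive_ratio(total_a, total_b, overlap):
    """Given total job counts for technologies A and B, and the number of
    postings mentioning both, return the counts that mention only A, only B,
    and the only-A to only-B ratio."""
    only_a = total_a - overlap  # postings committed to A
    only_b = total_b - overlap  # postings committed to B
    return only_a, only_b, only_a / only_b

# Counts from the Indeed queries above: 1,244 MongoDB, 389 CouchDB, 240 both.
only_mongo, only_couch, ratio = exclusive_ratio(1244, 389, 240)
print(only_mongo, only_couch, round(ratio, 1))  # 1004 149 6.7
```

Subtracting the overlap from both sides is what turns the raw 3:1 gap into the roughly 6.7:1 preference among postings that have already picked a document database.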
When the metrics don’t align
It is nice and convenient when the metrics all line up and march in the same direction. That's not always the case. For example:
- Compared to CouchDB, we are about 3:1 ahead in search traffic, 4.5:1 in discussion activity, and 6.5:1 in jobs (excluding posts which mention both)
- Compared to Cassandra, we are also about 3:1 ahead in search traffic, but only 2.5:1 ahead in discussion activity and 1.5:1 ahead in jobs (again excluding overlapping job posts)
- Compared to Hadoop, we're about 1.5:1 ahead in search traffic and about 1.5:1 ahead in forum activity (I included the HBase, Pig, and Hive groups when measuring forum activity), but behind by almost 3:1 in jobs.
Why the differences? I really don’t know.
I hope this post is useful to the open source software community, and I hope that you'll respond with new metrics or new interpretations of my existing ones. I'd love to hear your comments.