Measuring the success of open source projects – a case study around MongoDB

Given my job running 10gen, the company behind MongoDB, I spend a lot of time thinking about the right measures of success. I wanted to share some of how I measure our success; I’ll try to use MongoDB data where it is public to illustrate some of the nuances around the different metrics.

Revenue related

Like most private companies we don’t share revenue numbers publicly, so I am limited in terms of examples. As far as principles for measuring the revenue side go, Bessemer Venture Partners has a great white paper on running cloud and SaaS software companies. Because open source companies are also often subscription based, much of what they discuss is quite relevant.

I believe Committed Monthly Recurring Revenue (CMRR) is an important one to watch, especially once you have a solid year of selling under your belt. I still believe bookings are important as a driver of cash flow, which matters whether the software is open source or closed and whether it is sold as licenses or as a service. Just be sure (as with other software businesses) that you define bookings in a way that links closely to cash receipts. Two other metrics I watch carefully are sales rep ramp time and sales productivity. Neither is specific to open source, but both are very important in a high growth environment.
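
To make that concrete, here is a minimal sketch of the CMRR calculation as it is commonly defined in the SaaS world (current MRR, plus MRR that is signed but not yet live, minus MRR known to be churning). The function name and figures are hypothetical, not 10gen data.

```python
# Hypothetical sketch of Committed Monthly Recurring Revenue (CMRR).
# Common SaaS definition: current MRR + signed-but-not-yet-live MRR
# - MRR known to be churning. The figures below are made up.

def cmrr(current_mrr, signed_not_live_mrr, churning_mrr):
    """Committed Monthly Recurring Revenue."""
    return current_mrr + signed_not_live_mrr - churning_mrr

print(cmrr(current_mrr=100000, signed_not_live_mrr=15000, churning_mrr=5000))
# 110000
```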

A lot has been written about managing the sales funnel for open source; I won’t repeat it here, but you might check out this post by David Skok.

Community-related

Now, the less well-charted territory: measuring the size and vibrancy of the community. While I’d much rather have a million downloads than a thousand, I don’t believe downloads are a good primary measure, for two reasons:

  • You can’t get reliable competitive data
  • They can be badly skewed by release frequency and content (lots of critical patches aren’t necessarily good, but they drive more downloads) and by the availability of cloud-based services (lots of cloud hosting providers are good, but they result in fewer downloads)

The primary things I look at are search interest, discussion traffic, and job postings. Why? Because they’re available, they’re organic, and they measure meaningful user activity.

Search interest

I use Google Insights. You need to be a bit careful about naming and domain; I’ll include links to the queries I use as examples of what to watch out for. I believe you should compare against:

  • Your most direct head-to-head open source competitors; for us that’s this chart.
  • If you’re leading your direct competitors (as we are, by about 3 to 1), you also need to look at successful technologies in similar spaces as a point of reference (for us, Hadoop and Lucene/Solr make an interesting comparison). If you’re behind your direct competitors, don’t look at much else until you’ve caught them or given up and decided to re-target.

You can see that some alternative terms are not useful to search on; for example, couch, pig, and hive all have other relevant meanings, so there is way too much noise. If you’re worried that the data will be “polluted” by unrelated searches, search over a longer time period to check for noise.
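
Google Insights itself doesn’t offer an official API, but if you want to pull comparable data programmatically rather than eyeballing the charts, something like the sketch below works against what is now Google Trends. It assumes the unofficial pytrends package and an illustrative term list; treat it as a rough approximation of the linked queries, not a reproduction of them.

```python
# Rough sketch: relative search interest for a few database terms.
# Assumes the unofficial `pytrends` package (pip install pytrends);
# results are Google's normalized index, not absolute search volumes.
from pytrends.request import TrendReq

terms = ["mongodb", "couchdb", "cassandra"]  # terms with little ambiguity

pytrends = TrendReq(hl="en-US", tz=0)
pytrends.build_payload(terms, timeframe="today 5-y")
interest = pytrends.interest_over_time()  # DataFrame, one column per term

# Average relative interest over the window, as a crude head-to-head ratio.
means = interest[terms].mean()
print(means / means.min())
```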

Forum activity

On MarkMail, you can see how much discussion there is around a technology. Be careful to look at user activity; developer activity can be apples and oranges depending on whether MarkMail indexes the development mailing lists or just the user mailing lists for that technology. At this level it is important to include all the different related pieces; for example, to get a full picture of Hadoop for this metric you should include Hadoop core, HBase, Pig, and Hive.
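
MarkMail makes these counts easy to eyeball; if you prefer to compute them yourself from downloaded mbox archives (most Apache-style projects publish monthly mbox files for their lists), a minimal sketch might look like the following. The file names are hypothetical placeholders.

```python
# Minimal sketch: count messages per month across several related
# mailing-list archives in mbox format (e.g. the user lists for Hadoop
# core, HBase, Pig, and Hive). The paths below are hypothetical.
import mailbox
from collections import Counter
from email.utils import parsedate_tz

def messages_per_month(mbox_paths):
    counts = Counter()
    for path in mbox_paths:
        for msg in mailbox.mbox(path):
            parsed = parsedate_tz(msg.get("Date") or "")
            if parsed:
                counts[(parsed[0], parsed[1])] += 1  # (year, month)
    return counts

hadoop_lists = ["hadoop-user.mbox", "hbase-user.mbox", "pig-user.mbox", "hive-user.mbox"]
for (year, month), n in sorted(messages_per_month(hadoop_lists).items()):
    print(f"{year}-{month:02d}: {n} messages")
```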

Jobs

One great indicator of adoption is job postings. We use indeed.com (coincidentally a MongoDB shop, and more relevantly a good aggregator of job postings) to measure this.

Be careful when similar (competitive) technologies are listed as alternatives. For example, compare MongoDB and CouchDB on Indeed. At the time I ran these queries, MongoDB was ahead 1244 jobs to 389 (similar to the roughly 3:1 ratio on Google Insights). However, many of those are general document database or NoSQL jobs: 240 of them when I ran this report. Factoring those out, 1004 jobs mentioned MongoDB without CouchDB (or any of its incarnations/merged products), whereas only 149 mentioned CouchDB (and related products) without MongoDB. So among job postings where the poster has already selected a document database, the preference for MongoDB is over 6:1.
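
For clarity, here is the overlap arithmetic behind those numbers as a small sketch; the counts are the ones quoted above.

```python
# Overlap adjustment behind the job-posting comparison above.
mongodb_total = 1244   # postings mentioning MongoDB
couchdb_total = 389    # postings mentioning CouchDB
both = 240             # generic document-database / NoSQL postings mentioning both

mongodb_only = mongodb_total - both   # 1004
couchdb_only = couchdb_total - both   # 149

print(f"raw ratio:       {mongodb_total / couchdb_total:.1f} : 1")  # ~3.2 : 1
print(f"exclusive ratio: {mongodb_only / couchdb_only:.1f} : 1")    # ~6.7 : 1
```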

When the metrics don’t align

It is nice and convenient when the metrics all line up and march in the same direction. That’s not always the case. For example:

  • Compared to CouchDB, we are about 3:1 ahead in search traffic, 4.5:1 in discussion activity, and 6.5:1 in jobs (excluding posts which mention both)
  • Compared to Cassandra, we are also about 3:1 ahead in search traffic, but only 2.5:1 ahead in discussion activity and 1.5:1 ahead in jobs (again excluding overlapping job posts)
  • Compared to Hadoop, we’re about 1.5:1 ahead in search traffic and about 1.5:1 ahead in forum activity (I included the HBase, Pig, and Hive groups when I measured forum activity), but behind by almost 3:1 in jobs.

Why the differences? I really don’t know.

I hope this post is useful to the open source software community, and I hope that you’ll respond with some new metrics or new interpretations of my existing ones. I’d love to hear your comments,

— Max

6 comments so far

  1. Ajay Ohri

    I did a follow-up post on this some time back, measuring open source software in analytics and business intelligence (that discussion was started by Pentaho’s James Dixon at http://jamesdixon.wordpress.com/2010/11/02/comparing-open-source-and-proprietary-software-markets/). Basically, unless you start making money you are just selling, drinking, and evangelizing Kool-Aid. Food stamps don’t buy diapers, and neither does search traffic or keywords (unless you put some ads on http://www.mongodb.org/). Read my earlier post at http://decisionstats.com/2010/10/31/jim-goodnight-on-open-source-and-why-he-is-right-sigh/

  2. […] Accordingly I disagree with the sentiments but not the maths at https://maxschireson.com/2011/04/22/measuring-the-success-of-open-source-projects-a-case-study-around… and http://jamesdixon.wordpress.com/2010/11/02/comparing-open-source-and-proprietary-software-markets/ or […]

  3. CnotC

    Since one of Google’s search auto-completes for ‘mongodb’ is ‘mongodb losing data’, couldn’t the Google search data be a bad indicator as well? Maybe ‘mongodb’ is searched for a lot because there are a lot of problems with it. Metrics like search stats and mailing list traffic don’t correlate effectively with revenue and success.

    I’m sure that if Google had been around in 1978, ‘pinto’ would have been a pretty popular search topic with a pretty active mailing list.

    • Max Schireson

      Yes, some of the searches and forum discussions will be negative. A good next level of digging would be to analyze the positive/negative mix.

      — Max

  4. […] numbers are particularly impressive once you abstract out generic NoSQL job postings (which call for a variety of NoSQL technology […]


