Measuring the success of open source projects – a case study around MongoDB
Given my job running 10gen, the company behind MongoDB, I spend a lot of time thinking about the right measures of success. I want to share some of how I measure ours; I'll use MongoDB data, where it is public, to illustrate some of the nuances of the different metrics.
Like most private companies we don't share revenue numbers publicly, so I am limited in terms of examples. As far as principles of measuring the revenue side, Bessemer Venture Partners has a great white paper on running cloud and SaaS software companies. Because open source companies are also often subscription based, much of what they discuss is quite relevant.
I believe committed monthly recurring revenue (CMRR) is an important one to watch, especially once you have a solid year of selling under your belt. I still believe bookings are important as a driver of cash flow, which matters whether your software is open source or closed, and whether it is sold as licenses or as a service. Just be sure (as with any software business) that you define bookings in a way that links closely to cash receipts. Two other metrics I watch carefully are sales rep ramp time and sales productivity. Neither is specific to open source, but both are very important in a high-growth environment.
Lots has been written about managing the sales funnel for open source; I won't repeat it here, but you might check out this post by David Skok.
Now, the less well-charted territory: measuring the size and vibrancy of the community. While I'd much rather have a million downloads than a thousand, I don't believe downloads are a good primary measure, for two reasons:
- You can’t get reliable competitive data
- They can be badly skewed by release frequency and contents (lots of critical patches aren't necessarily a good sign, but they generate more downloads) and by the availability of cloud-based services (lots of cloud hosters are a good sign, but they generate fewer downloads)
The primary things I look at are search interest, discussion traffic, and job postings. Why? Because they’re available, they’re organic, and they measure meaningful user activity.
I use Google Insights for Search. You need to be a bit careful about naming and domain; I will include links to the queries I use as examples of what to watch out for. I believe you should compare against:
- Your most direct head-to-head open source competitors; for us that’s this chart.
- If you're leading your direct competitors (as we are, by about 3:1), you also need to look at successful technologies in similar spaces as a point of reference (for us, Hadoop and Lucene/Solr make an interesting comparison). If you're behind your direct competitors, don't look at much else until you've either caught them or given up and decided to re-target.
You can see that some alternative terms are not useful to search on; for example, "couch", "pig", and "hive" all have relevant meanings, but there is way too much noise. If you're worried that the data will be polluted by unrelated searches, check by searching over a longer time period.
On MarkMail, you can see how much discussion there is around a technology. Be careful to look at user activity; comparing dev activity can be apples and oranges, depending on whether MarkMail indexes the development mailing lists or just the user mailing lists for a given technology. At this level it is important to include all the related pieces; for example, to get a full picture of Hadoop for this metric you should include Hadoop core, HBase, Pig, and Hive.
One great indicator of adoption is job postings. We use indeed.com (coincidentally a MongoDB shop, and more relevantly a good aggregator of job postings) to measure this.
Be careful when similar (competing) technologies are listed as alternatives. For example, compare MongoDB and CouchDB on Indeed. At the time I ran these queries, MongoDB was ahead 1,244 jobs to 389 (similar to the roughly 3:1 ratio on Google Insights). However, many of those are general document database or NoSQL jobs: 240 of them when I ran this report. Factoring those out, 1,004 jobs mentioned MongoDB without CouchDB (or any of its incarnations/merged products), whereas only 149 mentioned CouchDB (and related projects) without MongoDB. So among job postings where a document database has already been selected, the preference for MongoDB is over 6:1.
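The arithmetic behind that comparison can be sketched in a few lines of Python. The counts are the ones quoted above; the helper function name is my own, not anything from Indeed's site.

```python
def exclusive_ratio(total_a, total_b, overlap):
    """Given total job counts for technologies A and B, and the number of
    postings mentioning both, return the counts that mention only A, only B,
    and the only-A to only-B ratio."""
    only_a = total_a - overlap  # postings committed to A
    only_b = total_b - overlap  # postings committed to B
    return only_a, only_b, only_a / only_b

# Counts from the Indeed queries above: 1,244 MongoDB, 389 CouchDB, 240 both.
only_mongo, only_couch, ratio = exclusive_ratio(1244, 389, 240)
print(only_mongo, only_couch, round(ratio, 1))  # 1004 149 6.7
```

Subtracting the overlap from both sides is what turns the raw 3:1 gap into the roughly 6.7:1 preference among postings that have already picked a document database.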
When the metrics don’t align
It is nice and convenient when the metrics all line up and march in the same direction. That's not always the case. For example:
- Compared to CouchDB, we are about 3:1 ahead in search traffic, 4.5:1 in discussion activity, and 6.5:1 in jobs (excluding posts which mention both)
- Compared to Cassandra, we are also about 3:1 ahead in search traffic, but only 2.5:1 ahead in discussion activity and 1.5:1 ahead in jobs (again excluding overlapping job posts)
- Compared to Hadoop, we're about 1.5:1 ahead in search traffic and about 1.5:1 ahead in forum activity (I included the HBase, Pig, and Hive groups when measuring forum activity), but behind by almost 3:1 in jobs.
Why the differences? I really don’t know.
I hope this post is useful to the open source software community, and I hope that you'll respond with new metrics or new interpretations of my existing ones. I'd love to hear your comments.