Archive for the ‘Uncategorized’ Category

The web is your new resume

If you’re not planning to look for a job or to hire someone at some point in the future, don’t read this post.

Five years ago, I would study a candidate’s resume as the first step in assessing their fit for a job.

Now, the first place I go is the internet.

LinkedIn typically has your standard resume information, but that’s only the beginning. I’ll read a candidate’s blog, if they have one. I might look at their tweets. Usually I can find some other residue of their internet lives: a presentation, a white paper, activity in forums, something. Recently I saw a very creatively built presentation from a marketing candidate. Before I had even interviewed her, I already wanted to hire her.

My advice for job seekers:

  • Invest at least a little time in keeping your public profile on LinkedIn up to date
  • Google yourself and see what people find
  • Long term, think about what you’re leaving behind
  • Be careful about what you post; while as a manager I try hard to ignore employees’ personal lives, I am quite sure some of the things I have stumbled upon were not meant to be viewed by prospective employers
  • Thoroughly check out the folks you are interviewing with on the internet and see if you want to work for them

One prime example is forum activity. Our development team also does a lot of support. I am pretty confident that I can tell who on that team will do well should they ever need another job. I just look at their posts. Did they respond quickly? Did they communicate clearly? Was the user happy with the response and did it solve their problem?

My advice for hiring managers:

  • See what you can learn about job candidates on the internet
  • Expect them to do likewise for you; interviews are a two way street

The days of sending out dozens (or hundreds during down economic times) of resumes are over, but the days of being Googled in the job search process are just beginning.
— Max

How to build a good plan – and when to ignore it

The Basics

I believe planning serves three primary purposes:

  1. Establishes, builds consensus around, and communicates a set of goals for a company
  2. Creates accountability for the senior executive team around an agreed set of goals for the company
  3. Provides a framework from which to create a budget against which to operate the company

If you have a plan which represents the result of real examination of goals by the executive team and a budget that flows from that plan, you’re already in at least the 75th percentile of planning and budgeting.

My friend Dave Kellogg has a good blog post on budgeting that’s worth reading; I also like Chip Hazard’s thoughts on the topic (fortunately, since he’s on the board of 10gen).

Now that you have a good plan, when do you ignore it?

You need to periodically review results versus plan and revise your objectives based on what you see. If you’re running ahead of plan and believe you can productively deploy more people, hire them. If you’re running behind, sometimes you’ll want to cut some hires (but not always). Do this review frequently. Using this approach, I was able to take a (people-based) consulting organization which was planned for 30% growth and deliver 90% growth.

How frequently can you re-assess? If you get meaningful objective results monthly, I’d make small tweaks every month and take a hard look every quarter. If your meaningful results come in quarterly, I’d make small tweaks quarterly and make a significant mid-year plan revision.

How dynamic should you be? That’s a bit harder. There are a number of factors to consider:

  • How confident are you that incremental resources will yield incremental results?
  • What’s the ramp time to see those results?
  • What’s the financial impact of adding those resources, not just on the top line but on margins and on cash? The relative importance of each will differ from business to business, but you should understand all three impacts before you make a major plan revision.

One rule of thumb that I use is that I’ll typically spread my changes over two planning periods. If my business generates $2 million extra in a given quarter versus plan and I want to reinvest all of it in growth, I’ll add $1 million in incremental expense in each of the next 2 quarters.

Some of this is financial and quantitative; some involves more market and organizational judgement. There have been teams where I didn’t re-invest because I thought the leadership was at the breaking point and needed time to ramp the previous set of hires before adding more. There have been teams where I didn’t re-invest because I believed the success was a fluke and I wanted to see it sustained longer before I thought there was potential to profitably grow that market. Note in both these cases I delayed the change, but I didn’t kill it. If you start deciding that the market is tapped out and you can’t re-invest, how will you know if you’re wrong?

While I’ve talked here primarily about overperformance in companies oriented primarily towards growth, similar principles apply to other situations – but I’m hoping all your planning problems are over-performance related.

Good luck,

— Max

An even better way to get a job working on mongodb

Late on Friday night I posted a puzzle, with an invitation to apply for a job if you solved it.

The response was overwhelming. Over 6000 people visited my blog; the post was #3 on Hacker News and #33 on WordPress (it’s hard for algorithm questions to compete with stabbings and royal weddings). A few days later, I have gone through hundreds of comments and we have a few interview cycles going. Thanks to everyone who participated; I hope you had fun, and I hope we hire someone from this contest.
However, if you’re really interested in working on mongodb, there’s an even better way: contribute something useful to mongodb. A number of our developers have been hired from the community. While I am a former math guy and I like smart programmers, in the end the most important thing is your ability to create working code. Puzzles are interesting and I expect to continue to post them – and yes, solving them will help you get a job, but it won’t help as much as coding something useful.
— Max

Why Yahoo should spin out Hadoop team – but not how you think

Recent commentary in the Wall Street Journal (no link, out of irritation at the WSJ paywall) and ReadWriteWeb (http://www.readwriteweb.com/cloud/2011/04/yahoo-weighs-spinning-out-hado.php) suggests that Yahoo is in discussions with Benchmark Capital to spin out its Hadoop development team and form a new company.

In my opinion, this is a great idea. Why?

  1. I think there’s a big market there. A quick look on indeed.com shows over 3000 jobs posted related to Hadoop. Big data analytics is a large and growing space, and Hadoop has established a clear leadership position there.
  2. I think there is zero chance of Yahoo capturing this value as an internal project. Yahoo has some great properties but it has enough challenges in managing its media business. Running two different businesses (media and software) is very hard, and Yahoo would be ill-advised to try.

The only question is how it should be spun out. In my opinion there are two sensible approaches:

  1. Spin it out on its own, with funding from a top-tier VC. Rumor has it that discussions with Benchmark are in progress.
  2. Join forces with Cloudera; figure out a fair value for the assets being spun out and work with the Cloudera team to share ownership based on that valuation.

There are fewer moving parts with the standalone spinout, so it’s easier to do, but I’m not convinced it’s the right thing. I just don’t think two rival firms commercializing one open source project will work well, and I’m afraid both would have the critical mass (and cash) to last a while, which would create a rift in the Hadoop world just as it’s picking up momentum.

Here’s to hoping (however unlikely) that Yahoo, Cloudera, and their VC’s can find a way to create one great company that can help Hadoop realize its full potential.

Thoughts?

— Max

6/27 addendum: it looks like this happened; GigaOm is now reporting that the spinout will go forward under the name HortonWorks (ugh).

Want a job working on mongoDB? Your first online interview is in this post

I like smart programmers.

So, I thought I’d put out a programming puzzle. If you can solve it, you are well on your way to a very cool job. If you can’t solve it, send it to a smart friend who can. Even if you don’t want a job, I promise you’ll enjoy the solution when you find it.

Here’s the puzzle:

You’re given an array of N 64-bit integers. N may be very large. You know that every integer from 1 to N appears once in the array, except that one integer is missing and one integer is duplicated.

I’d like a linear time algorithm to find the missing and duplicated numbers. Further, your algorithm should run in small constant space and leave the array untouched.
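
For readers who want to check their work, here’s a sketch of one well-known line of attack – by no means the only one, and I’m not promising it’s the prettiest. Compare the array’s sum and sum of squares against what 1 through N should give, then solve for the two unknowns (Python’s arbitrary-precision integers sidestep the overflow you’d have to handle carefully in C):

    def find_missing_and_duplicated(a):
        # a should contain 1..n, except one value m is missing
        # and one value d appears twice.
        n = len(a)
        expected_sum = n * (n + 1) // 2
        expected_sq = n * (n + 1) * (2 * n + 1) // 6
        diff = sum(a) - expected_sum                   # d - m
        sq_diff = sum(x * x for x in a) - expected_sq  # d^2 - m^2 = (d - m)(d + m)
        total = sq_diff // diff                        # d + m
        d = (diff + total) // 2
        m = d - diff
        return m, d

    print(find_missing_and_duplicated([1, 2, 2, 4]))   # (3, 2): 3 missing, 2 duplicated

Two linear passes, a handful of variables, and the array is never modified.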

— Max

PS I love smart people in other jobs too; while I think this problem will appeal mostly to programmers, 10gen is a company where solving a problem like this helps you get a job in marketing, finance, or any other area that fits your skills. Send me a solution and tell me what job you want to interview for!

NOTE 4/24 4am: I have caught up with all the responses but plan to take Sunday off and spend it with my family. Thanks for making a job at 10gen more popular than baby tips from Elton John today.

Measuring the success of open source projects – a case study around mongodb

Given my job running 10gen, the company behind mongodb, I spend a lot of time thinking about the right measures for success. I wanted to share some of how I measure our success; I’ll try to use mongodb data where it is public to illustrate some of the nuances around the different metrics.

Revenue related

Like most private companies we don’t share revenue numbers publicly, so I am limited in terms of examples. As far as principles of measuring the revenue side, Bessemer Ventures has a great white paper on running cloud and SaaS software companies. Because open source companies are also often subscription based, much of what they discuss is quite relevant.

I believe Committed Monthly Recurring Revenue is an important metric to watch, especially once you have a solid year of selling under your belt. I still believe bookings are important as a driver of cash flow, which matters whether your business is open source or closed and whether it is license-based or as-a-service. Just be sure (as with other software businesses) that you define bookings in a way which will link closely to cash receipts. Two other metrics I watch carefully: sales rep ramp time and sales productivity. Neither is specific to open source, but both are very important in a high growth environment.

Lots has been written about managing the funnel for open source; I won’t repeat that here, but you might check out this post by David Skok.

Community-related

Now, the less well-charted territory: measuring the size and vibrancy of the community. While I’d much rather have a million downloads than a thousand, I don’t believe downloads are a good primary measure, for two reasons:

  • You can’t get reliable competitive data
  • They can be badly skewed by release frequency and content (lots of critical patches aren’t necessarily good, but they generate more downloads) and by the availability of cloud-based services (cloud hosting is good, but it generates fewer downloads)

The primary things I look at are search interest, discussion traffic, and job postings. Why? Because they’re available, they’re organic, and they measure meaningful user activity.

Search interest

I use Google Insights. You need to be a bit careful about naming and domain; I will include links to the queries I use as examples of what you need to be careful of. I believe you should compare to:

  • Your most direct head-to-head open source competitors; for us that’s this chart.
  • If you’re leading your direct competitors (as we are, by about 3 to 1), you also need to look at successful technologies in similar spaces as a point of reference (for us, hadoop and lucene/solr make an interesting comparison). If you’re behind your direct competitors, don’t look at much else until you’ve caught them or given up and decided to re-target.

You can see that some alternative terms are not useful to search on; for example couch, pig, and hive all have relevant meanings but there is way too much noise. Search with a longer time period to check for noise if you’re worried that the data will be “polluted” by unrelated searches.

Forum activity

On MarkMail, you can see how much discussion there is around a technology. Be careful to look at user activity; dev activity can be apples and oranges depending on whether MarkMail indexes the development mailing lists or just the user mailing lists for a given technology. At this level it is important to include all the related pieces; for example, to get a full picture of hadoop for this metric you should include hadoop core, hbase, pig, and hive.

Jobs

One great indicator of adoption is job postings. We use indeed.com (coincidentally a mongoDB shop, and more relevantly a good aggregator of job posts) to measure this.

Be careful when similar (competitive) technologies are listed as alternatives. For example, compare mongoDB and couchdb on indeed. At the time I ran these queries, mongoDB was ahead 1244 jobs to 389 (similar to the roughly 3:1 ratio on Google Insights). However, many of those are general document database or noSQL jobs: 240 of them when I ran this report. Factoring those out, 1004 jobs mentioned mongoDB without couchdb (or any of its incarnations/merged products), whereas only 149 mentioned couchdb (and related projects) without mongoDB. So among job postings where a document database has already been selected, the preference for mongoDB is over 6:1.
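
The arithmetic behind those exclusive counts is just subtracting the overlap; as a quick sketch (using the figures quoted above, and assuming the 240 overlapping postings are included in both raw totals):

    mongo_total, couch_total, overlap = 1244, 389, 240
    mongo_only = mongo_total - overlap    # 1004 postings mention mongoDB but not couchdb
    couch_only = couch_total - overlap    # 149 postings mention couchdb but not mongoDB
    print(mongo_only / couch_only)        # ~6.7, the "over 6:1" preference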

When the metrics don’t align

It is nice and convenient when the metrics all line up and march in the same direction. That’s not always the case. For example:

  • Compared to couchdb, we are about 3:1 ahead in search traffic, 4.5:1 in discussion activity, and 6.5:1 in jobs (excluding posts which mention both)
  • Compared to cassandra, we are also about 3:1 ahead in search traffic, but only 2.5:1 ahead in discussion activity and 1.5:1 ahead in jobs (again excluding overlapping job posts)
  • Compared to hadoop, we’re about 1.5:1 ahead in search traffic, about 1.5:1 ahead in forum activity (I included the hbase, pig and hive groups when I measured forum activity), but behind by almost 3:1 in jobs.

Why the differences? I really don’t know.

I hope this post is useful to the open source software community, and I hope that you’ll respond with some new metrics or new interpretations of my existing metrics. I’d love to hear your comments,

— Max

Debunking SSD lifespan and random write performance concerns

A number of customers have been concerned about the durability of SSDs; some have worried about random write performance as well. Since many mongoDB users have reported good results, I thought I’d take the time to analyze some of the concerns to assist others who are considering a move to SSD. Now, not all SSDs are created equal; I would recommend high-end enterprise-grade SSDs (like the Intel X25-E) for database applications, and my analysis will focus on this grade of SSD.

Durability

The short answer is I wouldn’t worry about it.

For applications which are heavy on random writes, you’re OK (meaning a life span of over 5 years) up to about 25 million writes per day per drive, which is nearly double the IO capacity of the fastest hard disk drives. For sequential write heavy applications (which benefit far less from SSDs), you’re OK (same 5 year life span) assuming the application re-writes each block on average no more than once per half hour. The smallest of the latest fast HDDs can barely manage this (take a Seagate Cheetah 15k.7 at 300GB, for example, which has a claimed sustained write throughput of 171 MBps), and it gets harder as disks get larger.

I should point out that applications which do nothing but sustained disk writes are unusual to say the least; most applications mix in some writes and have a varying load which doesn’t keep the disk subsystem pegged continually. But even in this pathological case, you have a lifespan of 5 years for any application within the performance envelope of HDDs. And with an even mix of reads and writes and a (fairly typical) 5:1 ratio of peak to average usage, your SSD will last 5 years at 10x-20x the load that an HDD can handle. For applications with a higher mix of reads, the situation looks even better.

If you’d like to analyze this yourself, there are two concepts you need to consider: wear leveling and write amplification. Both can increase the physical write workload and decrease lifespan, so I’ll explain them here.

  • Wear leveling is an important process where the SSD ensures that the same blocks are not being re-written constantly. Intel quotes a 1.1x wear leveling factor for the X25-E; that means there’s an overhead of an additional 10% of writes going on to prevent hotspots. Because it’s not hard to come up with algorithms that lazily balance wear with minimal overhead (not hard, but not too easy; I might use it as an interview question), I tend to trust that estimate and not worry much about that factor. I did leave room in my lifecycle calculations for this 10% factor.
  • Write amplification is the ratio of data physically written to the SSD relative to the data logically written by the user. One large factor in write amplification is the large number of pages (typically 4-8k) in a block (typically 128-256k) and the fact that all erasing is done at the block level, potentially causing significant garbage collection and re-writing overhead. While Intel claims 1.1x write amplification for the X25-E, I’d be a bit cautious about accepting that as gospel; minimizing write amplification depends in large part on separating static and dynamic data, which can be very hard to do (and impossible given worst-case random IO patterns). That said, my previous calculations don’t depend on any specific efficiency in that regard – I baked in a worst-case assumption for the random writes, and for sequential writes there is much less of an issue here.

For those who don’t trust my math on durability, how did I calculate these numbers? For the random writes, a 64GB SSD has about 500,000 blocks of 128k. Assuming a lifespan of 100,000 write cycles per block, you get a total of 50 billion writes for the disk; subtracting 10% for wear leveling gives 45 billion usable writes. Divide that by the number of days in 5 years and you get about 25 million per day. For the sequential writes, I allowed 90,000 usable write cycles of the whole disk (again allowing 10% for wear leveling). Divide 5 years (which happens to be just over 43,000 hours) by 90,000 and you get just over 29 minutes per cycle. For the Cheetah comparison, I divided 300GB by 171MBps and again got about 29 minutes per cycle.
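
If you’d rather let a computer do the arithmetic, here’s the whole calculation as a few lines of Python, using the same rounded figures as above:

    # Random writes: 64GB drive = ~500,000 blocks of 128k,
    # 100,000 write cycles per block, minus 10% for wear leveling.
    usable_writes = 500_000 * 100_000 * 0.9
    days = 5 * 365
    print(usable_writes / days / 1e6)   # ~24.7 million writes/day, i.e. ~25M

    # Sequential writes: 90,000 usable whole-disk cycles over 5 years.
    hours = 5 * 8766                    # just over 43,000 hours
    print(hours / 90_000 * 60)          # ~29.2 minutes per full re-write

    # Cheetah 15k.7 comparison: 300GB at 171 MBps sustained.
    print(300_000 / 171 / 60)           # ~29.2 minutes to re-write the disk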

Random write performance

I’ve heard complaints that SSDs (even high end ones) are slow for random writes. Short PG-rated answer: baloney.

Longer answer: “slow” is relative. SSDs are much slower (around 11x for our old standby X25-E) at random writes than they are at random reads, but they are much faster (a 15-20x difference between the X25-E and a Seagate Cheetah 15k.7) at random writes than hard drives are.

It’s kind of like complaining that cars are slow in reverse: they’re quite a bit slower in reverse than in drive, but they still back up a lot faster than a horse backs up, so don’t use that as a reason to stay with a horse for your transportation needs.

Some more detail: SSDs are around 10x slower at random writes than they are at random reads: 3300 write IOPS vs 35,000 read IOPS for the X25-E using Intel’s numbers, with similar results duplicated by independent testing (for example, ACM Transactions on Storage reported 3120 vs 33,400 in September 2010). With 2ms of rotational latency and around 4ms to seek, expect around 150-200 IOPS for a 15k RPM HDD.
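
The back-of-envelope HDD math, for the curious (same assumed latency figures):

    seek, rotational = 0.004, 0.002     # seconds, for a 15k RPM drive
    hdd_iops = 1 / (seek + rotational)  # ~167 IOPS
    print(3300 / hdd_iops)              # ~20x: the X25-E's random write advantage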

Hope this post helps clarify the situation and gets more people taking advantage of this technology.

— Max

Creating success vs avoiding failure

I am a huge believer that businesses need to strive to create successes rather than avoid failures. There are certainly some areas where failure avoidance is necessary: Boeing needs to do an outstanding job of shipping planes that don’t have safety flaws, for example. But successes are what drive a business forward: creating the jumbo jet or winning a big order from a major airline, for example.

One key challenge in building a success-seeking culture is that failure is so much more visible than the absence of success. When a team goes hard after a deal and loses it, everyone sees it and is disappointed. Depending on the culture, there might be a productive post-mortem (good) or a witch hunt (not so good). But when the team doesn’t make the extra call that creates that opportunity, nobody notices. On the engineering side, the revolutionary new feature that misses the release is a visible failure; the absence of the revolutionary new feature ever being invented is a non-event.

One would hope the visibility of success could be a balancing factor, but there are two problems: spectacular success is rare, and a few extra routine successes are hard to notice. What’s the incentive for an employee to risk (non-catastrophic) failure to deliver a few more routine successes? In most companies, none. The developer who tries to get 7 features into the release and only gets 5 in is a goat for missing 2 committed features, while the developer who tries for 3 and gets them all is a hero. In sales, the team that doggedly pursues every qualified opportunity they see is heroic, whereas the team that prioritizes the best deals and trades off some pursuit of less likely deals for more new prospecting may have more “losses” but close more deals.

How do you manage failure to create a success-seeking culture? Don’t ignore it; failure has far too much to teach us. Instead,

  • Clearly distinguish catastrophic failure (building an unsafe airplane) from routine failure (running a marketing promotion that didn’t have the desired effect). Drive that distinction throughout the organization; obsess about avoiding catastrophic failure and accept routine failure.
  • Make sure post-mortems are done respectfully and productively, with a focus on learning and a lack of blame. Share the results – even when the failure occurred at the most senior level. There’s nothing like leading by example.

Of course a big part of building a culture that seeks success is celebrating successes. But don’t just celebrate the outcome; celebrate the journey that got the team there. I believe that most worthwhile successes don’t come easily, and many of them come within a whisker of failure at multiple points. You need to show the team how fine that line is and what others in the company have done to keep critical successes on the correct side of that line. Let them see the risk taking and the will that it took to create the success you are celebrating.

A boss once told me that I would never last at a big industrial company because I wasn’t good enough at managing expectations down. I agree, but I hope my focus on the results we want to create will help make 10gen a company that seeks successes.

— Max

Are the database rebels throwing out the baby with the bath water?

Over the last few years, there has been a rebellion brewing in the database world. There has been a proliferation of alternative databases which solve certain problems better than your traditional RDBMS (think Oracle). There are embedded databases, in-memory databases, column-oriented databases, XML databases, data warehousing appliances, key-value stores, document databases, and plenty of others that I’m leaving out. In addition, there’s a great proliferation of non-databases being used where databases would traditionally have been used, such as in-memory key-value stores and map-reduce frameworks.

With this much activity going on, the obvious question is why. I believe there are a number of factors converging that are driving the activity:

  • Roughly twenty years of developer frustration with the mismatch between relational data stores and object-oriented programming
  • Growing frustration with the costs associated with traditional RDBMS vendors
  • The transition from minicomputer-derived servers to commodity hardware, horizontal scaling, and cloud deployment
  • The need to manage internet-scale data
  • The movement towards iterative development and agile methodologies, and the difficulty of managing schema transitions in that world

Personally, I believe all of the drivers of change in the database space are valid, but users are frequently adopting the wrong solution in an ill-advised mad quest to demolish their objections to the RDBMS. Some examples of how they go too far:
  • Replacing a database with a map-reduce framework when real-time query is needed. Hadoop is great for taking jobs that run for a month and running them in hours. It won’t run them in under a second, though.
  • Using a key-value store when secondary indexes are needed. Yes, a key-value store provides great flexibility around schema – you don’t need one. That flexibility, however, comes at great cost. What happens the first time you want to query your user object by location instead of userid? (See the sketch after this list.)
  • Giving up consistency without a fight. Yes, there are some problems where consistency is not needed. I certainly care about network partitions when I am designing a control system for a nuclear submarine fleet. But if I am travelling in Europe and I can’t play my favorite game for 5 minutes because the internet isn’t working, how bad is that problem? Is it worth trying to teach developers a whole new set of transaction semantics? In most cases, no.
  • Optimizing performance for the wrong use cases. If I am travelling in Europe and I log in to watch a video and I update my preferences, is it important that my user account info be stored locally? No; that one time cost of around 100 milliseconds is not a problem. Should the video be streamed from a local server? Absolutely, but that is a different issue which doesn’t bring the issues around resolving updates from multiple masters.
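
To make the secondary index point concrete, here’s a minimal sketch using pymongo (the collection and field names are hypothetical). In a pure key-value store, finding users by anything other than their key means fetching and scanning every record; a document store lets you simply declare the index and query:

    from pymongo import MongoClient

    db = MongoClient()["app"]          # assumes a mongod running locally
    db.users.create_index("location")  # secondary index; no up-front schema required
    rome_users = db.users.find({"location": "Rome"})  # served by the index, not a scan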

Each use case is different, but there is a common core of relational complaints that can be solved while maintaining much of what we like about RDBMSs. I believe that systems which:
  • Offer secondary indexes without requiring up front schema definition to load data
  • Offer horizontal scalability on commodity hardware
  • Offer transactional updates and consistent reads
  • Are easy to program
  • Are open source

will address most of the core frustrations driving the database rebellion for operational data stores (not OLAP/data warehousing). Document oriented systems are addressing these issues today. They don’t solve every issue, but I believe they solve a broad set of them and will eventually be the data store of choice for a very broad set of applications.

So, the next time someone says you need to give up secondary indexes, transactional updates or consistent reads to get the scalability or agility you need, think twice before you make the trade.

— Max

[Disclosure reminder: I am President of 10gen, the makers of mongoDB. Unsurprisingly, I think mongoDB is right in the sweet spot I described, and I’d encourage you to try it out and see for yourself.]

As the minicomputer brought Oracle, will the cloud bring a new database?

If you look at the emergence of the relational database, you’ll see two things which came together to create one of the largest software companies in the world:

  • A new and compelling software technology (relational databases)
  • A platform shift (from mainframe to minicomputer)

In my opinion, it was the combination of the two which created such vast changes in the database industry.

The question I think about a lot is what changes cloud computing will bring to the database industry. Will cloud be the platform change that ushers in a new generation of database technologies, and if so, which ones, and how sweeping will the change be? It is both intellectually interesting and one of the largest external factors affecting the company I run (10gen, the company that builds mongoDB; yes, this is an obvious source of bias for me in this post).

I don’t have a definitive answer, but I’ll share some of the “for” and some of the “against”.

Let me start with a few of the arguments against:

No forced move: “Cloud” doesn’t require new software; part of the beauty of cloud is that you can run your same stack more flexibly and less expensively. IMS simply wouldn’t run on your VAX or Sun, so you were forced to migrate if you wanted to use that hardware. The cloud imposes no such forced migration, so the comparison is irrelevant.

It already happened with the shift to commodity hardware: MySQL has become the “default choice” that Oracle used to be, and that dominance will only increase in the cloud, where traditional licensing is often problematic. Cloud computing is mostly a cost play, and MySQL is already the perfect low-cost database.

Now, the arguments for:

Cloud is about more than cost: cloud is about scalability, flexibility, agility, and immediacy. You need a database which was “born for the cloud” to take advantage of an elastic environment. The work involved in making relational databases scale on traditional hardware is hard enough; making them scale in the cloud is possible if you’re willing to carefully restrict how you use them and build a lot of additional infrastructure, but that costs you agility.

Cloud democratizes computing: have developers finally been freed from the shackles of “big IT”? Will they finally get a datastore without an impedance mismatch? When they needed to requisition a bunch of hardware and get a 6-figure purchase order to start development, IT had a lot of control over their platform choices. Now, a developer can spin up some EC2 instances on their credit card, download an open source database, and show their boss a working app a few weeks later. Maybe they need a 5-figure check to get enterprise-grade production support, but that’s an easy decision once the application is built.

How much will cloud change the database landscape? If I really knew the answer, I should be investing in software companies, not running them. But I’m betting on change.

— Max