Are the database rebels throwing out the baby with the bath water?

Posted April 20, 2011

Over the last few years, there has been a rebellion brewing in the database world. Alternative databases have proliferated, each solving certain problems better than your traditional RDBMS (think Oracle). There are embedded databases, in-memory databases, column-oriented databases, XML databases, data warehousing appliances, key-value stores, document databases, and plenty of others that I’m leaving out. In addition, non-databases such as in-memory key-value stores and map-reduce frameworks are increasingly being used where databases would traditionally have been used.

With this much activity going on, the obvious question is why. I believe there are a number of factors converging that are driving the activity:

Roughly twenty years of developer frustration with the mismatch between relational data stores and object-oriented programming
Growing frustration with the costs associated with traditional RDBMS vendors
The transition from minicomputer-derived servers first to commodity hardware, then to horizontal scaling and cloud deployment
The need to manage internet-scale data
The movement towards iterative development and agile methodologies, and the difficulty of managing schema transitions in this world

Personally, I believe all of the drivers of change in the database space are valid, but users are frequently adopting the wrong solution in an ill-advised mad quest to demolish their objections to the RDBMS. Some examples of how they go too far:

Replacing a database with a map-reduce framework when real-time queries are needed. Hadoop is great for taking jobs that run for a month and running them in hours. It won’t run them in under a second, though.
Using a key-value store when secondary indexes are needed. Yes, a key-value store provides great flexibility around schema – you don’t need one. That flexibility, however, comes at great cost. What happens the first time you want to query your user object by location instead of userid?
Giving up consistency without a fight. Yes, there are some problems where consistency is not needed. I certainly care about network partitions when I am designing a control system for a nuclear submarine fleet. But if I am travelling in Europe and I can’t play my favorite game for 5 minutes because the internet isn’t working, how bad is that problem? Is it worth trying to teach developers a whole new set of transaction semantics? In most cases, no.
Optimizing performance for the wrong use cases. If I am travelling in Europe and I log in to watch a video and I update my preferences, is it important that my user account info be stored locally? No; that one-time cost of around 100 milliseconds is not a problem. Should the video be streamed from a local server? Absolutely, but that is a different issue, and one that doesn’t raise the problem of resolving updates from multiple masters.
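
To make the secondary index point concrete, here is a minimal sketch in Python. It assumes a plain dict standing in for a key-value store; the user records and the function names (find_by_location_scan, build_location_index, move_user) are made up for illustration and are not taken from any particular product. It shows the two options you are left with when you want to query by location: scan every value, or maintain your own index by hand.

```python
from collections import defaultdict

# Hypothetical user records keyed by userid, as a key-value store would hold them.
users = {
    "u1": {"name": "Ana", "location": "Lisbon"},
    "u2": {"name": "Ben", "location": "Berlin"},
    "u3": {"name": "Caz", "location": "Lisbon"},
}


def find_by_location_scan(store, location):
    """Option 1, no secondary index: look at every value in the store."""
    return sorted(uid for uid, doc in store.items()
                  if doc.get("location") == location)


def build_location_index(store):
    """Option 2: a hand-built secondary index mapping location -> set of userids."""
    index = defaultdict(set)
    for uid, doc in store.items():
        if "location" in doc:
            index[doc["location"]].add(uid)
    return index


def move_user(store, index, uid, new_location):
    """With a hand-built index, every write has to update store and index together."""
    old_location = store[uid].get("location")
    if old_location is not None:
        index[old_location].discard(uid)
    store[uid]["location"] = new_location
    index[new_location].add(uid)


location_index = build_location_index(users)
```

The scan touches every record on every query, and the hand-built index pushes the bookkeeping into application code: miss one update path and the index silently goes stale. A database with built-in secondary indexes does that bookkeeping for you.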

Each use case is different, but there is a common core of relational complaints that can be solved while maintaining much of what we like about RDBMSs. I believe that systems which:

Offer secondary indexes without requiring up-front schema definition to load data
Offer horizontal scalability on commodity hardware
Offer transactional updates and consistent reads
Are easy to program
Are open source

will address most of the core frustrations driving the database rebellion for operational data stores (not OLAP/data warehousing). Document-oriented systems are addressing these issues today. They don’t solve every issue, but I believe they solve a broad set of them and will eventually be the data store of choice for a very broad set of applications.

So, the next time someone says you need to give up secondary indexes, transactional updates, or consistent reads to get the scalability or agility you need, think twice before you make the trade.

— Max

[Disclosure reminder: I am President of 10gen, the makers of mongoDB. Unsurprisingly I think mongoDB is right in the sweet spot I described, and I’d encourage you to try it out and see for yourself.]

Max Schireson's blog