Are the database rebels throwing out the baby with the bath water?
Over the last few years, a rebellion has been brewing in the database world. Alternative databases have proliferated, each solving certain problems better than a traditional RDBMS (think Oracle). There are embedded databases, in-memory databases, column-oriented databases, XML databases, data warehousing appliances, key-value stores, document databases, and plenty of others that I’m leaving out. In addition, non-databases are increasingly being used where databases traditionally would have been, such as in-memory key-value stores and map-reduce frameworks.
With this much activity going on, the obvious question is why. I believe there are a number of factors converging that are driving the activity:
- Roughly twenty years of developer frustration with the mismatch between relational data stores and object-oriented programming
- Growing frustration with the costs associated with traditional RDBMS vendors
- The transition from minicomputer-derived servers to commodity hardware, and with it to horizontal scaling and cloud deployment
- The need to manage internet-scale data
- The movement towards iterative development and agile methodologies, and the difficulty of managing schema transitions in that world
- Replacing a database with a map-reduce framework when real-time query is needed. Hadoop is great for taking jobs that run for a month and running them in hours. It won’t run them in under a second, though.
- Using a key-value store when secondary indexes are needed. Yes, a key-value store provides great flexibility around schema – you don’t need one. That flexibility, however, comes at great cost. What happens the first time you want to query your user object by location instead of userid?
- Giving up consistency without a fight. Yes, there are some problems where consistency is not needed. I certainly care about network partitions when I am designing a control system for a nuclear submarine fleet. But if I am travelling in Europe and I can’t play my favorite game for five minutes because the internet isn’t working, how bad is that problem? Is it worth teaching developers a whole new set of transaction semantics? In most cases, no.
- Optimizing performance for the wrong use cases. If I am travelling in Europe and I log in to watch a video and update my preferences, is it important that my user account info be stored locally? No; that one-time cost of around 100 milliseconds is not a problem. Should the video be streamed from a local server? Absolutely, but that is a different issue, and it doesn’t raise the problem of resolving updates from multiple masters.
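
To make the secondary-index point above concrete, here is a minimal sketch of what querying a user object by location looks like on a bare key-value store. The store is modeled as a plain Python dict, and the names (`users`, `find_by_location_scan`, `build_location_index`, `update_location`) and the sample records are hypothetical, chosen only for illustration; a real key-value store would differ in API and behavior.

```python
from collections import defaultdict

# Hypothetical in-memory key-value store: userid -> user record.
users = {
    "u1": {"name": "Ann", "location": "Berlin"},
    "u2": {"name": "Bob", "location": "Paris"},
    "u3": {"name": "Cy", "location": "Berlin"},
}


def find_by_location_scan(store, location):
    """Without a secondary index, every record has to be examined."""
    return sorted(uid for uid, rec in store.items() if rec["location"] == location)


def build_location_index(store):
    """Application-maintained secondary index: location -> set of userids."""
    index = defaultdict(set)
    for uid, rec in store.items():
        index[rec["location"]].add(uid)
    return index


def update_location(store, index, uid, new_location):
    """Each write now has to keep the record and the index in step."""
    old_location = store[uid]["location"]
    index[old_location].discard(uid)
    store[uid]["location"] = new_location
    index[new_location].add(uid)


location_index = build_location_index(users)
```

The lookup by `userid` stays a single key fetch, but the lookup by location is either a full scan or an index that the application has to build and keep consistent on every write, which is work a database with secondary indexes would do for you.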
- Offer secondary indexes without requiring an up-front schema definition to load data
- Offer horizontal scalability on commodity hardware
- Offer transactional updates and consistent reads
- Are easy to program
- Are open source