Will storage vendors be caught in the crossfire of the database wars?
On one hand, storage sounds like a great business: data volumes are growing exponentially with no end in sight, and as deployment architectures move to the cloud, centrally managed storage should become ever more important, right? Shouldn’t this be a great decade of market growth for EMC and NetApp, provided they can maintain their leadership in the sector?
I’m not so sure. I think they face three challenges, two of which are underappreciated by the market.
The end of the database-neutral hardware vendor
First and most obviously, with Oracle acquiring Sun, HP acquiring Vertica, IBM acquiring Netezza and of course EMC acquiring Greenplum, the separation of data storage hardware and data management software has ended. It used to be that EMC and NetApp could ride in behind Oracle wins to sell some storage. Life has gotten harder.
Cost drivers in big data
Second, I think the big data wave will benefit storage vendors much less than people think. Stepping back from the hype, there are two reasons why people will eschew databases entirely in favor of something like Hadoop:
- Complex analytics which require a custom program, not a database query
- Large low-value datasets which aren’t worth putting in a database
I think there is a lot of value in the complex analytics, which should be captured by the Hadoop software ecosystem if it can avoid infighting and brutal price competition. I expect Hadoop will play a significant role for a long time for these types of applications.
The second Hadoop opportunity is also very large, though I expect it will monetize at a lower rate. In particular, I don’t think people will cough up much money for higher-performing storage behind these applications. And in my opinion, people are conflating the high-value use case (complex algorithmic data mining) with the large-volume use case (a repository which is slightly smarter than just dumping things in files but much cheaper than a database).
Will people spend money for high-performance MapReduce? Yes. And will lots of data sit in some sort of Hadoop (or Hadoop-like) system? Yes, but most of it at very low cost and very low margin for storage providers. There is a good business in making Hadoop fast; it’s just not big enough to move the needle for a multibillion-dollar storage company.
Replicated local storage
Third, the database industry is changing. Replication has long been available as a reasonable way to provide greater reliability in the face of storage failures, and it is now becoming a more common approach. This is being driven by a combination of the high cost of shared storage, the high complexity of failover based on clustered filesystems, and the general trend towards distributed databases for cloud deployments. I’ve seen a number of large deployments put on hold because of central storage costs and then brought back to life on distributed databases with redundant local storage. Not good for EMC or NetApp.
Can these challenges be overcome? Maybe. But it’s far from obvious to me that today’s high-end storage vendors will benefit from big data and the cloud.