RAID is dying. Shocked? The prediction might be a bit early for some folks. It’s still somewhat conventional for some people to think that RAID is a conservative way to scale your IO.
I’d like to assert that in 3-5 years RAID will be a thing of the past.
Want some evidence? Google doesn’t use RAID. They’ve build a database infrastructure which avoids expensive and proprietary hardware controllers.
You could call it a redundant array of inexpensive servers.
Other scale out shops which don’t have access to such toys have built out sharded MySQL installations. LiveJournal, Flickr, Facebook. These shops are using RAID in some situations but they are only using them due to the nature of MySQL scalability limitations.
For example, most MySQL shops don’t have failover master replication setups so they usually invest in more expensive RAID 10 controllers for their database to reduce master downtime.
Imagine for a moment that you had stable automated master promotion. A lot of people are playing with is now (we think we have it solved as well) and hopefully it will become commonplace.
If your master fails you just promote a slave.
So why do you need RAID 10? It’s twice as expensive!
If you wanted to you could use software RAID 0. It’s just as fast and more than 1/2 the price! Probably 1/3rd the price if you factor in the price of the RAID card. Now you can buy another server!
Why stop there. Just ditch the RAID setup altogether. No software RAID.
But how do you utilize all those disks you ask? Good question.
Run parallel MySQL installs!
More and more MySQL implementations are rolling their own sharded database technologies.
Imagine you had an Opteron box with 16G of memory, 4 HDDs, and four cores.
Instead of running only ONE MySQL instance you could run four on this box (one for each HDD, core, and 4G of memory).
This has a lot of ideal properties.
Each instance is a member of a shard within a larger federation. You don’t have to worry about idiosyncrasies like RAID chunk size tuning. If a single disk dies you don’t lose the whole box. You only lose that member of a shard. So if one disk dies you only lose 1/4th of 1/Nth of the servers in your cluster.
The key win though is the fact that you can get significantly higher disk throughput. The binary log, write ahead log for INNODB, and data can utilize one disk. In our benchmarks taking two 100MBps disks and running them on RAID 0 only gives us about a 50% performance boost.
In this setup we scale linearly with the number of disks. We can now get 400MBps of IO with four disks. Not too shabby. If we were to run RAID 0 we wouldn’t see any where near 400MBps.
One difficulty lies in configuration. You’d have to run four MySQL processes. The easy solution here is to run one per port (3306-3309).
Another potential solution would be to run virtualization software like KVM or XEN. This would increase your complexity but you’d be able to avoid configuration difficulty. You’d also need to rely on the performance of your virtualization software.
One area where this does fall over though is with battery backed write caching controllers. It would be interesting to benchmark RAID 0 with a caching controller vs four independent MySQL instances across four disks.
Thoughts?