On Mon, Oct 25, 2010 at 11:21 AM, Eric Rosenberry <e...@rosenberry.org> wrote:
> Hey Chris-
> That is tough to say, as we started out with no data and have been
> continuously loading data into the cluster. Initially we had less data than
> the amount of RAM in each node (48 gigs), but we have eventually exceeded
> that and now have many times more data on each node than there is RAM in
> the entire cluster.
> Some key points though:
> 1. Upon cold start of the cluster (i.e. nothing in file system cache) disk
> I/O was massive even when the total dataset was less than the RAM in one
> system (this same thing holds true in RDBMS systems of course, though many
> of them are smart about pre-loading data).
> 2. We gave up on using Cassandra's row cache, as loading any reasonable
> amount of data into the cache would take days/weeks with our tiny row size.
> We are instead using the file system cache.
> 3. After switching to SSDs we thought we might be able to get away with less
> RAM (as we were relying on the SSDs to be fast rather than on RAM cache), but
> dropping the nodes to 24 gigs cut the cluster's read capacity by 75%.
> 4. When Cassandra is set to a replication factor of three and the read replica
> count is one, data still gets read (for read repair) on all three nodes that
> have a copy of the data. This brings that data into memory on those
> machines, so the amount of total cluster memory available to cache actual
> data is not 192 gigs in my example of four nodes, but only 64 gigs minus OS
> and Cassandra overhead (I divided 192 by three since three copies are stored
> in RAM across the cluster).
> -Eric
>
> On Mon, Oct 25, 2010 at 7:41 AM, Chris Burroughs <chris.burrou...@gmail.com>
> wrote:
>>
>> You mention that you consistently found your boxes IO bound despite the
>> large amount of RAM available for caching. Could you state roughly what
>> the ratio of RAM to on disk data was?
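
To make the arithmetic in point 4 above concrete, here is a rough back-of-the-envelope sketch. The function name and the per-node overhead parameter are my own assumptions for illustration; the only figures taken from the quoted message are four nodes, 48 gigs per node, and a replication factor of three. It also assumes, as Eric describes, that read repair ends up pulling every replica into the file system cache.

```python
def effective_cache_gb(nodes, ram_per_node_gb, replication_factor,
                       overhead_per_node_gb=0.0):
    """Estimate the RAM available for caching *unique* data.

    Assumption: every replica of a row gets read (and therefore cached)
    on each node that holds it, so the cache stores the same data
    `replication_factor` times across the cluster.
    """
    total_ram_gb = nodes * ram_per_node_gb
    # Hypothetical allowance for OS and Cassandra heap; unknown in the thread.
    usable_gb = total_ram_gb - nodes * overhead_per_node_gb
    return usable_gb / replication_factor


# Figures from the quoted message: 4 nodes x 48 gigs, replication factor 3.
print(effective_cache_gb(4, 48, 3))  # 64.0
# The same cluster after dropping each node to 24 gigs.
print(effective_cache_gb(4, 24, 3))  # 32.0
```

Any real overhead figure would have to be measured on the boxes themselves; with it left at zero the result is an upper bound.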
If I am reading this properly, it looks like you used Linux software RAID on top of the SSD devices. Can you talk about this? I would think that even a simple RAID level would drive your CPU usage high. But it seems you may not have other options, since SSD RAID cards probably do not exist.