On Tue, Mar 13, 2012 at 11:32 PM, Thorsten von Eicken <t...@rightscale.com> wrote:
> On 3/13/2012 4:13 PM, Viktor Jevdokimov wrote:
>> What we did to speedup this process to return all exhausted nodes into
>> normal state faster:
>> We have created a 6 temporary virtual single Cassandra nodes with 2
>> CPU cores and 8GB RAM.
>> Stopped completely a compaction for CF on a production node.
>> Leveled sstables from this production node was divided into 6 ranges
>> and copied into 6 temporary empty nodes.
>> On each node we ran a major compaction to compact just 1/6 of data,
>> about 10-14GB. It took 1-2 hours to compact them into 1GB of data.
>> Then all 6 sstables was copied into one of 6 nodes for a major
>> compaction, finally getting expected 3GB sstable.
>> Stopping production node, deleting files that was copied, returning
>> compacted (may need renaming) and node is back to normal.
>>
>> Using separate nodes we saved original production nodes time not to
>> compact exhausted CF forever, blocking compactions for other CFs. With
>> 6 separate nodes we have compacted 2 productions nodes a day, so maybe
>> it took the same time, but production nodes were free for regular
>> compactions for other CFs.
> Yikes, that's quite the ordeal, but I totally get why you had to go
> there. Cassandra seems to work well within some use-case bounds and
> lacks the sophistication to handle others well. I've been wondering
> about the way I use it, which is to hold the last N days of logs and
> corresponding index. This means that every day I make a zillion inserts
> and a corresponding zillion of deletes for the data inserted N days ago.
> The way the compaction works this is horrible. The data is essentially
> immutable until it's deleted, yet it's copied a whole bunch of times. In
> addition, it takes forever for the deletion tombstones to "meet" the
> original data in a compaction and actually compact it away. I've also
> run into the zillions of files problem with level compaction you did. I
> ended up with over 30k SSTables for ~1TB of data. At that point the
> compaction just ceases to make progress. And starting cassandra takes >30
> minutes just for it to open all the SSTables and when done 12GB of
> memory are used. Better algorithms and some tools will be needed for all
> this to "just work". But then, we're also just at V1.0.8...
> TvE

You are correct that Cassandra, the way it works, is not ideal for a dataset where you completely delete and re-add the entire dataset each day. In fact, that may be one of the worst use cases for Cassandra. This has to do with the log-structured storage format, and with tombstones and the grace period. Maybe you can set the grace period lower.

Leveled compaction (modeled on LevelDB) is new and not as common in the wild as size-tiered. Again, it works the way it works. Google must think it is brilliant; after all, they invented it.

For 1TB of data, your 12GB is used by bloom filters. Again, this is just a fact of life. Bloom filters are there to make negative lookups faster. Maybe you can lower the bloom filter sizes and raise the index interval (a larger interval keeps fewer index samples). This should use less memory and help the system start up faster, respectively.

But nodes stuffed with a trillion keys may not be optimal, for many reasons. In our case we want a high portion of the data set in memory, so a 1TB node might need, say, 256 GB of RAM :) We opt for more, smaller boxes.
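
To put rough numbers on the bloom filter point, here is a back-of-the-envelope sketch using the textbook sizing formula for an optimally tuned bloom filter, m = -n * ln(p) / (ln 2)^2 bits. The key count and false-positive rates below are made-up illustrations, not figures from this thread, and Cassandra's real filters will not match this exactly, so treat it only as an order-of-magnitude estimate.

```python
import math


def bloom_filter_bytes(num_keys: int, false_positive_rate: float) -> float:
    """Approximate size in bytes of an optimally sized bloom filter.

    Textbook formula: bits = -n * ln(p) / (ln 2)^2.
    This is a generic estimate, not Cassandra's exact sizing logic.
    """
    if num_keys < 0:
        raise ValueError("num_keys must be non-negative")
    if not 0.0 < false_positive_rate < 1.0:
        raise ValueError("false_positive_rate must be between 0 and 1")
    bits = -num_keys * math.log(false_positive_rate) / (math.log(2) ** 2)
    return bits / 8


if __name__ == "__main__":
    # Hypothetical key count for a large node (an assumption for illustration).
    keys = 5_000_000_000
    for p in (0.001, 0.01, 0.1):
        gib = bloom_filter_bytes(keys, p) / 2**30
        print(f"false-positive rate {p}: about {gib:.1f} GiB of bloom filter")
```

The point of the exercise: memory grows linearly with the number of keys, and accepting a higher false-positive rate shrinks the filter, which is the trade-off behind lowering the bloom filter sizes.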