On Mon, Nov 21, 2011 at 3:30 AM, Philippe <watche...@gmail.com> wrote:
> I don't remember your exact situation but could it be your network
> connectivity?
> I know I've been upgrading mine because I'm maxing out fastethernet on a
> 12 node cluster.
> On Nov 20, 2011 at 22:54, "Jahangir Mohammed" <md.jahangi...@gmail.com>
> wrote:
>
>> Mostly, they are I/O and CPU intensive during major compaction. If
>> ganglia doesn't have anything suspicious there, then what is performance
>> loss ? Read or write?
>> On Nov 17, 2011 1:01 PM, "Maxim Potekhin" <potek...@bnl.gov> wrote:
>>
>>> In view of my unpleasant discovery last week that deletions in Cassandra
>>> lead to a very real
>>> and serious performance loss, I'm working on a strategy of moving
>>> forward.
>>>
>>> If the tombstones do cause such problem, where should I be looking for
>>> performance bottlenecks?
>>> Is it disk, CPU or something else? Thing is, I don't see anything
>>> outstanding in my Ganglia plots.
>>>
>>> TIA,
>>>
>>> Maxim
>>>
>>>

Tombstones do have a performance impact, particularly when the data has a lot of turnover and you are using the standard size-tiered compaction rather than the new LevelDB-style leveled compaction. Tombstones live on disk for at least gc_grace_seconds, until a compaction removes them. First, each tombstone takes up a small amount of space, which has an effect on disk caching. Second, tombstones affect the read path through the bloom filters: the SSTable that holds a tombstone still reports the row key as present, so a read for that key can now match the bloom filters of multiple SSTables and has to consult each of them.

If you are constantly adding and removing data and you have a long gc_grace_seconds (10 days is pretty long if your dataset is new every day, for example), the effect is more pronounced than in a use case that rarely deletes. This is why you will notice some use cases call for 'major compaction' while other people believe you should never need it.

I force majors on some column families because there is high turnover and the data needs to be read often. The difference in data size is the difference between 20GB on disk, which fits in the VFS cache, and 35GB on disk, which doesn't (and which may also 'randomly' trigger a large compaction at peak time). I am pretty excited about the LevelDB-style compaction because its leveled approach looks to be more space efficient.
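
To make the bloom filter point concrete, here is a toy Python sketch. It is not Cassandra code: SSTable, sstables_to_read and major_compact are names I made up for illustration, and an exact key lookup stands in for a real (probabilistic) bloom filter. It only shows why a deleted row can still touch several SSTables on read, and what a major compaction run after gc_grace_seconds does about it.

```python
from dataclasses import dataclass, field

DAY = 86400
TOMBSTONE = object()  # marker standing in for a delete


@dataclass
class SSTable:
    # row key -> (write timestamp in seconds, value or TOMBSTONE)
    rows: dict = field(default_factory=dict)

    def might_contain(self, key):
        # Stand-in for a bloom filter: exact membership, so no false positives.
        return key in self.rows


def sstables_to_read(sstables, key):
    """Count the SSTables a read for `key` would have to consult."""
    return sum(1 for table in sstables if table.might_contain(key))


def major_compact(sstables, now, gc_grace_seconds):
    """Merge everything into one SSTable, keep the newest entry per key,
    and drop tombstones older than gc_grace_seconds."""
    merged = {}
    for table in sstables:
        for key, (ts, value) in table.rows.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    kept = {
        key: (ts, value)
        for key, (ts, value) in merged.items()
        if not (value is TOMBSTONE and now - ts > gc_grace_seconds)
    }
    return [SSTable(kept)]


# row1 is written on day 0 and deleted on day 1; row2 is never deleted.
tables = [
    SSTable({"row1": (0, "v1"), "row2": (0, "v2")}),
    SSTable({"row1": (1 * DAY, TOMBSTONE)}),
]
before = sstables_to_read(tables, "row1")      # 2: the data and its tombstone
compacted = major_compact(tables, now=12 * DAY, gc_grace_seconds=10 * DAY)
after = sstables_to_read(compacted, "row1")    # 0: the tombstone has expired
print(before, after)
```

If the same compaction runs on day 5 instead of day 12, the tombstone is younger than gc_grace_seconds and survives the merge, so row1 still costs one SSTable lookup even though it returns nothing.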