What do you mean by "performance loss"? For example, are you seeing it on the read or write side? During compactions? Deletions themselves shouldn't be expensive, but if you have a lot of tombstones that haven't been compacted away, that will make reads slower since there is more data to scan. One thing to try is kicking off major compactions more often so they're smaller (less load) and clean out the deleted data more often.
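If you want to drive that on a schedule instead of waiting for the normal compaction cycle, a tiny script like the rough sketch below works. It assumes nodetool is on the PATH and "MyKeyspace"/"MyCF" are placeholders for your own names; also keep in mind tombstones are only actually dropped once gc_grace_seconds has passed.

    import subprocess

    # Kick off a major compaction of one column family via nodetool.
    # Run it from cron (or a loop) as often as your cluster can tolerate.
    def major_compact(host, keyspace, column_family):
        subprocess.check_call(
            ["nodetool", "-h", host, "compact", keyspace, column_family])

    if __name__ == "__main__":
        major_compact("localhost", "MyKeyspace", "MyCF")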
You should be able to tell if it is disk or CPU pretty easily via the JMX interface (jconsole or OpsCenter can read those values) or something like iostat. Basically, look for high disk IO wait... if you see that, it is disk. If not, it's CPU.

One optimization I'm doing in my application is choosing row keys so that I can delete an entire row at a time rather than individual columns, so there is only one tombstone for the whole row. This isn't always possible, but if you can lay out your data in a way that makes this possible, it's a good optimization (rough sketch at the very bottom of this mail, below the quoted message).

On Thu, Nov 17, 2011 at 10:01 AM, Maxim Potekhin <potek...@bnl.gov> wrote:
> In view of my unpleasant discovery last week that deletions in Cassandra
> lead to a very real and serious performance loss, I'm working on a
> strategy for moving forward.
>
> If the tombstones do cause such a problem, where should I be looking for
> performance bottlenecks? Is it disk, CPU or something else? Thing is, I
> don't see anything outstanding in my Ganglia plots.
>
> TIA,
>
> Maxim

--
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
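P.S. The whole-row delete idea above, as a minimal pycassa sketch. The keyspace "MyKeyspace" and column family "events" are placeholder names; adjust to your own schema and row-key layout.

    import pycassa

    # One pool per keyspace; 9160 is the default Thrift port.
    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    events = pycassa.ColumnFamily(pool, 'events')

    # Deleting individual columns leaves one tombstone per column:
    # events.remove('2011-11-17', columns=['job1', 'job2', 'job3'])
    #
    # If the row key is chosen so everything that expires together shares
    # a row (e.g. one row per day), you can drop the whole row with a
    # single row-level tombstone instead:
    events.remove('2011-11-17')

    pool.dispose()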