What do you mean by "performance loss"?  For example, are you seeing
it on the read side or the write side?  During compactions?  Deletions
themselves shouldn't be expensive, but if you have a lot of tombstones
that haven't been compacted away, reads will be slower since there is
more data to scan.  One thing to try is kicking off major compactions
more often, so each one is smaller (less load) and the deleted data
gets cleaned out more frequently.
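
For example (keyspace and column family names here are just
placeholders), something like:

    nodetool -h localhost compact MyKeyspace MyCF

run on a schedule.  Keep in mind that tombstones are only actually
dropped once gc_grace_seconds has elapsed, so compacting more often
than that won't purge them any sooner.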

You should be able to tell whether it is disk or CPU pretty easily via
the JMX interface (jconsole or OpsCenter can read those values) or
something like iostat.  Basically, look for high disk I/O wait: if you
see that, it's disk; if not, it's CPU.
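
For instance:

    iostat -x 5

and watch %iowait in the avg-cpu section (and %util per device).  If
those sit high while the CPU columns stay low, you're disk bound.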

One optimization I'm doing in my application is choosing row keys so
that I can delete an entire row at a time rather than individual
columns, so there is only one tombstone for the whole row.  This isn't
always possible, but if you can lay out your data in a way that makes
it possible, it's a good optimization.
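
To illustrate with cassandra-cli (the column family and keys are made
up, just to show the idea):

    del MyCF['20111117:user42'];

is a single row-level tombstone, versus deleting column by column:

    del MyCF['20111117:user42']['colA'];
    del MyCF['20111117:user42']['colB'];

which leaves one tombstone per column.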

On Thu, Nov 17, 2011 at 10:01 AM, Maxim Potekhin <potek...@bnl.gov> wrote:
> In view of my unpleasant discovery last week that deletions in Cassandra
> lead to a very real
> and serious performance loss, I'm working on a strategy of moving forward.
>
> If the tombstones do cause such problem, where should I be looking for
> performance bottlenecks?
> Is it disk, CPU or something else? Thing is, I don't see anything
> outstanding in my Ganglia plots.
>
> TIA,
>
> Maxim
>
>



-- 
Aaron Turner
http://synfin.net/         Twitter: @synfinatic
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
