On Sun, Nov 13, 2011 at 5:57 PM, Maxim Potekhin <potek...@bnl.gov> wrote:
> I've done more experimentation and the behavior persists: I start with a
> normal dataset which is searchable by a secondary index. I select by that
> index the entries that match a certain criterion, then delete those. I tried
> two methods of deletion -- individual cf.remove() as well as batch removal
> in Pycassa.
> What happens after that is as follows: attempts to read the same CF, using
> the same index values start to time out in the Pycassa client (there is a
> thrift message about timeout). The entries not touched by such attempted
> deletion are read just fine still.
>
> Has anyone seen such behavior?

What you're probably running into is a huge amount of tombstone
filtering on the read (see
http://wiki.apache.org/cassandra/DistributedDeletes).

Since you're dealing with timeseries data, using a row-bucketing
technique like http://rubyscale.com/2011/basic-time-series-with-cassandra/
might help by eliminating the need for an index.
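To illustrate the bucketing idea: instead of one ever-growing row (or an
index over individual entries), you derive the row key from the timestamp
so that each row covers a fixed window. Expiring old data then means
dropping whole rows rather than issuing per-column deletes that leave
tombstones in the read path. A minimal sketch (the names `row_key`,
`source_id`, and `BUCKET_SECONDS` are hypothetical, not from the linked
article):

```python
import time

# One row per hour; tune the bucket size to your write rate so rows
# stay a manageable width (hypothetical constant).
BUCKET_SECONDS = 3600

def row_key(source_id, ts):
    """Bucket a unix timestamp into a per-hour row key."""
    bucket = int(ts) - (int(ts) % BUCKET_SECONDS)
    return "%s:%d" % (source_id, bucket)

# All samples within the same hour land in the same row, so an entire
# hour of data can be removed with a single row deletion.
print(row_key("sensor1", int(time.time())))
```

With keys shaped like this, a read for a time range only needs to compute
the handful of bucket keys covering that range, so no secondary index is
required.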

-Brandon
