Re: Mass deletion -- slowing down

Maxim Potekhin Sun, 13 Nov 2011 15:58:01 -0800

I've done more experimentation and the behavior persists: I start with a normal dataset which is searchable by a secondary index. I select by that index the entries that match a certain criterion, then delete those. I tried two methods of deletion -- individual cf.remove() as well as batch removal in Pycassa.

What happens after that is as follows: attempts to read the same CF, using the same index values, start to time out in the Pycassa client (there is a Thrift message about a timeout). The entries not touched by such attempted deletion are still read just fine.
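For reference, the batch variant of the select-then-delete pattern described above looks roughly like the sketch below. It is only an illustration, not the exact code in use: the helper name delete_in_batches is made up, cf is assumed to behave like a pycassa ColumnFamily, and rows is assumed to be an iterable of (key, columns) pairs such as the one returned by get_indexed_slices().

```python
def delete_in_batches(cf, rows, batch_size=1000):
    """Remove every row yielded by `rows` from `cf`, in groups of `batch_size`.

    Assumptions (not taken from the original post):
      * `cf` behaves like a pycassa ColumnFamily, i.e. it offers
        batch(queue_size=...) returning a mutator with remove() and send().
      * `rows` is an iterable of (key, columns) pairs, for example the
        result of cf.get_indexed_slices(clause).
    Returns the number of row keys queued for removal.
    """
    removed = 0
    # The mutator sends its queue automatically once it holds batch_size items.
    batch = cf.batch(queue_size=batch_size)
    for key, _columns in rows:
        batch.remove(key)  # queue a whole-row deletion
        removed += 1
    batch.send()  # flush whatever is still queued
    return removed


# Hypothetical usage against a live cluster (the date value is a placeholder):
#   from pycassa.index import create_index_expression, create_index_clause
#   expr = create_index_expression('DATE', '20111110')
#   clause = create_index_clause([expr], count=1000)
#   delete_in_batches(cf, cf.get_indexed_slices(clause))
```

Note that in this form the index query is still being iterated while the deletions are sent, which matches the scenario in which the timeouts show up.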


Has anyone seen such behavior?


Thanks,
Maxim

On 11/10/2011 8:30 PM, Maxim Potekhin wrote:

Hello,
My data load comes in batches representing one day in the life of a large computing facility. I index the data by the day it was produced, to be able to quickly pull data for a specific day within the last year or two. There are 6 other indexes.
When it comes to retiring the data, I intend to delete it for the oldest date and after that add a fresh batch of data, so I control the disk space. Therein lies a problem -- and it may be Pycassa related, so I also filed an issue on github -- when I select by 'DATE=blah' and then do a batch remove, it works fine for a while, and then after a few thousand deletions (done in batches of 1000) it grinds to a halt, i.e. I can no longer iterate the result, which manifests in a timeout error.
Is that a behavior seen before? Cassandra version is 0.8.6, Pycassa 1.3.0.
TIA,

Maxim
