I've done more experimentation and the behavior persists: I start with a
normal dataset which is searchable by a secondary index. I use that
index to select the entries that match a certain criterion, then delete
them. I tried two methods of deletion -- individual cf.remove() calls as
well as batch removal in Pycassa.
What happens after that is as follows: attempts to read the same CF
using the same index values start to time out in the Pycassa client
(there is a Thrift message about a timeout). The entries not touched by
the attempted deletion can still be read just fine.
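
For reference, the select-then-delete pattern described above can be
sketched roughly as follows. This is a minimal illustration, not the
exact code in use: the helper name delete_by_index and the example DATE
value are made up, and collecting the keys before deleting is a choice
of this sketch. Whether that choice has any effect on the timeouts is
not established here.

```python
def delete_by_index(cf, clause, batch_size=1000):
    """Delete every row matched by a secondary-index clause.

    cf     -- a pycassa ColumnFamily (or any object with the same methods)
    clause -- an index clause, e.g. built with pycassa.index helpers:
              # from pycassa.index import (create_index_expression,
              #                            create_index_clause)
              # expr = create_index_expression('DATE', '20111110')  # made-up value
              # clause = create_index_clause([expr])
    Returns the number of row keys queued for removal.
    """
    # Assumption of this sketch: collect the matching keys up front, so the
    # indexed read is finished before any row is removed.
    keys = [key for key, _columns in cf.get_indexed_slices(clause)]

    # Queue the removals; the mutator sends them once `batch_size`
    # operations have accumulated.
    batch = cf.batch(queue_size=batch_size)
    for key in keys:
        batch.remove(key)
    batch.send()  # flush whatever is still queued

    return len(keys)
```

The individual-removal variant would call cf.remove(key) inside the loop
instead of going through the batch mutator.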
Has anyone seen such behavior?
Thanks,
Maxim
On 11/10/2011 8:30 PM, Maxim Potekhin wrote:
Hello,
My data load comes in batches representing one day in the life of a
large computing facility. I index the data by the day it was produced,
to be able to quickly pull data for a specific day within the last year
or two. There are 6 other indexes.
When it comes to retiring the data, I intend to delete the data for the
oldest date and then add a fresh batch, so that I control the disk
space. Therein lies a problem -- and it may be Pycassa related, so I
also filed an issue on GitHub. When I select by 'DATE=blah' and then do
a batch remove, it works fine for a while, and then after a few thousand
deletions (done in batches of 1000) it grinds to a halt, i.e. I can no
longer iterate over the result, which manifests as a timeout error.
Has this behavior been seen before? The Cassandra version is 0.8.6 and
the Pycassa version is 1.3.0.
TIA,
Maxim