I'm running the 0.7 nightly build from Aug 31 and have noticed different performance characteristics when using get_slice against a row that has seen a lot of deletes.

One row in the keyspace has around 650K columns; the columns are small, around 53 bytes each, for a total of around 30MB. In the last hour or so I finished deleting around 300K columns from that row (and roughly another 1M rows from other CFs); the deleted columns were ordered before those left in the row.

I stopped my processing, restarted it, and noticed that get_slice was running significantly slower than before. If I do a get_slice for 101 columns with no finish column name and vary the start column, I see the following timings (a rough sketch of the call is below the list):

start="" - 5 to 6 secs
start = "excer" - 5 to 6 secs
start = "excerise-2010-08-31t17-15-57-92421646-11330" - 0.5 to 0.6 secs (this is the first col in this row)

For comparison, a get_slice against another row with 232K columns in the same keyspace (different CF, same column size) with an empty start returned in 0.01 secs.

Could a high level of deletes on a row reduce get_slice performance? Is it worth forcing the tombstones out by reducing GCGraceSeconds and running a compaction to see what happens?
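For the record, the experiment I have in mind is roughly: drop GCGraceSeconds on that CF to something small via a schema update, wait past the new grace period, then force a major compaction with something like

    nodetool -h localhost compact

and re-run the get_slice timings. I'm going from memory on the nodetool syntax for the 0.7 nightlies, so correct me if that's off.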

Thanks
Aaron


