Thanks Aaron, my reply is inline below:

On Tue, Oct 23, 2012 at 2:38 AM, aaron morton <aa...@thelastpickle.com> wrote:
> Performing these steps results in the rows still being present using
> *cassandra-cli list*.
>
> I assume you are saying the row key is listed without any columns, aka a
> ghost row.

Correct.

> What gets really odd is if I add these steps it works
>
> That's working as designed.
>
> gc_grace_seconds does not specify when tombstones must be purged; rather,
> it specifies the minimum duration the tombstone must be stored. It's really
> saying "if you compact this column X seconds after the delete you can purge
> the tombstone".
>
> Minor / automatic compaction will kick in if there are (by default) 4
> SSTables of the same size, and will only purge tombstones if all fragments
> of the row exist in the SSTables being compacted.
>
> Major / manual compaction compacts all the SSTables, and so purges the
> tombstones IF gc_grace_seconds has expired.
>
> In your first example compaction had not run so the tombstones stayed on
> disk. In the second the major compaction purged expired tombstones.

In the first example, I am running compaction at step 7 through nodetool,
after gc_grace_seconds has expired. Additionally, if I do not perform the
manual delete of the row in the second example, the ghost rows are not
cleaned up. I want to know that in our production environment, I don't have
to manually delete empty rows after the columns expire. But I can't get an
example working to that effect.

> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 23/10/2012, at 2:49 PM, Stephen Mullins <smull...@thebrighttag.com>
> wrote:
>
> Hello, I'm seeing Cassandra behavior that I can't explain, on v1.0.12. I'm
> trying to test removing rows after all columns have expired. I've read the
> following:
>
> http://wiki.apache.org/cassandra/DistributedDeletes
> http://wiki.apache.org/cassandra/MemtableSSTable
> https://issues.apache.org/jira/browse/CASSANDRA-2795
>
> And came up with a test to demonstrate the empty row removal that does the
> following:
>
> 1. create a keyspace
> 2. create a column family with gc_grace_seconds=10 (arbitrary small number)
> 3. insert a couple rows with ttl=5 (again, just a small number)
> 4. use nodetool to flush the column family
> 5. sleep >10 seconds
> 6. ensure the columns are removed with *cassandra-cli list*
> 7. use nodetool to compact the keyspace
>
> Performing these steps results in the rows still being present using
> *cassandra-cli list*. What gets really odd is if I add these steps it works:
>
> 1. sleep 5 seconds
> 2. use cassandra-cli to *del mycf[arow]*
> 3. use nodetool to flush the column family
> 4. use nodetool to compact the keyspace
>
> I don't understand why the first set of steps (1-7) doesn't work to remove
> the empty row, nor do I understand why the explicit row delete somehow
> makes this work. I have all this in a script that I could attach if that's
> appropriate. Is there something wrong with the steps that I have?
>
> Thanks,
> Stephen
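
For reference, the purge rule Aaron describes (a tombstone may be dropped only once gc_grace_seconds has elapsed since the delete, and only if every SSTable holding a fragment of the row is part of the compaction) can be sketched as a small Python model. This is an illustration of the rule as stated in the thread, not Cassandra's actual code: `Tombstone`, `can_purge`, and the SSTable names are hypothetical, and treating the grace boundary as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Tombstone:
    """Hypothetical stand-in for a delete marker on a row."""
    row_key: str
    deleted_at: int  # time of the delete, in seconds


def can_purge(tombstone, now, gc_grace_seconds,
              sstables_with_row, sstables_in_compaction):
    """Return True if a compaction may drop the tombstone.

    Models only the two conditions quoted in the thread:
      1. gc_grace_seconds has elapsed since the delete
         (inclusive boundary is an assumption), and
      2. every SSTable containing a fragment of the row is
         included in the compaction.
    """
    grace_elapsed = (now - tombstone.deleted_at) >= gc_grace_seconds
    all_fragments_included = (
        set(sstables_with_row) <= set(sstables_in_compaction)
    )
    return grace_elapsed and all_fragments_included


t = Tombstone(row_key="arow", deleted_at=100)

# Grace period not yet over: the tombstone must be kept.
too_early = can_purge(t, 105, 10, {"sst-1"}, {"sst-1"})

# Grace period over, but a minor compaction misses one fragment.
partial = can_purge(t, 111, 10, {"sst-1", "sst-2"}, {"sst-1"})

# Grace period over and a major compaction covers every SSTable.
major = can_purge(t, 111, 10, {"sst-1", "sst-2"}, {"sst-1", "sst-2"})
```

In this model a major compaction always satisfies the second condition, which matches Aaron's point that it purges tombstones whenever gc_grace_seconds has expired. The model does not by itself explain the ghost rows seen in steps 1-7.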