Additionally, it will always take at least two compaction passes to purge an expired column: one to turn it into a tombstone, and a second (after gc_grace_seconds) to remove it.
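The two-pass timeline described above can be sketched in a few lines; this is a minimal illustration of the timing involved (all names are illustrative, not Cassandra API):

```java
// Sketch of when an expired column can actually leave disk, assuming the
// two-pass behavior described above: pass 1 (any compaction at or after
// expiry) turns the column into a tombstone; pass 2 (a later compaction at
// or after expiry + gc_grace) can drop the tombstone.
public class PurgeTimeline {
    public static void main(String[] args) {
        long writeTimeSec = 0;      // when the column was written
        int ttlSec = 2678400;       // 31 days, as in the thread below
        int gcGraceSec = 0;         // gc_grace used in the thread

        long expiresAtSec = writeTimeSec + ttlSec;
        long purgeableAtSec = expiresAtSec + gcGraceSec;

        System.out.println("expired (pass 1 may tombstone):   " + expiresAtSec);
        System.out.println("purgeable (pass 2 may drop):      " + purgeableAtSec);
    }
}
```

With gc_grace = 0 the two instants coincide, but two distinct compactions are still required.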
On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita <mor.y...@gmail.com> wrote:
> Data will not be deleted when those keys appear in other sstables outside
> of compaction. This is to prevent obsolete data from appearing again.
>
> yuki
>
> On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:
> > Hi Samal,
> >
> > Thanks for your time looking into this.
> >
> > I force the compaction by using forceUserDefinedCompaction on only that
> > particular sstable. This guarantees that the new sstable being written
> > only contains the data from the old sstable.
> > The data in the sstable is more than 31 days old and gc_grace is 0, but
> > still the data from the sstable is being written to the new one, while
> > I am 100% sure all the data is invalid.
> >
> > Kind regards,
> > Pieter Callewaert
> >
> > From: samal [mailto:samalgo...@gmail.com]
> > Sent: Tuesday, May 22, 2012 14:33
> > To: user@cassandra.apache.org
> > Subject: Re: supercolumns with TTL columns not being compacted correctly
> >
> > Data will remain till the next compaction but won't be available.
> > Compaction will delete the old sstable and create a new one.
> >
> > On 22-May-2012 5:47 PM, "Pieter Callewaert" <pieter.callewa...@be-mobile.be> wrote:
> > Hi,
> >
> > I've had my suspicions for some months, but now I think I am sure about it.
> > Data is being written by the SSTableSimpleUnsortedWriter and loaded by
> > the sstableloader.
> > The data should be alive for 31 days, so I use the following logic:
> >
> > int ttl = 2678400;
> > long timestamp = System.currentTimeMillis() * 1000;
> > long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000));
> >
> > And using this to write it:
> >
> > sstableWriter.newRow(bytes(entry.id));
> > sstableWriter.newSuperColumn(bytes(superColumn));
> > sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, ttl, expirationTimestampMS);
> > sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), timestamp, ttl, expirationTimestampMS);
> > sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, expirationTimestampMS);
> >
> > This works perfectly: data can be queried until 31 days have passed,
> > and then no results are given, as expected.
> > But the data stays on disk until the sstables are recompacted.
> >
> > One of our nodes (we have 6 in total) has the following sstables:
> >
> > [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G
> > -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19 /data/MapData007/HOS-hc-125620-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 /data/MapData007/HOS-hc-163141-Data.db
> > -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17 /data/MapData007/HOS-hc-172106-Data.db
> > -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50 /data/MapData007/HOS-hc-181902-Data.db
> > -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37 /data/MapData007/HOS-hc-191448-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 /data/MapData007/HOS-hc-193842-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 /data/MapData007/HOS-hc-196210-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 /data/MapData007/HOS-hc-196779-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 /data/MapData007/HOS-hc-58572-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 /data/MapData007/HOS-hc-61630-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 /data/MapData007/HOS-hc-63857-Data.db
> > -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 /data/MapData007/HOS-hc-87900-Data.db
> >
> > As you can see, the following files should be invalid:
> > /data/MapData007/HOS-hc-58572-Data.db
> > /data/MapData007/HOS-hc-61630-Data.db
> > /data/MapData007/HOS-hc-63857-Data.db
> >
> > They were all written more than a month ago, and gc_grace is 0, so that
> > should also not be a problem.
> >
> > As a test, I used forceUserDefinedCompaction on HOS-hc-61630-Data.db.
> > The expected behavior is that an empty file is written, because all the
> > data in the sstable should be invalid.
> >
> > compactionstats gives:
> > compaction type  keyspace    column family  bytes compacted  bytes total   progress
> > Compaction       MapData007  HOS            11518215662      532355279724  2.16%
> >
> > And when I ls the directory I find this:
> > -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12 /data/MapData007/HOS-tmp-hc-196898-Data.db
> >
> > The sstable is being copied 1-to-1 into a new one. What am I missing here?
> > TTL works perfectly, but is it a problem because the columns are in a
> > supercolumn, and so are never deleted from disk?
> >
> > Kind regards
> >
> > Pieter Callewaert | Web & IT engineer
> > Be-Mobile NV | TouringMobilis
> > Technologiepark 12b - 9052 Ghent - Belgium
> > Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 | Cell + 32 473 777 121

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
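The timestamp arithmetic in Pieter's quoted writer code mixes units: `timestamp` is in microseconds (the usual unit for Cassandra column timestamps), `ttl` is in seconds, and `expirationTimestampMS` is in milliseconds. A standalone sketch of just that arithmetic, with no SSTableSimpleUnsortedWriter involved, shows the conversions are consistent:

```java
// Check of the unit conversions from the quoted code, in isolation.
public class TtlArithmetic {
    public static void main(String[] args) {
        int ttl = 2678400;                  // 31 days, in seconds
        long nowMs = System.currentTimeMillis();

        // Column timestamp in microseconds, as in the quoted code.
        long timestamp = nowMs * 1000;

        // Expiration in milliseconds: timestamp converted back to ms,
        // plus the TTL converted from seconds to ms.
        long expirationTimestampMS = (timestamp / 1000) + ((long) ttl * 1000);

        // Same value computed directly from wall-clock milliseconds.
        assert expirationTimestampMS == nowMs + ttl * 1000L;

        System.out.println("expires (ms since epoch): " + expirationTimestampMS);
    }
}
```

So the expiration value itself is correct; as the thread concludes, the on-disk retention came from compaction behavior (tombstone pass plus gc_grace pass), not from the TTL arithmetic.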