Thanks, I didn't know about the two-stage removal process.

On 23-May-2012 2:20 AM, "Jonathan Ellis" <jbel...@gmail.com> wrote:
> Correction: the first compaction after expiration + gcgs can remove it, even if it hasn't been turned into a tombstone previously.
>
> On Tue, May 22, 2012 at 9:37 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
> > Additionally, it will always take at least two compaction passes to purge an expired column: one to turn it into a tombstone, and a second (after gcgs) to remove it.
> >
> > On Tue, May 22, 2012 at 9:21 AM, Yuki Morishita <mor.y...@gmail.com> wrote:
> >> Data will not be deleted when those keys appear in other sstables outside of the compaction. This is to prevent obsolete data from appearing again.
> >>
> >> yuki
> >>
> >> On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:
> >>
> >> Hi Samal,
> >>
> >> Thanks for your time looking into this.
> >>
> >> I force the compaction by using forceUserDefinedCompaction on only that particular sstable. This guarantees that the new sstable being written contains only the data from the old sstable.
> >>
> >> The data in the sstable is more than 31 days old and gc_grace is 0, but the data from the sstable is still being written to the new one, while I am 100% sure all the data is invalid.
> >>
> >> Kind regards,
> >>
> >> Pieter Callewaert
> >>
> >> From: samal [mailto:samalgo...@gmail.com]
> >> Sent: Tuesday, 22 May 2012 14:33
> >> To: user@cassandra.apache.org
> >> Subject: Re: supercolumns with TTL columns not being compacted correctly
> >>
> >> Data will remain on disk until the next compaction, but it won't be available to reads. Compaction will delete the old sstable and create a new one.
> >>
> >> On 22-May-2012 5:47 PM, "Pieter Callewaert" <pieter.callewa...@be-mobile.be> wrote:
> >>
> >> Hi,
> >>
> >> I’ve had my suspicions for some months, but now I think I am sure about it.
> >>
> >> Data is being written by the SSTableSimpleUnsortedWriter and loaded by the sstableloader.
> >>
> >> The data should be alive for 31 days, so I use the following logic:
> >>
> >> int ttl = 2678400; // 31 days, in seconds
> >> long timestamp = System.currentTimeMillis() * 1000; // write time, in microseconds
> >> // expiration time in milliseconds: write time (ms) + TTL (ms)
> >> long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000));
> >>
> >> And using this to write it:
> >>
> >> sstableWriter.newRow(bytes(entry.id));
> >> sstableWriter.newSuperColumn(bytes(superColumn));
> >> sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, ttl, expirationTimestampMS);
> >> sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), timestamp, ttl, expirationTimestampMS);
> >> sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, expirationTimestampMS);
> >>
> >> This works perfectly: data can be queried until 31 days have passed, and after that no results are returned, as expected.
> >>
> >> But the data is still on disk until the sstables are recompacted.
> >>
> >> One of our nodes (we have 6 in total) has the following sstables:
> >>
> >> [cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G
> >> -rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19 /data/MapData007/HOS-hc-125620-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 /data/MapData007/HOS-hc-163141-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17 /data/MapData007/HOS-hc-172106-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50 /data/MapData007/HOS-hc-181902-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37 /data/MapData007/HOS-hc-191448-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 /data/MapData007/HOS-hc-193842-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 /data/MapData007/HOS-hc-196210-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 /data/MapData007/HOS-hc-196779-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 /data/MapData007/HOS-hc-58572-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 /data/MapData007/HOS-hc-61630-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 /data/MapData007/HOS-hc-63857-Data.db
> >> -rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 /data/MapData007/HOS-hc-87900-Data.db
> >>
> >> As you can see, the following files should be invalid:
> >>
> >> /data/MapData007/HOS-hc-58572-Data.db
> >> /data/MapData007/HOS-hc-61630-Data.db
> >> /data/MapData007/HOS-hc-63857-Data.db
> >>
> >> because they were all written more than a month ago. gc_grace is 0, so this should not be a problem either.
> >>
> >> As a test, I use forceUserDefinedCompaction on HOS-hc-61630-Data.db.
> >>
> >> The expected behavior is that an empty file is written, because all data in the sstable should be invalid.
> >>
> >> Compactionstats is giving:
> >>
> >> compaction type   keyspace     column family   bytes compacted   bytes total    progress
> >> Compaction        MapData007   HOS             11518215662       532355279724   2.16%
> >>
> >> And when I ls the directory I find this:
> >>
> >> -rw-rw-r--. 1 cassandra cassandra 3.9G May 22 14:12 /data/MapData007/HOS-tmp-hc-196898-Data.db
> >>
> >> The sstable is being copied one-to-one to a new one. What am I missing here?
> >>
> >> TTL works perfectly, but is it causing a problem because the columns are in a super column, so that they are never deleted from disk?
> >>
> >> Kind regards
> >>
> >> Pieter Callewaert | Web & IT engineer
> >> Be-Mobile NV | TouringMobilis
> >> Technologiepark 12b - 9052 Ghent - Belgium
> >> Tel + 32 9 330 51 80 | Fax + 32 9 330 51 81 | Cell + 32 473 777 121
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of DataStax, the source for professional Cassandra support
> > http://www.datastax.com
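
For readers following the thread, here is a minimal sketch of the two conditions discussed above: the column must be past expiration + gc_grace (Jonathan's correction), and the row key must not appear in sstables outside the compaction (Yuki's point). The class `ExpiryCheck` and its methods are hypothetical names made up for this illustration; they are not part of the Cassandra API, and the check is a simplification of what compaction actually does.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not Cassandra code.
final class ExpiryCheck {
    private ExpiryCheck() {}

    // Expiration time in milliseconds: write time (ms) plus the TTL (seconds) converted to ms.
    static long expirationMillis(long writeTimeMillis, int ttlSeconds) {
        return writeTimeMillis + TimeUnit.SECONDS.toMillis(ttlSeconds);
    }

    // Simplified rule taken from this thread (an assumption, not the real compaction logic):
    // a column can be dropped only when expiration + gc_grace has passed AND the row key
    // does not also live in an sstable that is not part of this compaction.
    static boolean canPurge(long nowMillis,
                            long expirationMillis,
                            int gcGraceSeconds,
                            boolean keyInSSTablesOutsideCompaction) {
        long purgeableAt = expirationMillis + TimeUnit.SECONDS.toMillis(gcGraceSeconds);
        return nowMillis >= purgeableAt && !keyInSSTablesOutsideCompaction;
    }

    public static void main(String[] args) {
        int ttl = 2678400;   // 31 days in seconds, as in the original post
        int gcGrace = 0;     // gc_grace of 0, as in the original post
        long now = System.currentTimeMillis();
        long writtenAt = now - TimeUnit.DAYS.toMillis(36); // pretend the data is 36 days old
        long expiresAt = expirationMillis(writtenAt, ttl);

        // Same expired column, with and without the key overlapping other sstables.
        System.out.println("key only in compacted sstable: "
                + canPurge(now, expiresAt, gcGrace, false));
        System.out.println("key also in other sstables:    "
                + canPurge(now, expiresAt, gcGrace, true));
    }
}
```

Under these assumptions, compacting a single sstable on its own (as with forceUserDefinedCompaction) would keep expired columns whenever their row keys also exist in any of the other sstables, which would match the one-to-one copy reported above.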