> Are you sure that it is a good idea to estimate remainingKeys like that?
Since we don't want to scan every row to check for overlap and cause heavy
IO automatically, the method can only do a best-effort calculation.
In your case, try running a user-defined compaction on that sstable file.
It goes through every row and removes tombstones when they are droppable.

On Wed, May 22, 2013 at 11:48 AM, cem <cayiro...@gmail.com> wrote:
> Thanks for the answer.
>
> It means that if we use RandomPartitioner it will be very difficult to find
> an sstable without any overlap.
>
> Let me give you an example from my test.
>
> I have ~50 sstables in total and an sstable with droppable ratio 0.9. I use
> a GUID for the key and only insert (no update or delete), so I don't expect
> a key in different sstables.
>
> I put extra logging in AbstractCompactionStrategy to see
> overlaps.size(), keys, and remainingKeys:
>
> overlaps.size() is around 30, the number of keys for that sstable is around
> 5 M, and remainingKeys is always 0.
>
> Are you sure that it is a good idea to estimate remainingKeys like that?
>
> Best Regards,
> Cem
>
> On Wed, May 22, 2013 at 5:58 PM, Yuki Morishita <mor.y...@gmail.com> wrote:
>>
>> > Can method calculate non-overlapping keys as overlapping?
>>
>> Yes.
>> And randomized keys don't matter here since sstables are sorted by
>> "token" calculated from key by your partitioner, and the method uses
>> sstable's min/max token to estimate overlap.
>>
>> On Tue, May 21, 2013 at 4:43 PM, cem <cayiro...@gmail.com> wrote:
>> > Thank you very much for the swift answer.
>> >
>> > I have one more question about the second part. Can method calculate
>> > non-overlapping keys as overlapping? I mean it uses max and min tokens
>> > and column count. They can be very close to each other if random keys
>> > are used.
>> >
>> > In my use case I generate a GUID for each key and send a single write
>> > request.
>> >
>> > Cem
>> >
>> > On Tue, May 21, 2013 at 11:13 PM, Yuki Morishita <mor.y...@gmail.com>
>> > wrote:
>> >>
>> >> > Why does Cassandra single table compaction skip the keys that are
>> >> > in the other sstables?
>> >>
>> >> Because we don't want to resurrect deleted columns. Say sstable A has
>> >> the column with timestamp 1, and sstable B has the same column, which
>> >> was deleted at timestamp 2. Then if we purge that column only from
>> >> sstable B, we would see the column with timestamp 1 again.
>> >>
>> >> > I also don't understand why we have this line in
>> >> > worthDroppingTombstones
>> >> > method
>> >>
>> >> What the method is trying to do is to "guess" how many columns are in
>> >> the rows that don't overlap, without actually going through every row
>> >> in the sstable. We have statistics like the column count histogram and
>> >> the min and max row token for every sstable, and we use those in the
>> >> method to estimate how many columns the two sstables overlap.
>> >> You may have remainingColumnsRatio of 0 when the two sstables overlap
>> >> almost entirely.
>> >>
>> >>
>> >> On Tue, May 21, 2013 at 3:43 PM, cem <cayiro...@gmail.com> wrote:
>> >> > Hi all,
>> >> >
>> >> > I have a question about ticket
>> >> > https://issues.apache.org/jira/browse/CASSANDRA-3442
>> >> >
>> >> > Why does Cassandra single table compaction skip the keys that are
>> >> > in the other sstables? Please correct me if I am wrong.
>> >> >
>> >> > I also don't understand why we have this line in
>> >> > worthDroppingTombstones
>> >> > method:
>> >> >
>> >> > double remainingColumnsRatio = ((double) columns) /
>> >> > (sstable.getEstimatedColumnCount().count() *
>> >> > sstable.getEstimatedColumnCount().mean());
>> >> >
>> >> > remainingColumnsRatio is always 0 in my case and the droppableRatio
>> >> > is 0.9. Cassandra skips all sstables which are already expired.
>> >> >
>> >> > This line was introduced by
>> >> > https://issues.apache.org/jira/browse/CASSANDRA-4022.
>> >> >
>> >> > Best Regards,
>> >> > Cem
>> >>
>> >>
>> >> --
>> >> Yuki Morishita
>> >> t:yukim (http://twitter.com/yukim)
>> >
>>
>>
>> --
>> Yuki Morishita
>> t:yukim (http://twitter.com/yukim)
>

--
Yuki Morishita
t:yukim (http://twitter.com/yukim)
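
For illustration, the min/max-token estimate discussed in this thread can be sketched roughly as follows. This is a simplified model and not Cassandra's actual code: `OverlapEstimate`, `TokenRange`, and `estimateRemainingKeys` are hypothetical names, tokens are plain small `long` values, and keys are assumed to be spread uniformly over the sstable's own token range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a min/max-token overlap estimate.
final class OverlapEstimate {

    /** Closed token range [min, max] covered by one sstable (assumed stand-in type). */
    static final class TokenRange {
        final long min;
        final long max;

        TokenRange(long min, long max) {
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Estimates how many of the sstable's keys fall outside every overlapping
     * sstable's [min, max] token range. Assumes keys are uniformly distributed
     * over self's range, so the answer is only a best-effort guess.
     */
    static long estimateRemainingKeys(TokenRange self, long keys, List<TokenRange> overlaps) {
        // Clip each overlapping range to the part that intersects self.
        List<TokenRange> clipped = new ArrayList<>();
        for (TokenRange o : overlaps) {
            long lo = Math.max(o.min, self.min);
            long hi = Math.min(o.max, self.max);
            if (lo <= hi) {
                clipped.add(new TokenRange(lo, hi));
            }
        }
        clipped.sort((a, b) -> Long.compare(a.min, b.min));

        // Sweep left to right, counting each covered token only once.
        long covered = 0;
        long cursor = self.min; // first token not yet counted as covered
        for (TokenRange r : clipped) {
            long start = Math.max(r.min, cursor);
            if (start <= r.max) {
                covered += r.max - start + 1;
                cursor = r.max + 1;
            }
        }

        long width = self.max - self.min + 1;
        double uncoveredFraction = (double) (width - covered) / width;
        return Math.round(keys * uncoveredFraction);
    }

    public static void main(String[] args) {
        TokenRange self = new TokenRange(0, 999);
        // One neighbour spanning the whole range leaves nothing "remaining",
        // even if the two sstables share no actual key.
        List<TokenRange> overlaps = Arrays.asList(new TokenRange(0, 999));
        System.out.println(estimateRemainingKeys(self, 5_000_000L, overlaps));
    }
}
```

With hashed (random) tokens, every sstable is likely to span nearly the whole token range, so under this kind of estimate the remaining key count comes out near 0 whether or not any individual key is actually shared. That matches the remainingKeys value reported above.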