Yes, that was my intention, but I wanted to cross-check with the ML and the devs keeping an eye on it first.
On Tue, May 16, 2017 at 5:10 PM, Hannu Kröger <hkro...@gmail.com> wrote:

> Well,
>
> sstables contain some statistics about the cell timestamps, and using that
> information together with the tombstone timestamp it might be possible to
> skip some data, but I'm not sure that Cassandra currently does that. Maybe
> it would be worth a JIRA ticket to see what the devs think about it and
> whether optimizing this case would make sense.
>
> Hannu
>
> On 16 May 2017, at 18:03, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> Hi Hannu,
>
> the piece of data in question is older. In my example the tombstone is the
> newest piece of data.
> Since a range tombstone carries information about the clustering key range,
> and the data is sorted by clustering key, I would expect a linear scan not
> to be necessary.
>
> On Tue, May 16, 2017 at 3:46 PM, Hannu Kröger <hkro...@gmail.com> wrote:
>
>> Well, as mentioned, Cassandra probably doesn't have the logic and data to
>> skip bigger regions of deleted data based on a range tombstone. If some
>> piece of data in a partition is newer than the tombstone, then it cannot
>> be skipped. Some partition-level statistics of cell ages would therefore
>> need to be kept in the column index for the skipping, and that is probably
>> not there.
>>
>> Hannu
>>
>> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> That is another way to see the question: are reverse iterators range
>> tombstone aware? Yes.
>> That is why I am puzzled by the aforementioned behaviour.
>> I would expect them to handle this case more gracefully.
>>
>> Cheers,
>> Stefano
>>
>> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>
>>> Hannu,
>>>
>>> How can you read a partition in reverse?
>>>
>>> Sent from my iPhone
>>>
>>> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>>> >
>>> > Well, I'm guessing that Cassandra doesn't really know whether the range
>>> > tombstone is useful for this or not.
>>> >
>>> > In many cases the partition might contain data that is within the range
>>> > of the tombstone but newer than the tombstone, and therefore might
>>> > still be returned. Scanning through deleted data can be avoided by
>>> > reading the partition in reverse (if all the deleted data is at the
>>> > beginning of the partition). Eventually you will still end up reading a
>>> > lot of tombstones, but you will get a lot of live data first, and the
>>> > implicit query limit of 10000 is probably reached before you get to the
>>> > tombstones. Therefore you get an immediate answer.
>>> >
>>> > Does it make sense?
>>> >
>>> > Hannu
>>> >
>>> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I am seeing inconsistencies when mixing range tombstones, wide
>>> >> partitions, and reverse iterators.
>>> >> I still have to understand whether this behaviour is to be expected,
>>> >> hence the message to the mailing list.
>>> >>
>>> >> The situation is conceptually simple. I am using a table defined as
>>> >> follows:
>>> >>
>>> >> CREATE TABLE test_cql.test_cf (
>>> >>     hash blob,
>>> >>     timeid timeuuid,
>>> >>     PRIMARY KEY (hash, timeid)
>>> >> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>> >>   AND compaction = {'class' : 'LeveledCompactionStrategy'};
>>> >>
>>> >> I then proceed by loading 2/3 GB from 3 sstables which I know contain
>>> >> a really wide partition (> 512 MB) for `hash = x`. I then delete the
>>> >> oldest _half_ of that partition by executing the query below, and
>>> >> restart the node:
>>> >>
>>> >> DELETE
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = x AND timeid < y;
>>> >>
>>> >> If I keep compactions disabled, the following query times out (it
>>> >> takes more than 10 seconds to succeed):
>>> >>
>>> >> SELECT *
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>>> >> ORDER BY timeid ASC;
>>> >>
>>> >> While the following returns immediately (obviously because no deleted
>>> >> data is ever read):
>>> >>
>>> >> SELECT *
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>>> >> ORDER BY timeid DESC;
>>> >>
>>> >> If I force a compaction the problem is gone, but I presume just
>>> >> because the data is rearranged.
>>> >>
>>> >> It seems to me that reading in ASC order does not make use of the
>>> >> range tombstone until C* reads the last sstable (which actually
>>> >> contains the range tombstone and is flushed at node restart), and it
>>> >> wastes time reading rows that are actually not live anymore.
>>> >>
>>> >> Is this expected? Should the range tombstone actually help in these
>>> >> cases?
>>> >>
>>> >> Thanks a lot!
>>> >> Stefano
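
PS: in the meantime, a quick way to see where the time goes is cqlsh tracing.
The sketch below assumes the schema and partition key from the thread above;
the LIMIT 100 is only there to keep the traces small and is not part of the
original repro. The trace for the ASC read should report a large number of
tombstone cells scanned before any live rows come back, while the DESC read
should report mostly live cells.

TRACING ON;

-- forward read: starts at the oldest (now deleted) end of the partition
SELECT *
FROM test_cql.test_cf
WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
ORDER BY timeid ASC
LIMIT 100;

-- reverse read: starts at the newest (live) end of the partition
SELECT *
FROM test_cql.test_cf
WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
ORDER BY timeid DESC
LIMIT 100;

TRACING OFF;

Comparing the "Read N live rows and M tombstone cells" lines in the two traces
(and any tombstone_warn_threshold messages in the logs, if the format is what I
remember) should make it clear whether the ASC read is walking the whole
deleted range.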