Thank you, Stefano.
> On May 16, 2017, at 10:56 AM, Stefano Ortolani <ostef...@gmail.com> wrote:
>
> No, because C* has reverse iterators.
>
> On Tue, May 16, 2017 at 4:47 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
> If the data is stored in ASC order and the query asks for DESC, wouldn’t it
> read the whole partition first and then pick the data in reverse order?
>
>
>> On May 16, 2017, at 10:03 AM, Stefano Ortolani <ostef...@gmail.com> wrote:
>>
>> Hi Hannu,
>>
>> The piece of data in question is older. In my example the tombstone is the
>> newest piece of data.
>> Since a range tombstone carries the clustering key range it covers, and the
>> data is sorted by clustering key, I would expect a linear scan not to be
>> necessary.
>>
>> On Tue, May 16, 2017 at 3:46 PM, Hannu Kröger <hkro...@gmail.com> wrote:
>> Well, as mentioned, Cassandra probably doesn’t have the logic and data to
>> skip bigger regions of deleted data based on a range tombstone. If some piece
>> of data in a partition is newer than the tombstone, then it cannot be
>> skipped. Therefore some partition-level statistics of cell ages would need to
>> be kept in the column index for the skipping, and that is probably not there.
>>
>> Hannu
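
To make the point above concrete, here is a toy Python model. It is not Cassandra's read path: the `Row` and `RangeTombstone` types, the integer clustering keys, and the `<=` timestamp comparison are all assumptions chosen for illustration. It only shows that a row inside the tombstone's clustering range can still be live if it was written after the delete, so the range alone is not enough to skip a region blindly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    clustering: int  # stand-in for the timeid clustering key
    write_ts: int    # write timestamp of the row


@dataclass(frozen=True)
class RangeTombstone:
    end: int         # covers clustering keys strictly below this bound
    deleted_at: int  # timestamp at which the DELETE was issued

    def shadows(self, row: Row) -> bool:
        # A row is deleted only if it is inside the range AND not newer
        # than the delete (assumed rule for this toy model).
        return row.clustering < self.end and row.write_ts <= self.deleted_at


def live_rows(rows, tombstone):
    # Keep every row the tombstone does not shadow.
    return [r for r in rows if not tombstone.shadows(r)]


rows = [Row(1, 10), Row(2, 10), Row(3, 99), Row(4, 10), Row(5, 10)]
tombstone = RangeTombstone(end=4, deleted_at=50)
result = live_rows(rows, tombstone)
```

Here the row with clustering key 3 falls inside the deleted range but carries a newer write timestamp, so it survives together with keys 4 and 5.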
>>
>>> On 16 May 2017, at 17:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>>
>>> That is another way to see the question: are reverse iterators range
>>> tombstone aware? Yes.
>>> That is why I am puzzled by the aforementioned behavior.
>>> I would expect them to handle this case more gracefully.
>>>
>>> Cheers,
>>> Stefano
>>>
>>> On Tue, May 16, 2017 at 3:29 PM, Nitan Kainth <ni...@bamlabs.com> wrote:
>>> Hannu,
>>>
>>> How can you read a partition in reverse?
>>>
>>> > On May 16, 2017, at 9:20 AM, Hannu Kröger <hkro...@gmail.com> wrote:
>>> >
>>> > Well, I’m guessing that Cassandra doesn't really know if the range
>>> > tombstone is useful for this or not.
>>> >
>>> > In many cases the partition might contain data that is within the range
>>> > of the tombstone but is newer than the tombstone, and therefore it might
>>> > still be returned. Scanning through deleted data can be avoided by
>>> > reading the partition in reverse (if all the deleted data is at the
>>> > beginning of the partition). Eventually you will still end up reading a
>>> > lot of tombstones, but you will get a lot of live data first, and the
>>> > implicit query limit of 10000 is probably reached before you get to the
>>> > tombstones. Therefore you will get an immediate answer.
>>> >
>>> > Does it make sense?
>>> >
>>> > Hannu
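
The reverse-read argument above can be sketched with a small Python simulation. This is a deliberately naive model, not how Cassandra reads sstables: the `read` helper, the integer keys, and the partition size are hypothetical, and the only figure taken from the thread is the limit of 10000. It counts how many rows a scan has to examine before it has collected enough live rows.

```python
def read(partition, deleted_below, limit, reverse=False):
    """Naive scan: examine rows one by one until `limit` live rows are found.

    Returns (live_rows, rows_examined). Keys below `deleted_below` are
    treated as deleted; nothing is skipped via an index in this model.
    """
    examined = 0
    live = []
    order = reversed(partition) if reverse else iter(partition)
    for key in order:
        examined += 1
        if key >= deleted_below:
            live.append(key)
            if len(live) == limit:
                break
    return live, examined


# Partition sorted ascending, with the oldest half deleted.
partition = list(range(100_000))
deleted_below = 50_000

asc_live, asc_examined = read(partition, deleted_below, limit=10_000)
desc_live, desc_examined = read(partition, deleted_below, limit=10_000, reverse=True)
```

In this model the ascending read has to walk all 50,000 deleted rows before it returns anything, while the descending read stops after examining exactly as many rows as the limit.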
>>> >
>>> >> On 16 May 2017, at 16:33, Stefano Ortolani <ostef...@gmail.com> wrote:
>>> >>
>>> >> Hi all,
>>> >>
>>> >> I am seeing inconsistencies when mixing range tombstones, wide
>>> >> partitions, and reverse iterators.
>>> >> I still have to understand whether the behaviour is expected, hence
>>> >> this message to the mailing list.
>>> >>
>>> >> The situation is conceptually simple. I am using a table defined as
>>> >> follows:
>>> >>
>>> >> CREATE TABLE test_cql.test_cf (
>>> >> hash blob,
>>> >> timeid timeuuid,
>>> >> PRIMARY KEY (hash, timeid)
>>> >> ) WITH CLUSTERING ORDER BY (timeid ASC)
>>> >> AND compaction = {'class' : 'LeveledCompactionStrategy'};
>>> >>
>>> >> I then proceed by loading 2/3GB from 3 sstables which I know contain a
>>> >> really wide partition (> 512 MB) for `hash = x`. I then delete the
>>> >> oldest _half_ of that partition by executing the query below, and
>>> >> restart the node:
>>> >>
>>> >> DELETE
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = x AND timeid < y;
>>> >>
>>> >> If I keep compactions disabled, the following query times out (it
>>> >> takes more than 10 seconds to complete):
>>> >>
>>> >> SELECT *
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>>> >> ORDER BY timeid ASC;
>>> >>
>>> >> While the following returns immediately (obviously because no deleted
>>> >> data is ever read):
>>> >>
>>> >> SELECT *
>>> >> FROM test_cql.test_cf
>>> >> WHERE hash = 0x963204d451de3e611daf5e340c3594acead0eaaf
>>> >> ORDER BY timeid DESC;
>>> >>
>>> >> If I force a compaction the problem is gone, but I presume just because
>>> >> the data is rearranged.
>>> >>
>>> >> It seems to me that reading by ASC does not make use of the range
>>> >> tombstone until C* reads the last sstable (which actually contains the
>>> >> range tombstone and is flushed at node restart), and that it wastes
>>> >> time reading all the rows that are no longer live.
>>> >>
>>> >> Is this expected? Should the range tombstone actually help in these
>>> >> cases?
>>> >>
>>> >> Thanks a lot!
>>> >> Stefano