>
> Since STCS does not sort data based on timestamp, your wide partition may
> span over multiple SSTables and inside each SSTable, old data (+
> tombstones) may sit on the same partition as newer data.


Should the data be sorted by my time column regardless of the compaction
strategy? I didn't think that the column timestamp came into play with
respect to sorting. I have been able to review some SSTables with
sstablemetadata and I can see that old/expired data is definitely living
with live data.


On Sun, Jan 29, 2017 at 2:38 PM, DuyHai Doan <doanduy...@gmail.com> wrote:

> Ok so give it a try with TWCS. Since STCS does not sort data based on
> timestamp, your wide partition may span over multiple SSTables and inside
> each SSTable, old data (+ tombstones) may sit on the same partition as
> newer data.
>
> When reading by slice, even if you request for fresh data, Cassandra has
> to scan over a lot tombstones to fetch the correct range of data thus your
> issue
>
> On Sun, Jan 29, 2017 at 8:19 PM, John Sanda <john.sa...@gmail.com> wrote:
>
>> It was with STCS. It was on a 2.x version before TWCS was available.
>>
>> On Sun, Jan 29, 2017 at 10:58 AM DuyHai Doan <doanduy...@gmail.com>
>> wrote:
>>
>>> Did you get this Overwhelming tombstonne behavior with STCS or with TWCS
>>> ?
>>>
>>> If you're using DTCS, beware of its weird behavior and tricky
>>> configuration.
>>>
>>> On Sun, Jan 29, 2017 at 3:52 PM, John Sanda <john.sa...@gmail.com>
>>> wrote:
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>> As I mentioned previously, the UI only queries recent data, e.g., the
>>> past hour, past two hours, past day, past week. The UI does not query for
>>> anything older than the TTL which is 7 days. My understanding and
>>> expectation was that Cassandra would only scan live cells. The UI is a
>>> separate application that I do not maintain, so I am not 100% certain about
>>> the queries. I have been told that it does not query for anything older
>>> than 7 days.
>>>
>>> On Sun, Jan 29, 2017 at 4:14 AM, kurt greaves <k...@instaclustr.com>
>>> wrote:
>>>
>>>
>>> Your partitioning key is text. If you have multiple entries per id you
>>> are likely hitting older cells that have expired. Descending only affects
>>> how the data is stored on disk, if you have to read the whole partition to
>>> find whichever time you are querying for you could potentially hit
>>> tombstones in other SSTables that contain the same "id". As mentioned
>>> previously, you need to add a time bucket to your partitioning key and
>>> definitely use DTCS/TWCS.
>>>
>>>
>>>
>>>
>>>
>>> --
>>>
>>> - John
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>


-- 

- John

Reply via email to