I’m confused why we see *any* increase in sstable size - TTLs and deletion 
times are already written as unsigned vints as offsets from an sstable epoch 
for each value.

I would dig in more carefully to explore why you’re seeing this increase. For 
the same data, there should be no change to the size on disk.
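To illustrate why the in-memory type should not matter on disk: with a variable-length encoding, the number of bytes written depends on the value, not on whether the Java field holding it is an int or a long. The sketch below uses a LEB128-style unsigned varint as a stand-in (Cassandra's own vint byte layout is different), and the sstable epoch value is a made-up example.

```java
public class VarintSizeSketch {
    // LEB128-style unsigned varint: 7 payload bits per byte, high bit set on every byte but the last.
    // NOTE: illustrative stand-in only; this is not Cassandra's VIntCoding byte layout.
    static byte[] writeUnsignedVarint(long value) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        long sstableEpoch = 1_668_000_000L; // hypothetical per-sstable minimum, in epoch seconds
        int  asInt  = 1_668_400_000;        // one deletion time held in an int field...
        long asLong = 1_668_400_000L;       // ...and the same instant held in a long field

        System.out.println("fixed int:             " + Integer.BYTES);                                  // 4
        System.out.println("fixed long:            " + Long.BYTES);                                     // 8
        System.out.println("vint, absolute:        " + writeUnsignedVarint(asLong).length);             // 5
        System.out.println("vint, delta from int:  " + writeUnsignedVarint(asInt - sstableEpoch).length);  // 3
        System.out.println("vint, delta from long: " + writeUnsignedVarint(asLong - sstableEpoch).length); // 3
    }
}
```

Both delta encodings produce the same 3 bytes, so widening the field alone should not change what is written.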

> On 14 Nov 2022, at 06:36, C. Scott Andreas <sc...@paradoxica.net> wrote:
> 
> A 2-3% increase in storage volume is roughly equivalent to giving up the 
> gain from LZ4 -> LZ4HC, or a one- to two-level bump in Zstandard compression 
> levels. This regression could be very expensive for storage-bound use cases.
> 
> From the perspective of storage overhead, the unsigned int approach sounds 
> preferable.
> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi <berenguerbl...@gmail.com> 
>>> wrote:
>>> 
>> 
>> Hi all,
>> 
>> We have done some more research on CASSANDRA-14227. The current patch 
>> solves the TTL limit issue by switching TTL from int to long. This approach 
>> does not have a negative impact on memtable memory usage, as C* controls 
>> the memory used by the memtable, but based on our testing it increases the 
>> bytes flushed by 4 to 7% and the bytes on disk by 2 to 3%.
>> 
>> As a mitigation, it is possible to encode localDeletionTime as a vint. This 
>> yields a 1% improvement but might add computation during compaction and 
>> some other operations.
>> 
>> Benedict's proposal to keep using ints for TTL, but as a delta to 
>> nowInSecond, would work for memtables but not for SSTables, where 
>> nowInSecond does not exist. As a consequence, we would still suffer the 
>> impact on bytes flushed and bytes on disk.
>> 
>> Another suggested approach is to use unsigned integers. Java 8 has an 
>> unsigned integer API that would allow us to use unsigned ints for TTLs. 
>> Unsigned ints would give us a maximum time of 136 years since the Unix 
>> epoch, and therefore a maximum expiration timestamp in 2106. We would have 
>> to keep TTL at 20y instead of 68y to give us enough breathing room, though; 
>> otherwise, around 2038 we'd hit the same problem again.
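The unsigned-int figures can be checked with the Java 8 unsigned API (`Integer.toUnsignedLong`, `Integer.compareUnsigned`). A rough sketch, assuming 365-day years for the TTL caps and using `Integer.MAX_VALUE` seconds as a stand-in for the 68y cap (the exact constant in the PR may differ). The helper `lastSafeWriteYear` is hypothetical: it reports the year of the latest write time for which now + TTL still fits in an unsigned 32-bit expiration time.

```java
import java.time.Instant;
import java.time.ZoneOffset;

public class UnsignedExpirySketch {
    // All-ones 32-bit pattern read through the Java 8 unsigned API: 4_294_967_295 seconds.
    static final long MAX_UNSIGNED_SECONDS = Integer.toUnsignedLong(-1);

    // Hypothetical TTL caps in seconds (365-day years), for illustration only.
    static final long TTL_20Y = 20L * 365 * 24 * 60 * 60; // 630_720_000
    static final long TTL_68Y = Integer.MAX_VALUE;        // 2_147_483_647, roughly 68 years

    // Year of the latest write time whose expiration (now + ttl) still fits in an unsigned int.
    static int lastSafeWriteYear(long ttlSeconds) {
        return Instant.ofEpochSecond(MAX_UNSIGNED_SECONDS - ttlSeconds)
                      .atOffset(ZoneOffset.UTC)
                      .getYear();
    }

    public static void main(String[] args) {
        System.out.println(Instant.ofEpochSecond(MAX_UNSIGNED_SECONDS));  // 2106-02-07T06:28:15Z
        System.out.println(MAX_UNSIGNED_SECONDS / (365L * 24 * 60 * 60)); // 136 (years)
        System.out.println(lastSafeWriteYear(TTL_20Y));                   // 2086
        System.out.println(lastSafeWriteYear(TTL_68Y));                   // 2038

        // An expiration past 2038 stored in an int field goes negative as a signed value...
        int stored = (int) (1_700_000_000L + TTL_68Y);
        System.out.println(stored < 0);                                   // true
        // ...but still compares and reads back correctly when treated as unsigned.
        System.out.println(Integer.compareUnsigned(stored, 0) > 0);       // true
        System.out.println(Integer.toUnsignedLong(stored));               // 3847483647
    }
}
```

With a 68y cap, the unsigned headroom runs out for writes made after January 2038, which is why the cap would need to stay at 20y under this approach.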
>> 
>> Happy to hear opinions.
>> 
>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>> Hi,
>>> 
>>> Apologies for the late reply; I have been out of office. I have done some 
>>> profiling, and the results look virtually identical on trunk and 14227. I 
>>> have attached some screenshots to the ticket 
>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
>>> fooling me, everything in the JFR recordings looks the same.
>>> 
>>> Regards
>>> 
>>> On 30/9/22 9:44, Berenguer Blasi wrote:
>>>> Hi Benedict,
>>>> 
>>>> Thanks for the reply! Yes, some profiling is probably needed; then we can 
>>>> see whether going down the big delta-encoding refactor rabbit hole is worth it.
>>>> 
>>>> Let's see what other concerns people bring up.
>>>> 
>>>> Thx.
>>>> 
>>>> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>>>>> My only slight concern with this approach is the additional memory 
>>>>> pressure. Since 68 years should be plenty at any moment in time, I wonder if 
>>>>> it wouldn’t be better to represent these times as deltas from the 
>>>>> nowInSec being used to process the query. So, long math would only be 
>>>>> used to normalise the times to this nowInSec (from whatever is stored in 
>>>>> the sstable) within a method, and ints would be stored in memtables and 
>>>>> any objects used for processing.
>>>>> 
>>>>> This might admittedly be more work, but I don’t believe it should be too 
>>>>> challenging - we can introduce a method deletionTime(int nowInSec) that 
>>>>> returns a long value by adding nowInSec to the deletionTime, and make the 
>>>>> underlying value private, refactoring call sites?
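A minimal sketch of the delta idea above, assuming a hypothetical `DeltaTime` value class (not an existing Cassandra type): an int offset is kept in memory, and long math appears only when normalising back to an absolute time. Unlike the `deletionTime(int nowInSec)` signature in the email, it takes `nowInSec` as a long so the example can run with a post-2038 clock; it also assumes the same reference second is used both to build and to read the offset.

```java
public class DeltaDeletionTimeSketch {
    // Hypothetical value class: holds a 32-bit offset instead of a 64-bit absolute time.
    static final class DeltaTime {
        private final int delta; // seconds relative to a reference nowInSec; may be negative

        DeltaTime(long absoluteSeconds, long nowInSec) {
            // Math.toIntExact throws ArithmeticException if the offset exceeds the int range (~±68 years).
            this.delta = Math.toIntExact(absoluteSeconds - nowInSec);
        }

        // Long math only happens here; callers must pass the same nowInSec used at construction.
        long deletionTime(long nowInSec) {
            return nowInSec + delta;
        }
    }

    public static void main(String[] args) {
        long nowInSec  = 2_200_000_000L;               // already past the signed 32-bit limit of 2038
        long expiresAt = nowInSec + 30L * 24 * 3600;   // an expiration 30 days later
        long deletedAt = nowInSec - 3_600;             // a tombstone written an hour earlier

        System.out.println(new DeltaTime(expiresAt, nowInSec).deletionTime(nowInSec) == expiresAt); // true
        System.out.println(new DeltaTime(deletedAt, nowInSec).deletionTime(nowInSec) == deletedAt); // true
    }
}
```

The trade-off matches the concern raised later in the thread: the offset only means something relative to a known reference, which exists while processing a query but not inside an SSTable.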
>>>>> 
>>>>>> On 29 Sep 2022, at 09:37, Berenguer Blasi <berenguerbl...@gmail.com> 
>>>>>> wrote:
>>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I have taken a stab at this in a PR, which you can find attached to the ticket. Mainly:
>>>>>> 
>>>>>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>>>>>> should get us past the 2038 limit.
>>>>>> 
>>>>>> - TTL is now capped at 68y. Think CQL API compatibility and a sort of 
>>>>>> 'free' guardrail.
>>>>>> 
>>>>>> - A new NONE overflow policy is the default, but everything stays backwards 
>>>>>> compatible because the previous policies are kept in place. Think upgrade 
>>>>>> scenarios or apps relying on the previous behavior.
>>>>>> 
>>>>>> - The new limit is around year 292,471,208,677 which sounds ok given the 
>>>>>> Sun will start collapsing in 3 to 5 billion years :-)
>>>>>> 
>>>>>> - Please feel free to drop by the ticket and take a look at the PR, even 
>>>>>> if only cursorily.
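Both the 68y TTL cap and the "year 292,471,208,677" figure fall out of dividing each type's maximum by the number of seconds in a 365-day year. A quick check (the class name is just for illustration; strictly, the second result counts years after the 1970 epoch, a difference that is negligible at that scale):

```java
public class TimeLimitSketch {
    static final long SECONDS_PER_365_DAY_YEAR = 365L * 24 * 60 * 60; // 31_536_000

    public static void main(String[] args) {
        // Range of a signed 32-bit count of seconds, in whole 365-day years.
        System.out.println(Integer.MAX_VALUE / SECONDS_PER_365_DAY_YEAR); // 68
        // Range of a signed 64-bit count of seconds, in whole 365-day years.
        System.out.println(Long.MAX_VALUE / SECONDS_PER_365_DAY_YEAR);    // 292471208677
    }
}
```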
>>>>>> 
>>>>>> Thx in advance.
>>>>>> 
>>>>> 
