> in 2035 we'd hit the same problem again.
In terms of "kicking a can down the road", this would be a pretty vigorous 
kick. I wouldn't push back against this deferral. :)
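
For anyone who wants to check the arithmetic behind the unsigned int proposal quoted below, here is a minimal sketch using the Java 8 unsigned integer API. The class and helper names are hypothetical and are not taken from the CASSANDRA-14227 patch. It only illustrates how an int holding unsigned seconds since the Unix epoch could be widened and compared.

```java
import java.time.Instant;

// Hypothetical illustration only; not code from the Cassandra code base.
final class UnsignedTtlSketch {

    // Widen an int that holds *unsigned* seconds since the Unix epoch.
    static Instant toInstant(int unsignedSeconds) {
        return Instant.ofEpochSecond(Integer.toUnsignedLong(unsignedSeconds));
    }

    // True when the expiration is at or before "now", comparing both as unsigned.
    static boolean isExpired(int unsignedExpiration, int unsignedNow) {
        return Integer.compareUnsigned(unsignedExpiration, unsignedNow) <= 0;
    }

    public static void main(String[] args) {
        // Largest signed 32-bit value: the familiar 2038 limit.
        System.out.println(toInstant(Integer.MAX_VALUE)); // 2038-01-19T03:14:07Z
        // Bit pattern 0xFFFFFFFF read as unsigned: the 2106 limit.
        System.out.println(toInstant(-1));                // 2106-02-07T06:28:15Z
    }
}
```

The on-disk and in-memory width stays at 4 bytes. Only the interpretation of the top bit changes, so every comparison and widening has to go through the unsigned helpers rather than plain int operators.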

On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:
> 
> I’m confused why we see *any* increase in sstable size - TTLs and deletion 
> times are already written as unsigned vints as offsets from an sstable epoch 
> for each value.
> 
> I would dig in more carefully to explore why you’re seeing this increase? For 
> the same data there should be no change to size on disk.
> 
> 
>> On 14 Nov 2022, at 06:36, C. Scott Andreas <sc...@paradoxica.net> wrote:
>> A 2-3% increase in storage volume is roughly equivalent to giving up the 
>> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression 
>> levels. This regression could be very expensive for storage-bound use cases.
>> 
>> From the perspective of storage overhead, the unsigned int approach sounds 
>> preferable.
>> 
>>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi <berenguerbl...@gmail.com> 
>>> wrote:
>>>  
>>> Hi all,
>>> 
>>> We have done some more research on c14227. The current patch for 
>>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead 
>>> of int. This approach does not have a negative impact on memtable memory 
>>> usage, as C* controls the memory used by the Memtable, but based on our 
>>> testing it increases the bytes flushed by 4 to 7% and the bytes on disk by 2 
>>> to 3%.
>>> 
>>> As a mitigation to this problem it is possible to encode 
>>> *localDeletionTime* as a vint. It results in a 1% improvement but might 
>>> cause additional computations during compaction or some other operations.
>>> 
>>> Benedict's proposal to keep on using ints for TTL but as a delta to 
>>> nowInSecond would work for memtables but would not work in the SSTable, where 
>>> nowInSecond does not exist. As a consequence we would still suffer from the 
>>> impact on bytes flushed and bytes on disk.
>>> 
>>> Another approach that was suggested is the use of unsigned integer. Java 8 
>>> has an unsigned integer API that would allow us to use unsigned int for 
>>> TTLs. Based on computation unsigned ints would give us a maximum time of 
>>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp 
>>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough 
>>> breathing room though, otherwise in 2035 we'd hit the same problem again.
>>> 
>>> Happy to hear opinions.
>>> 
>>> On 18/10/22 10:56, Berenguer Blasi wrote:
>>>> Hi,
>>>> 
>>>> apologies for the late reply as I have been OOO. I have done some 
>>>> profiling and results look virtually identical on trunk and 14227. I have 
>>>> attached some screenshots to the ticket 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are 
>>>> fooling me everything in the JFRs looks the same.
>>>> 
>>>> Regards
>>>> 
>>>> On 30/9/22 9:44, Berenguer Blasi wrote:
>>>>> Hi Benedict,
>>>>> 
>>>>> thanks for the reply! Yes some profiling is probably needed, then we can 
>>>>> see if going down the delta encoding big refactor rabbit hole is worth it?
>>>>> 
>>>>> Let's see what other concerns people bring up.
>>>>> 
>>>>> Thx.
>>>>> 
>>>>> On 29/9/22 11:12, Benedict Elliott Smith wrote:
>>>>>> My only slight concern with this approach is the additional memory 
>>>>>> pressure. Since 64yrs should be plenty at any moment in time, I wonder 
>>>>>> if it wouldn’t be better to represent these times as deltas from the 
>>>>>> nowInSec being used to process the query. So, long math would only be 
>>>>>> used to normalise the times to this nowInSec (from whatever is stored in 
>>>>>> the sstable) within a method, and ints would be stored in memtables and 
>>>>>> any objects used for processing. 
>>>>>> 
>>>>>> This might admittedly be more work, but I don’t believe it should be too 
>>>>>> challenging - we can introduce a method deletionTime(int nowInSec) that 
>>>>>> returns a long value by adding nowInSec to the deletionTime, and make 
>>>>>> the underlying value private, refactoring call sites?
>>>>>> 
>>>>>>> On 29 Sep 2022, at 09:37, Berenguer Blasi <berenguerbl...@gmail.com> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I have taken a stab at this in a PR you can find attached to the ticket. Mainly:
>>>>>>> 
>>>>>>> - I have moved deletion times, gc and nowInSec timestamps to long. That 
>>>>>>> should get us past the 2038 limit.
>>>>>>> 
>>>>>>> - TTL is now capped at 68y. Think CQL API compatibility and a sort of a 
>>>>>>> 'free' guardrail.
>>>>>>> 
>>>>>>> - A new NONE overflow policy is the default but everything is backwards 
>>>>>>> compatible by keeping the previous ones in place. Think upgrade 
>>>>>>> scenarios or apps relying on the previous behavior.
>>>>>>> 
>>>>>>> - The new limit is around year 292,471,208,677 which sounds ok given 
>>>>>>> the Sun will start collapsing in 3 to 5 billion years :-)
>>>>>>> 
>>>>>>> - Please feel free to drop by the ticket and take a look at the PR even 
>>>>>>> if it's cursory
>>>>>>> 
>>>>>>> Thx in advance.
>>>>>>> 
