> in 2035 we'd hit the same problem again. In terms of "kicking a can down the road", this would be a pretty vigorous kick. I wouldn't push back against this deferral. :)
On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote: > > I’m confused why we see *any* increase in sstable size - TTLs and deletion > times are already written as unsigned vints as offsets from an sstable epoch > for each value. > > I would dig in more carefully to explore why you’re seeing this increase? For > the same data there should be no change to size on disk. > > >> On 14 Nov 2022, at 06:36, C. Scott Andreas <sc...@paradoxica.net> wrote: >> A 2-3% increase in storage volume is roughly equivalent to giving up the >> gain from LZ4 -> LZ4HC, or a one to two-level bump in Zstandard compression >> levels. This regression could be very expensive for storage-bound use cases. >> >> From the perspective of storage overhead, the unsigned int approach sounds >> preferable. >> >>> On Nov 13, 2022, at 10:13 PM, Berenguer Blasi <berenguerbl...@gmail.com> >>> wrote: >>> >>> Hi all, >>> >>> We have done some more research on c14227. The current patch for >>> CASSANDRA-14227 solves the TTL limit issue by switching TTL to long instead >>> of int. This approach does not have a negative impact on memtable memory >>> usage, as C* controles the memory used by the Memtable, but based on our >>> testing it increases the bytes flushed by 4 to 7% and the byte on disk by 2 >>> to 3%. >>> >>> As a mitigation to this problem it is possible to encode >>> *localDeletionTime* as a vint. It results in a 1% improvement but might >>> cause additional computations during compaction or some other operations. >>> >>> Benedict's proposal to keep on using ints for TTL but as a delta to >>> nowInSecond would work for memtables but not for work in the SSTable where >>> nowInSecond does not exist. By consequence we would still suffer from the >>> impact on byte flushed and bytes on disk. >>> >>> Another approach that was suggested is the use of unsigned integer. Java 8 >>> has an unsigned integer API that would allow us to use unsigned int for >>> TTLs. Based on computation unsigned ints would give us a maximum time of >>> 136 years since the Unix Epoch and therefore a maximum expiration timestamp >>> in 2106. We would have to keep TTL at 20y instead of 68y to give us enough >>> breathing room though, otherwise in 2035 we'd hit the same problem again. >>> >>> Happy to hear opinions. >>> >>> On 18/10/22 10:56, Berenguer Blasi wrote: >>>> Hi, >>>> >>>> apologies for the late reply as I have been OOO. I have done some >>>> profiling and results look virtually identical on trunk and 14227. I have >>>> attached some screenshots to the ticket >>>> https://issues.apache.org/jira/browse/CASSANDRA-14227. Unless my eyes are >>>> fooling me everything in the jfrs look the same. >>>> >>>> Regards >>>> >>>> On 30/9/22 9:44, Berenguer Blasi wrote: >>>>> Hi Benedict, >>>>> >>>>> thanks for the reply! Yes some profiling is probably needed, then we can >>>>> see if going down the delta encoding big refactor rabbit hole is worth it? >>>>> >>>>> Let's see what other concerns people bring up. >>>>> >>>>> Thx. >>>>> >>>>> On 29/9/22 11:12, Benedict Elliott Smith wrote: >>>>>> My only slight concern with this approach is the additional memory >>>>>> pressure. Since 64yrs should be plenty at any moment in time, I wonder >>>>>> if it wouldn’t be better to represent these times as deltas from the >>>>>> nowInSec being used to process the query. So, long math would only be >>>>>> used to normalise the times to this nowInSec (from whatever is stored in >>>>>> the sstable) within a method, and ints would be stored in memtables and >>>>>> any objects used for processing. >>>>>> >>>>>> This might admittedly be more work, but I don’t believe it should be too >>>>>> challenging - we can introduce a method deletionTime(int nowInSec) that >>>>>> returns a long value by adding nowInSec to the deletionTime, and make >>>>>> the underlying value private, refactoring call sites? >>>>>> >>>>>>> On 29 Sep 2022, at 09:37, Berenguer Blasi <berenguerbl...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I have taken a stab in a PR you can find attached in the ticket. Mainly: >>>>>>> >>>>>>> - I have moved deletion times, gc and nowInSec timestamps to long. That >>>>>>> should get us past the 2038 limit. >>>>>>> >>>>>>> - TTL is maxed now to 68y. Think CQL API compatibility and a sort of a >>>>>>> 'free' guardrail. >>>>>>> >>>>>>> - A new NONE overflow policy is the default but everything is backwards >>>>>>> compatible by keeping the previous ones in place. Think upgrade >>>>>>> scenarios or apps relying on the previous behavior. >>>>>>> >>>>>>> - The new limit is around year 292,471,208,677 which sounds ok given >>>>>>> the Sun will start collapsing in 3 to 5 billion years :-) >>>>>>> >>>>>>> - Please feel free to drop by the ticket and take a look at the PR even >>>>>>> if it's cursory >>>>>>> >>>>>>> Thx in advance. >>>>>>>