Hi,

20y is the current and historic value. 68y is what an integer can accommodate hence the current 2038 limit since the 1970 Unix epoch. I wouldn't make it a configurable value, off the top of my head it would make for some interesting bugs and debugging sessions when nodes had different values. Food for another ticket in any case imo.

Regards

On 3/2/23 14:18, Henrik Ingo wrote:
Naive PHB questions to follow...

Why are 68y and 20y special? Could you pick any value? Could we allow it to be configurable? (Last one probably overkill, just asking to understand...)

If we can pick any values we want, instinctively I would personally suggest to have TTL higher than 20 years, but also kicking the can further than 2035, which is only 13 years from now. Just to suggest a specific number, why not 35y and 2071?

henrik

On Fri, Feb 3, 2023 at 12:32 PM Berenguer Blasi <berenguerbl...@gmail.com> wrote:

    Hi All,

    a version using Uints, 20y max TTL and kicking the can down the
    road until 2086 has been put up for review #justfyi

    Regards

    On 15/11/22 7:06, Berenguer Blasi wrote:

    Hi all,

    thanks for your answers!.

    To Benedict's point: In terms of the uvint enconding of
    deletionTime i.e. it is true it happens here
    
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/SerializationHeader.java#L170.
    But we also have a DeletionTime serializer here
    
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/DeletionTime.java#L166
    that is writing an int and a long that would now write 2 longs.

    TTL itself (the delta) remains an int in the new PR so it should
    have no effect in size.

    Did I reference the correct parts of the codebase? No sstable
    expert here.

    On 14/11/22 19:28, Josh McKenzie wrote:
    in 2035 we'd hit the same problem again.
    In terms of "kicking a can down the road", this would be a
    pretty vigorous kick. I wouldn't push back against this deferral. :)

    On Mon, Nov 14, 2022, at 9:28 AM, Benedict wrote:

    I’m confused why we see *any* increase in sstable size - TTLs
    and deletion times are already written as unsigned vints as
    offsets from an sstable epoch for each value.

    I would dig in more carefully to explore why you’re seeing this
    increase? For the same data there should be no change to size
    on disk.


    On 14 Nov 2022, at 06:36, C. Scott Andreas
    <sc...@paradoxica.net> <mailto:sc...@paradoxica.net> wrote:
    A 2-3% increase in storage volume is roughly equivalent to
    giving up the gain from LZ4 -> LZ4HC, or a one to two-level
    bump in Zstandard compression levels. This regression could be
    very expensive for storage-bound use cases.

    From the perspective of storage overhead, the unsigned int
    approach sounds preferable.

    On Nov 13, 2022, at 10:13 PM, Berenguer Blasi
    <berenguerbl...@gmail.com> <mailto:berenguerbl...@gmail.com>
    wrote:
    

    Hi all,

    We have done some more research on c14227. The current patch
    for CASSANDRA-14227 solves the TTL limit issue by switching
    TTL to long instead of int. This approach does not have a
    negative impact on memtable memory usage, as C* controles the
    memory used by the Memtable, but based on our testing it
    increases the bytes flushed by 4 to 7% and the byte on disk
    by 2 to 3%.

    As a mitigation to this problem it is possible to encode
    /localDeletionTime/ as a vint. It results in a 1% improvement
    but might cause additional computations during compaction or
    some other operations.

    Benedict's proposal to keep on using ints for TTL but as a
    delta to nowInSecond would work for memtables but not for
    work in the SSTable where nowInSecond does not exist. By
    consequence we would still suffer from the impact on byte
    flushed and bytes on disk.

    Another approach that was suggested is the use of unsigned
    integer. Java 8 has an unsigned integer API that would allow
    us to use unsigned int for TTLs. Based on computation
    unsigned ints would give us a maximum time of 136 years since
    the Unix Epoch and therefore a maximum expiration timestamp
    in 2106. We would have to keep TTL at 20y instead of 68y to
    give us enough breathing room though, otherwise in 2035 we'd
    hit the same problem again.

    Happy to hear opinions.

    On 18/10/22 10:56, Berenguer Blasi wrote:

    Hi,

    apologies for the late reply as I have been OOO. I have done
    some profiling and results look virtually identical on trunk
    and 14227. I have attached some screenshots to the ticket
    https://issues.apache.org/jira/browse/CASSANDRA-14227.
    Unless my eyes are fooling me everything in the jfrs look
    the same.

    Regards

    On 30/9/22 9:44, Berenguer Blasi wrote:

    Hi Benedict,

    thanks for the reply! Yes some profiling is probably
    needed, then we can see if going down the delta encoding
    big refactor rabbit hole is worth it?

    Let's see what other concerns people bring up.

    Thx.

    On 29/9/22 11:12, Benedict Elliott Smith wrote:
    My only slight concern with this approach is the
    additional memory pressure. Since 64yrs should be plenty
    at any moment in time, I wonder if it wouldn’t be better
    to represent these times as deltas from the nowInSec being
    used to process the query. So, long math would only be
    used to normalise the times to this nowInSec (from
    whatever is stored in the sstable) within a method, and
    ints would be stored in memtables and any objects used for
    processing.

    This might admittedly be more work, but I don’t believe it
    should be too challenging - we can introduce a method
    deletionTime(int nowInSec) that returns a long value by
    adding nowInSec to the deletionTime, and make the
    underlying value private, refactoring call sites?

    On 29 Sep 2022, at 09:37, Berenguer Blasi
    <berenguerbl...@gmail.com>
    <mailto:berenguerbl...@gmail.com> wrote:

    Hi all,

    I have taken a stab in a PR you can find attached in the
    ticket. Mainly:

    - I have moved deletion times, gc and nowInSec timestamps
    to long. That should get us past the 2038 limit.

    - TTL is maxed now to 68y. Think CQL API compatibility
    and a sort of a 'free' guardrail.

    - A new NONE overflow policy is the default but
    everything is backwards compatible by keeping the
    previous ones in place. Think upgrade scenarios or apps
    relying on the previous behavior.

    - The new limit is around year 292,471,208,677 which
    sounds ok given the Sun will start collapsing in 3 to 5
    billion years :-)

    - Please feel free to drop by the ticket and take a look
    at the PR even if it's cursory

    Thx in advance.




--

        

Henrik Ingo

c. +358 40 569 7354

w. www.datastax.com <http://www.datastax.com>

<https://www.facebook.com/datastax><https://twitter.com/datastax><https://www.linkedin.com/company/datastax/><https://github.com/datastax/>

Reply via email to