Re: Inclusive/exclusive endpoints when compacting token ranges

Berenguer Blasi Mon, 08 Aug 2022 01:22:59 -0700

+1 to new flags also from me

On 26/7/22 18:39, Andrés de la Peña wrote:

I think that's right, using a closed range makes sense to consume thedata provided by "sstablemetadata", which also provides closed ranges.Especially because with half-open ranges we couldn't compact a sstablewith a single big partition, of which we might only know the token butno the partition key.

So probably we should just add documentation about both -st and -etbeing inclusive, and live with a different meaning of -st in repairand compact.

Also, the reason why this is so confusing in the test that started thediscussion is that those closed token ranges are internallyrepresented as "Range<Token>" objects, which are half-open bydefinition. So we should document those methods, and maybe do someminor changes to avoid the use of "Range<Token>" to silently representclosed token ranges.

On Tue, 26 Jul 2022 at 16:27, Jeremiah D Jordan<jeremiah.jor...@gmail.com> wrote:


    Reading the responses here and taking a step back, I think the
    current behavior of nodetool compact is probably the correct
    behavior.  The main use case I can see for using nodetool compact
    is someone wants to take some sstable and compact it with all the
    overlapping sstables.  So you run “sstablemetadata” on the sstable
    and get the min and max tokens, and then you pass those in to
    nodetool compact.  In that case you do want the closed range.

    This is different from running repair where you get the tokens
    from the nodes/nodetool ring and node those level token ranges
    ownership is half open when going from “token owned by node a to
    token owned by node b”.

    So my initial thought/gut reaction that it should work like repair
    is misleading, because you don’t get the tokens from the same
    place you get them when running repair.

    Making the command line options more explicit and documented does
    seem like it could be useful.

    -Jeremiah Jordan

    On Jul 26, 2022, at 9:16 AM, Derek Chen-Becker
    <de...@chen-becker.org> wrote:

    +1 to new flags. A released, albeit undocumented, behavior is
    still a contract with the end user. Flags (and documentation)
    seem like the right path to address the situation.

    Cheers,

    Derek

    On Tue, Jul 26, 2022 at 7:28 AM Benedict Elliott Smith
    <bened...@apache.org> wrote:


        I think a change like this could be dangerous for a lot of
        existing automation built atop nodetool.

        I’m not sure this change is worthwhile. I think it would be
        better to introduce e.g. -ste and -ete for “start token
        exclusive” and “end token exclusive” so that users can opt-in
        to whichever scheme they prefer for their tooling, without
        breaking existing users.

        > On 26 Jul 2022, at 14:22, Brandon Williams
        <dri...@gmail.com> wrote:
        >
        > +1, I think that makes the most sense.
        >
        > Kind Regards,
        > Brandon
        >
        > On Tue, Jul 26, 2022 at 8:19 AM J. D. Jordan
        <jeremiah.jor...@gmail.com> wrote:
        >>
        >> I like the third option, especially if it makes it
        consistent with repair, which has supported ranges longer and
        I would guess most people would think the compact ranges work
        the same as the repair ranges.
        >>
        >> -Jeremiah Jordan
        >>
        >>> On Jul 26, 2022, at 6:49 AM, Andrés de la Peña
        <adelap...@apache.org> wrote:
        >>>
        >>> 
        >>> Hi all,
        >>>
        >>> CASSANDRA-17575 has detected that token ranges in
        nodetool compact are interpreted as closed on both sides. For
        example, the command "nodetool compact -st 10 -et 50" will
        compact the tokens in [10, 50]. This way of interpreting
        token ranges is unusual since token ranges are usually
        half-open, and I think that in the previous example one would
        expect that the compacted tokens would be in (10, 50]. That's
        for example the way nodetool repair works, and indeed the
        class org.apache.cassandra.dht.Range is always half-open.
        >>>
        >>> It's worth mentioning that, differently from nodetool
        repair, the help and doc for nodetool compact doesn't specify
        whether the supplied start/end tokens are inclusive or exclusive.
        >>>
        >>> I think that ideally nodetool compact should interpret
        the provided token ranges as half-open, to be consistent with
        how token ranges are usually interpreted. However, this would
        change the way the tool has worked until now. This change
        might be problematic for existing users relying on the old
        behaviour. That would be especially severe for the case where
        the begin and end token are the same, because interpreting
        [x, x] we would compact a single token, whereas I think that
        interpreting (x, x] would compact all the tokens. As for
        compacting ranges including multiple tokens, I think the
        change wouldn't be so bad, since probably the supplied token
        ranges come from tools that are already presenting the ranges
        as half-open. Also, if we are splitting the full ring into
        smaller ranges, half-open intervals would still work and
        would save us some repetitions.
        >>>
        >>> So my question is: Should we change the behaviour of
        nodetool compact to interpret the token ranges as
        half-opened, aligning it with the usual interpretation of
        ranges? Or should we just document the current odd behaviour
        to prevent compatibility issues?
        >>>
        >>> A third option would be changing to half-opened ranges
        and also forbidding ranges where the begin and end token are
        the same, to prevent the accidental compaction of the entire
        ring. Note that nodetool repair also forbids this type of
        token ranges.
        >>>
        >>> What do you think?

--+---------------------------------------------------------------+

    | Derek Chen-Becker                      |
    | GPG Key available at https://keybase.io/dchenbeckerand       |
    | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
    | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
    +---------------------------------------------------------------+

Re: Inclusive/exclusive endpoints when compacting token ranges

Reply via email to