Re: Inclusive/exclusive endpoints when compacting token ranges

Jeremiah D Jordan Tue, 26 Jul 2022 08:28:03 -0700

Reading the responses here and taking a step back, I think the current behavior 
of nodetool compact is probably the correct behavior.  The main use case I can 
see for using nodetool compact is someone wants to take some sstable and 
compact it with all the overlapping sstables.  So you run “sstablemetadata” on 
the sstable and get the min and max tokens, and then you pass those in to 
nodetool compact.  In that case you do want the closed range.


This is different from running repair where you get the tokens from the 
nodes/nodetool ring and node those level token ranges ownership is half open 
when going from “token owned by node a to token owned by node b”.

So my initial thought/gut reaction that it should work like repair is 
misleading, because you don’t get the tokens from the same place you get them 
when running repair.

Making the command line options more explicit and documented does seem like it 
could be useful.

-Jeremiah Jordan

> On Jul 26, 2022, at 9:16 AM, Derek Chen-Becker <[email protected]> wrote:
> 
> +1 to new flags. A released, albeit undocumented, behavior is still a 
> contract with the end user. Flags (and documentation) seem like the right 
> path to address the situation.
> 
> Cheers,
> 
> Derek
> 
> On Tue, Jul 26, 2022 at 7:28 AM Benedict Elliott Smith <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> I think a change like this could be dangerous for a lot of existing 
> automation built atop nodetool.
> 
> I’m not sure this change is worthwhile. I think it would be better to 
> introduce e.g. -ste and -ete for “start token exclusive” and “end token 
> exclusive” so that users can opt-in to whichever scheme they prefer for their 
> tooling, without breaking existing users.
> 
> > On 26 Jul 2022, at 14:22, Brandon Williams <[email protected] 
> > <mailto:[email protected]>> wrote:
> > 
> > +1, I think that makes the most sense.
> > 
> > Kind Regards,
> > Brandon
> > 
> > On Tue, Jul 26, 2022 at 8:19 AM J. D. Jordan <[email protected] 
> > <mailto:[email protected]>> wrote:
> >> 
> >> I like the third option, especially if it makes it consistent with repair, 
> >> which has supported ranges longer and I would guess most people would 
> >> think the compact ranges work the same as the repair ranges.
> >> 
> >> -Jeremiah Jordan
> >> 
> >>> On Jul 26, 2022, at 6:49 AM, Andrés de la Peña <[email protected] 
> >>> <mailto:[email protected]>> wrote:
> >>> 
> >>> 
> >>> Hi all,
> >>> 
> >>> CASSANDRA-17575 has detected that token ranges in nodetool compact are 
> >>> interpreted as closed on both sides. For example, the command "nodetool 
> >>> compact -st 10 -et 50" will compact the tokens in [10, 50]. This way of 
> >>> interpreting token ranges is unusual since token ranges are usually 
> >>> half-open, and I think that in the previous example one would expect that 
> >>> the compacted tokens would be in (10, 50]. That's for example the way 
> >>> nodetool repair works, and indeed the class 
> >>> org.apache.cassandra.dht.Range is always half-open.
> >>> 
> >>> It's worth mentioning that, differently from nodetool repair, the help 
> >>> and doc for nodetool compact doesn't specify whether the supplied 
> >>> start/end tokens are inclusive or exclusive.
> >>> 
> >>> I think that ideally nodetool compact should interpret the provided token 
> >>> ranges as half-open, to be consistent with how token ranges are usually 
> >>> interpreted. However, this would change the way the tool has worked until 
> >>> now. This change might be problematic for existing users relying on the 
> >>> old behaviour. That would be especially severe for the case where the 
> >>> begin and end token are the same, because interpreting [x, x] we would 
> >>> compact a single token, whereas I think that interpreting (x, x] would 
> >>> compact all the tokens. As for compacting ranges including multiple 
> >>> tokens, I think the change wouldn't be so bad, since probably the 
> >>> supplied token ranges come from tools that are already presenting the 
> >>> ranges as half-open. Also, if we are splitting the full ring into smaller 
> >>> ranges, half-open intervals would still work and would save us some 
> >>> repetitions.
> >>> 
> >>> So my question is: Should we change the behaviour of nodetool compact to 
> >>> interpret the token ranges as half-opened, aligning it with the usual 
> >>> interpretation of ranges? Or should we just document the current odd 
> >>> behaviour to prevent compatibility issues?
> >>> 
> >>> A third option would be changing to half-opened ranges and also 
> >>> forbidding ranges where the begin and end token are the same, to prevent 
> >>> the accidental compaction of the entire ring. Note that nodetool repair 
> >>> also forbids this type of token ranges.
> >>> 
> >>> What do you think?
> 
> 
> 
> 
> -- 
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker 
> <https://keybase.io/dchenbecker> and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org 
> <https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org> |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
>

Re: Inclusive/exclusive endpoints when compacting token ranges

Reply via email to