> The same backward compatibility mechanism needed for system-provided UUIDs 
> will work for user-provided UUIDs.

By ignoring them, and assigning a different one? That seems confusing, and like 
the feature will in effect be short lived.

It’s one problem to upgrade, just once, a set of IDs that we control 
unilaterally, and quite another to sensibly handle some user input.

I should also note that collision detection is harder than you think. It needs 
to be reliable which means we need to use distributed consensus to allocate 
these ids, it can’t just involve our usual “look in gossip” approach. So 
collision detection by itself is not a small thing to deliver in a few days IMO.
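
A toy illustration of the point above, not Cassandra code: every name here 
(Node, local_check, Allocator, claim) is hypothetical. It assumes each node 
checks a requested id only against its own local view, as a gossip-based check 
would, and shows two joiners both passing the check for the same id. The 
Allocator class is only a stand-in for a consensus-backed allocation, where 
claims are serialised and the second claimant is rejected.

```python
import uuid


class Node:
    """A joining node with its own, possibly stale, view of ids in use."""

    def __init__(self, name):
        self.name = name
        self.known_ids = set()

    def local_check(self, host_id):
        # "Look in gossip": only consults what this node has heard so far.
        return host_id not in self.known_ids


class Allocator:
    """Stand-in for a consensus-backed allocator: one serialised decision."""

    def __init__(self):
        self.owners = {}

    def claim(self, host_id, name):
        # The first claimant wins; later claimants see the existing owner.
        return self.owners.setdefault(host_id, name) == name


wanted = uuid.UUID("00000000-0000-0000-0000-00000000000a")
a, b = Node("a"), Node("b")

# Both joiners check before either claim has propagated to the other.
local_winners = [n.name for n in (a, b) if n.local_check(wanted)]

# With a single serialised decision, only one claim can succeed.
allocator = Allocator()
serialised_winners = [n.name for n in (a, b) if allocator.claim(wanted, n.name)]
```

In this sketch local_winners contains both nodes, which is the collision, while 
serialised_winners contains only the first claimant.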

From: Paulo Motta <pauloricard...@gmail.com>
Date: Wednesday, 27 April 2022 at 19:09
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: Code freeze starts 1st May. Anything to be addressed?
> One reason might be compatibility – this may (I hope _will_) migrate to a 
> simple integer of low cardinality in future, which would be a breaking change.

I look forward to this change, but won't we need to implement some backward 
compatibility handling for legacy UUIDs anyway? The same backward compatibility 
mechanism needed for system-provided UUIDs will work for user-provided UUIDs.

> This identifier will likely be used by Accord for correctness, too, and doing 
> something wrong with it could have severe consequences, so at the very least 
> it should be hard to access.

The only potential issue I see is a host_id collision, which is easily 
fixable by a simple collision check.

> We could of course have two different host ids, one for the user to set to 
> identify the host in some way for them, and another one for internal usage, 
> but I’m not sure that’s a great idea.

I don't think we need to keep the ability to set a host ID if we change the ID 
representation, since it will be incompatible with externally-provided UUIDs. 
We can just remove the feature and call it a day since the new system will 
warrant a major version update anyway.
To be clear, I don't oppose reverting this if there are concerns about it.

On Wed, 27 Apr 2022 at 14:51, bened...@apache.org wrote:
One reason might be compatibility – this may (I hope _will_) migrate to a 
simple integer of low cardinality in future, which would be a breaking change. 
This identifier will likely be used by Accord for correctness, too, and doing 
something wrong with it could have severe consequences, so at the very least it 
should be hard to access.

We could of course have two different host ids, one for the user to set to 
identify the host in some way for them, and another one for internal usage, but 
I’m not sure that’s a great idea.

From: Paulo Motta <pauloricard...@gmail.com>
Date: Wednesday, 27 April 2022 at 18:20
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: Code freeze starts 1st May. Anything to be addressed?
Fully agree we should add a collision check but I don't understand why this 
optional feature is bad/dangerous after we add this ability? Can you provide an 
example of a potential issue?
I don't expect this property to be used by most users, only by power users, who 
normally know what they're doing. We have tons of potentially dangerous knobs 
and I don't get why this particular one is any different.

On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe <s...@beobal.com> wrote:
CASSANDRA-14582 added support for users to supply an arbitrary value for 
HOST_ID when booting a new node. IMO it's a pretty bad and potentially 
dangerous idea for the unique identifier to be settable in this way. Hint 
delivery is already routed by host id and there have been several JIRAs which 
have called for more fundamental reworking of cluster metadata using permanent 
opaque identifiers rather than IPs to address members (CASSANDRA-11559, 
CASSANDRA-15823, etc). Using host id for anything like that in future would be 
made much more difficult with this capability.

Aside from the longer term implications, it seems that the feature as currently 
implemented has some issues. There doesn't appear to be any validation that a 
supplied host id isn't already in use by a live node, so it's trivial to 
trigger a collision, which can lead to divergent ring views between nodes and 
ultimately to data loss.
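
For illustration only, the kind of validation described as missing above might 
look like the following minimal sketch. It is not the CASSANDRA-14582 code: 
validate_host_id and the live_host_ids set are hypothetical names, and the 
sketch assumes the caller already holds an accurate set of ids in use, which 
in a real cluster is the hard part.

```python
import uuid


def validate_host_id(candidate, live_host_ids):
    """Return the parsed host id, or raise if it is malformed or in use.

    live_host_ids is assumed to be the set of ids held by live nodes; the
    check is only as reliable as that set is complete.
    """
    host_id = uuid.UUID(str(candidate))  # raises ValueError if malformed
    if host_id in live_host_ids:
        raise ValueError(f"host id {host_id} is already in use by a live node")
    return host_id


live = {uuid.UUID("00000000-0000-0000-0000-000000000001")}
accepted = validate_host_id("00000000-0000-0000-0000-000000000002", live)
```

A supplied id that matches an entry in live is rejected with ValueError rather 
than being allowed to boot with a colliding identifier.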

Although this landed in trunk almost 11 months ago it hasn't been included in a 
release yet, so I propose we revert it before cutting 4.1 (although, as the 
revert isn't a feature, I guess technically we could do that during the 
freeze). I'm not completely convinced about encoding metadata into host ids, 
but even if that is something we want to do, I don't think it's wise to 
completely remove control over the identifiers from Cassandra itself.

Thanks,
Sam

On 25 Apr 2022, at 16:17, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:

Hi everyone,

Kind reminder that 1st May is around the corner. What does this mean? Our code 
freeze starts on 1st May and my understanding is that only bug fixing can go 
into the 4.1 branch.
If anyone has anything to raise, now is a good time. On my end I saw a few 
things for this week that we should probably put to completion:
- CASSANDRA-17571<https://issues.apache.org/jira/browse/CASSANDRA-17571> - I 
have to close this one, it is in progress; new types in Config are good to have 
in before the freeze I guess, even if it is not a yaml change
- CASSANDRA-17557<https://issues.apache.org/jira/browse/CASSANDRA-17557> - we 
need to take care of the parameters so we don't have to deprecate and support 
anything not actually needed; I think it is probably more or less done
- CASSANDRA-17379<https://issues.apache.org/jira/browse/CASSANDRA-17379> - adds 
a new flag around config; I think it is more or less done, depends on final CI 
and second reviewer maybe needed?
- JMX intercept Cassandra exceptions, I think David mentioned a rebase was 
needed
- CASSANDRA-17212 - The config property minimum_keyspace_rf and its nodetool 
getter and setter commands are new to 4.1. They are suitable to be ported to 
guardrails, and if we do this port in 4.1 we won't need to deprecate the 
property and the nodetool commands in the next release, just one release after 
their introduction.

I guess the failing tests we see could be fixed after the freeze but no API 
changes.

Thanks everyone for all the hard work. Please don’t hesitate to raise the flag 
with questions, concerns or any help needed.

Best regards,
Ekaterina
