> The same backward compatibility mechanism needed for system-provided UUIDs will work for user-provided UUIDs.
By ignoring them, and assigning a different one? That seems confusing, and like the feature will in effect be short-lived. It’s one thing to upgrade, just once, a set of IDs that we control unilaterally, and quite another to sensibly handle some user input.

I should also note that collision detection is harder than you think. It needs to be reliable, which means we need to use distributed consensus to allocate these IDs; it can’t just involve our usual “look in gossip” approach. So collision detection by itself is not a small thing to deliver in a few days, IMO.

From: Paulo Motta <pauloricard...@gmail.com>
Date: Wednesday, 27 April 2022 at 19:09
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: Code freeze starts 1st May. Anything to be addressed?

> One reason might be compatibility – this may (I hope _will_) migrate to a simple integer of low cardinality in future, which would be a breaking change.

I look forward to this change, but won't we need to implement some backward compatibility handling for legacy UUIDs anyway? The same backward compatibility mechanism needed for system-provided UUIDs will work for user-provided UUIDs.

> This identifier will likely be used by Accord for correctness, too, and doing something wrong with it could have severe consequences, so at the very least it should be hard to access.

The only potential issue I see is a host_id collision, which is easily fixable by a simple collision check.

> We could of course have two different host ids, one for the user to set to identify the host in some way for them, and another one for internal usage, but I’m not sure that’s a great idea.

I don't think we need to keep the ability to set a host ID if we change the ID representation, since it will be incompatible with externally-provided UUIDs. We can just remove the feature and call it a day since the new system will warrant a major version update anyway.
To be clear, I don't oppose reverting this if there are concerns about it.

On Wed, 27 Apr 2022 at 14:51, bened...@apache.org <bened...@apache.org> wrote:

One reason might be compatibility – this may (I hope _will_) migrate to a simple integer of low cardinality in future, which would be a breaking change.

This identifier will likely be used by Accord for correctness, too, and doing something wrong with it could have severe consequences, so at the very least it should be hard to access.

We could of course have two different host ids, one for the user to set to identify the host in some way for them, and another one for internal usage, but I’m not sure that’s a great idea.

From: Paulo Motta <pauloricard...@gmail.com>
Date: Wednesday, 27 April 2022 at 18:20
To: Cassandra DEV <dev@cassandra.apache.org>
Subject: Re: Code freeze starts 1st May. Anything to be addressed?

Fully agree we should add a collision check, but I don't understand why this optional feature is bad/dangerous after we add this ability. Can you provide an example of a potential issue?

I don't expect this property to be used by most users, except power users, who normally know what they're doing. We have tons of potentially dangerous knobs and I don't get why this particular one is any different.

On Wed, 27 Apr 2022 at 14:05, Sam Tunnicliffe <s...@beobal.com> wrote:

CASSANDRA-14582 added support for users to supply an arbitrary value for HOST_ID when booting a new node. IMO it's a pretty bad and potentially dangerous idea for the unique identifier to be settable in this way. Hint delivery is already routed by host id and there have been several JIRAs which have called for more fundamental reworking of cluster metadata using permanent opaque identifiers rather than IPs to address members (CASSANDRA-11559, CASSANDRA-15823, etc).
Using host id for anything like that in future would be made much more difficult with this capability.

Aside from the longer term implications, it seems that the feature as currently implemented has some issues. There doesn't appear to be any validation that a supplied host id isn't already in use by a live node, so it's trivial to trigger a collision which can lead to divergent ring views between nodes and ultimately to data loss.

Although this landed in trunk almost 11 months ago it hasn't been included in a release yet, so I propose we revert it before cutting 4.1 (although, as the revert isn't a feature, I guess technically we could do that during the freeze). I'm not completely convinced about encoding metadata into host ids, but even if that is something we want to do, I don't think it's wise to completely remove control over the identifiers from Cassandra itself.

Thanks,
Sam

On 25 Apr 2022, at 16:17, Ekaterina Dimitrova <e.dimitr...@gmail.com> wrote:

Hi everyone,

Kind reminder that 1st May is around the corner. What does this mean? Our code freeze starts on 1st May and my understanding is that only bug fixes can go into the 4.1 branch. If anyone has anything to raise, now is a good time.

On my end I saw a few things for this week that we should probably bring to completion:
- CASSANDRA-17571 <https://issues.apache.org/jira/browse/CASSANDRA-17571> - I have to close this one, it is in progress; new types in Config are good to have in before the freeze I guess, even if it is not a yaml change
- CASSANDRA-17557 <https://issues.apache.org/jira/browse/CASSANDRA-17557> - we need to take care of the parameters so we don't have to deprecate and support anything not actually needed; I think it is probably more or less done
- CASSANDRA-17379 <https://issues.apache.org/jira/browse/CASSANDRA-17379> - adds a new flag around config; I think it is more or less done, depends on final CI, and a second reviewer may be needed?
- JMX intercept Cassandra exceptions, I think David mentioned a rebase was needed
- CASSANDRA-17212 - The config property minimum_keyspace_rf and its nodetool getter and setter commands are new to 4.1. They are suitable to be ported to guardrails, and if we do this port in 4.1 we won't need to deprecate that property and the nodetool commands in the next release, just one release after their introduction.

I guess the failing tests we see could be fixed after the freeze, but no API changes.

Thanks everyone for all the hard work. Please don’t hesitate to raise the flag with questions, concerns or any help needed.

Best regards,
Ekaterina