This patch is probably ready to merge now, having been through several 
iterations of review and with green CI. Before that happens, I want to send 
one more reminder about it. We've endeavoured to preserve all existing 
behaviour and to keep configuration fully backwards compatible. However, some 
areas have had minimal testing in real clusters, specifically the various cloud 
platform configurations: 

* Ec2Snitch/Ec2MultiRegionSnitch
* AzureSnitch
* AlibabaCloudSnitch
* GoogleCloudSnitch
* CloudstackSnitch

Any help in validating these in their native environments would be welcome.

The other consideration is custom snitch implementations. The intention 
is that these should continue to work without interruption or intervention, 
unless they lean heavily on C* internals, in which case any changes 
required ought to be minimal. So it would be great if anyone using a custom 
snitch implementation could check out the patch and help verify that.
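
For anyone wondering why an existing custom snitch would be expected to keep 
working, here is a rough, self-contained sketch of the two mechanisms described 
in the quoted message below: new methods added as interface defaults, and an 
adapter that presents a new-style interface while delegating to the legacy 
snitch. Every name in it (LegacySnitch, preferLocalConnections, 
LocationProvider, Adapter) is a hypothetical stand-in invented for 
illustration. It is not the real IEndpointSnitch or SnitchAdapter API; the 
actual interfaces are in the patch on CASSANDRA-19488.

```java
public class SnitchCompatSketch {
    // Hypothetical stand-in for a legacy snitch interface (NOT the real IEndpointSnitch).
    interface LegacySnitch {
        String datacenter(String endpoint);
        String rack(String endpoint);

        // Hypothetical newly added method. Because it has a default body,
        // implementations compiled against the older interface still link and run.
        default boolean preferLocalConnections() {
            return false;
        }
    }

    // An out-of-tree snitch written before the new method existed; it does not override it.
    static final class CustomSnitch implements LegacySnitch {
        @Override
        public String datacenter(String endpoint) { return "dc1"; }

        @Override
        public String rack(String endpoint) { return "rack1"; }
    }

    // Hypothetical new-style interface that calling code would depend on.
    interface LocationProvider {
        String location(String endpoint);
    }

    // Facade: exposes the new-style interface, delegates every call to the legacy snitch.
    static final class Adapter implements LocationProvider {
        private final LegacySnitch delegate;

        Adapter(LegacySnitch delegate) {
            this.delegate = delegate;
        }

        @Override
        public String location(String endpoint) {
            return delegate.datacenter(endpoint) + ":" + delegate.rack(endpoint);
        }
    }

    public static void main(String[] args) {
        LegacySnitch custom = new CustomSnitch();
        LocationProvider provider = new Adapter(custom);
        // The unmodified custom snitch picks up the default for the new method.
        System.out.println(custom.preferLocalConnections());
        System.out.println(provider.location("10.0.0.1"));
    }
}
```

The trade-off raised in the first question below is visible here: the default 
keeps CustomSnitch loadable without changes, but it also means the custom 
snitch silently inherits whatever the default does. Removing the default body 
would force such implementations to be updated before they could be used.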


> On 31 Oct 2024, at 16:53, Sam Tunnicliffe <s...@beobal.com> wrote:
> 
> Since CEP-21, the source of truth for topology info (a node's datacenter & 
> rack) is ClusterMetadata. Each node provides its dc/rack when it registers 
> itself with the cluster prior to joining and this information is effectively 
> immutable (for now). This significantly reduces the scope of 
> IEndpointSnitch's responsibilities and CASSANDRA-19488 proposes a refactoring 
> which breaks out the remaining functionality into a handful of new providers 
> (full details can be found in the JIRA). 
> 
> This is one of the more widely used extension points in Cassandra, so we 
> wanted to bring it to the mailing list in addition to discussing on JIRA. 
> 
> To be clear, no operator intervention should be necessary when upgrading. To 
> ease migration onto the new config and to allow us to deprecate snitches in a 
> controlled way, it will remain fully supported to configure nodes using the 
> endpoint_snitch setting in yaml. A SnitchAdapter acts as a facade in this 
> case, presenting the new interfaces to calling code while delegating to the 
> legacy snitch. Most of the in-tree snitches have been refactored to extract 
> implementations of the new interfaces so that their functionality can be used 
> via the new configuration.
> 
> Some questions for the list:
> 
> * We have added 2 new methods to IEndpointSnitch, which have essentially been 
> pulled up from Ec2MultiRegionSnitch and GossipingPropertyFileSnitch to 
> support ReconnectableSnitchHelper. Currently, these are added as default 
> methods on the interface so that out-of-tree snitches remain binary 
> compatible. However, it would be safer to break binary compatibility in this 
> case to ensure that any custom snitches out in the wild must be updated and 
> their behaviour is preserved. So the question is, would there be objections 
> to extending the (now deprecated) IEndpointSnitch interface in this way?
> 
> * Python dtests and config are currently unchanged (aside from some error 
> message checks) so these are exercising the path whereby the clusters are 
> configured with endpoint_snitch and make use of the compatibility adapter. 
> In-jvm upgrade dtests switch from old to new style configuration on upgrade 
> to 5.1 (though in truth, these don't exercise snitches much at all as a 
> special dtest snitch is used throughout). cassandra-latest.yaml contains the 
> new settings, while cassandra.yaml and the variations in test/conf retain the 
> old style settings. How should we approach updating these configs so that we 
> maintain a balance between test coverage, compatibility during upgrades and 
> encouraging the use of new style config in fresh clusters?
> 
