I've updated the CEP with a lot of your suggestions below.
One thing I'm holding off on is making "requiresAuthentication" a
JMX-controllable ("hot") property. I do like the idea of being able to
enable it without bouncing nodes. However, I'm concerned about config
updates made through JMX not being persistent across restarts,
potentially setting someone up to believe they have authentication in
place, only to discover that new or restarted nodes are allowing clients
to slide in without authentication (and with elevated permissions).
I don't know if settling that question is a blocker for the CEP ... in
any event, I'd like to give it more thought.
Thanks -- Joel.
On 7/10/2025 1:22 PM, Josh McKenzie wrote:
... so that in the future new options could just be keys in the
multimap, which would not be considered a protocol change? Or placing
all the options -- current and future -- into the multimap?
The former. Trying to think through defensive postures on the protocol
to prevent future changes required to the client ecosystem, since at
least right now it translates into 7+ different implementations for
different language drivers to support new functionality.
On Wed, Jul 9, 2025, at 8:18 PM, Joel Shepherd wrote:
Hi Josh - Thanks for all the feedback: appreciate it. Responses to
specific points interwoven below ...
On 7/9/2025 3:25 AM, Josh McKenzie wrote:
Sorry for the delay in getting to this Joel. This is great work -
really well thought out and put together. I'm a +1 on it; had the
following observations or questions reviewing the CEP:
Rather than going with a new discretely allowed option to a closed
list of allowable options in STARTUP, what if we went the route we
did with SUPPORTED and offered a variable [string multimap] so we
could add future STARTUP options w/out fully revving the protocol
spec in the future?
(https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v5.spec#L462-L491)
If I'm understanding, you're suggesting making a one-time protocol
change by adding something like:
STARTUP: {
"CQL_VERSION": "3.11",
<other existing options>
"OPTIONS": <multimap of option name => value>
}
... so that in the future new options could just be keys in the
multimap, which would not be considered a protocol change? Or placing
all the options -- current and future -- into the multimap?
I don't have a strong opinion on this. My weak opinion is that --
especially if the intent is to put all options in the multimap --
it's either going to require careful client-node coordination to roll
out the protocol change, or it's going to require the node to expect
and handle both flavors of the STARTUP message structure. Doable but
I'm not sure that part of the protocol changes so frequently that
it's worth the effort. I'm not going to argue if you or others have a
different and stronger opinion on it.
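For anyone following along, here is a rough sketch of what the wire encoding of such a multimap could look like, using the [string], [string list], [string map] and [string multimap] notations from the native protocol spec. The "OPTIONS"-style key name ("AUTHENTICATORS") and its values are hypothetical, made up for illustration; nothing here is taken from the CEP or from an actual driver.

```python
import struct
from typing import Dict, List


def encode_string(s: str) -> bytes:
    """[string]: a 2-byte big-endian length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack(">H", len(data)) + data


def encode_string_list(values: List[str]) -> bytes:
    """[string list]: a 2-byte count followed by that many [string]s."""
    return struct.pack(">H", len(values)) + b"".join(encode_string(v) for v in values)


def encode_string_map(m: Dict[str, str]) -> bytes:
    """[string map]: a 2-byte count followed by [string] key/value pairs."""
    out = struct.pack(">H", len(m))
    for key, value in m.items():
        out += encode_string(key) + encode_string(value)
    return out


def encode_string_multimap(m: Dict[str, List[str]]) -> bytes:
    """[string multimap]: a 2-byte count, then [string] key + [string list] value."""
    out = struct.pack(">H", len(m))
    for key, values in m.items():
        out += encode_string(key) + encode_string_list(values)
    return out


# Today's STARTUP body is a [string map] of known options.
startup_body = encode_string_map({"CQL_VERSION": "3.11"})

# Hypothetical: future options carried as multimap keys instead of new
# top-level STARTUP options. The key and values below are assumptions.
future_options = encode_string_multimap({"AUTHENTICATORS": ["mtls", "password"]})
```

The point of the sketch is only that adding a new key to the multimap changes the payload, not the frame structure, which is what would let new options avoid a protocol revision.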
Any opinion on us deprecating the existing
cassandra.yaml#authenticator: option to be renamed as
"fallback_authenticator", and updating the text to reflect that? Or
/default_authenticator/; something to denote through the naming that
the role of that is secondary to the newly added
/authenticator_negotiation/? I'm 50/50 on this; I don't love
deprecation (knowing we're going to keep supporting the old
nomenclature into perpetuity), but I do think the topic of auth is
important enough that a little config disruption and maintenance
here to steer new users to the new, more secure method of auth might
warrant the change.
I hadn't thought about that but I like the idea. I was striving for
maximum backwards compatibility, but you're right that the existing
key could continue to be supported, just not publicized. Going
forward, something like "default_authenticator" will be less
confusing than just "authenticator". I'll add it to the doc.
`‘requireAuthentication’ should be set to true as soon as all
clients are using other authenticators.`
A hot prop or mutable vtable (... did we ever do mutable / change
config through vtables? Hm, looks like not yet:
https://issues.apache.org/jira/browse/CASSANDRA-15254) so we could
change this live on a cluster w/out bouncing would be nice. Be nicer
if that change also coincided w/a change in config via 1 UX through
the DB. ;) But that's a /different/ problem than what you're looking
to address so worth deferring on that piece probably.
Agree it'd be great to be able to enable authn and negotiation
without having to bounce nodes. Will add.
re: thundering herd risk - this tickled my memory. <snip> We have a
separate executor specifically for auth messages in Dispatcher.java;
might be worth keeping an eye on this to see if it proves to be a
new bottleneck w/a more heavyweight negotiation on connection.
Ah, thanks for pointing that out. I wouldn't expect STARTUP (which is
where negotiation will actually be resolved server-side) to be
significantly more expensive than it is today, but will keep an eye
on that. A second potential bottleneck might lie in the increased
overhead of the required OPTIONS/SUPPORTED handshake, which I believe
today is optional: the client can and often does initiate connection
with STARTUP alone. Again, while I wouldn't expect those to be very
expensive to handle, offloading them from the requestExecutor might
protect the workers serving the actual data plane traffic.
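To make the offloading idea concrete, here is a minimal sketch of routing handshake messages to their own small pool so they cannot starve the workers serving queries. This is not how Dispatcher.java is written; the class name, the opcode set and the pool sizes are all assumptions for illustration.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable

# Assumption: which message types count as "handshake" traffic.
HANDSHAKE_OPCODES = {"OPTIONS", "STARTUP", "AUTH_RESPONSE"}


class SketchDispatcher:
    """Hypothetical dispatcher with separate pools for handshake and data-plane work."""

    def __init__(self, request_workers: int = 4, handshake_workers: int = 1):
        self.request_executor = ThreadPoolExecutor(
            max_workers=request_workers, thread_name_prefix="request")
        self.handshake_executor = ThreadPoolExecutor(
            max_workers=handshake_workers, thread_name_prefix="handshake")

    def executor_for(self, opcode: str) -> ThreadPoolExecutor:
        # Handshake messages go to the dedicated pool; everything else
        # stays on the request pool.
        if opcode in HANDSHAKE_OPCODES:
            return self.handshake_executor
        return self.request_executor

    def dispatch(self, opcode: str, handler: Callable[[], object]) -> Future:
        return self.executor_for(opcode).submit(handler)

    def shutdown(self) -> None:
        self.request_executor.shutdown(wait=True)
        self.handshake_executor.shutdown(wait=True)


if __name__ == "__main__":
    dispatcher = SketchDispatcher()
    future = dispatcher.dispatch("OPTIONS", lambda: threading.current_thread().name)
    print(future.result())  # name of a thread from the handshake pool
    dispatcher.shutdown()
```

With a split like this, a burst of reconnecting clients queues up behind the handshake pool rather than behind in-flight reads and writes.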
The JIRA link in the CEP just links to the ASF C* JIRA - I couldn't
find a ticket for this CEP yet in JIRA. Generally we use that to
link to a specific ticket; was a minor speed bump just FYI.
I was deferring creating a JIRA until if/when the CEP was adopted,
but I'll create one today. Can always close it later.
And we have some prior art in the following:
- CASSANDRA-13048: Support SASL mechanism negotiation in existing
Authenticators
- CASSANDRA-11471: Add SASL mechanism negotiation to the native protocol
Might be good to link to those and once we get a JIRA up for CEP-50,
we can flag those 2 as duplicates of it and close them out once it's
done.
Great: thanks for the pointers.
One detail where CEP-50 varies a bit from the SASL RFC's description
of negotiation
(https://datatracker.ietf.org/doc/html/rfc4422#section-3.2) is that
the RFC suggests the server send the client a list of authenticators
to choose from, and the client responds with its chosen
authenticator. The RFC doesn't say that the protocol "MUST" or even
"SHOULD" operate this way, just that implementations "commonly" do. In
early drafts, I actually described the exchange as working this way,
but eventually backed away from it because it seems to create the
risk of a malicious or confused client picking the least secure
option offered by the server: the server loses some control. The
current proposal is that the client sends the server a list of auth
mechanisms that the client can support, and the server makes the
final decision about which to use, which discourages the client from
choosing the weakest usable option. (It doesn't outright prevent it
however.) I believe this is still SASL-compliant because SASL doesn't
mandate a particular exchange, but did want to call it out. I'll fold
this into the CEP as well.
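As a rough illustration of the server-decides exchange described above: the node keeps an ordered preference list, and picks the first entry that the client also advertised. The function name and the mechanism names are hypothetical, and this is only a sketch of the selection rule, not the CEP's implementation.

```python
from typing import Iterable, Optional, Sequence


def choose_authenticator(server_preference: Sequence[str],
                         client_supported: Iterable[str]) -> Optional[str]:
    """Return the server's most-preferred mechanism that the client supports.

    server_preference is ordered strongest-first; the client's ordering is
    deliberately ignored so the server keeps the final say.
    """
    offered = set(client_supported)
    for mechanism in server_preference:
        if mechanism in offered:
            return mechanism
    return None  # no overlap: the connection cannot be authenticated


# Assumed mechanism names, strongest first from the server's point of view.
preference = ["mtls", "password"]

# The client lists "password" first, but the server still selects "mtls".
selected = choose_authenticator(preference, ["password", "mtls"])

# A client that only advertises "password" still ends up with "password",
# which is why this discourages, but does not prevent, the weaker choice.
weakest = choose_authenticator(preference, ["password"])
```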
Thanks! -- Joel.
Overall - looks great. Again: +1 from me.
On Tue, Jul 8, 2025, at 8:34 PM, Joel Shepherd wrote:
Hi Doug - That's an interesting suggestion for a load test: I'll
include something like that in our plans.
You're right about the logic in RoleManager: it should be doing the
right thing with MutualTlsWithPasswordFallbackAuthenticator.
Thanks! -- Joel.
On 7/8/2025 1:52 PM, Doug Rohrer wrote:
+1 from me (I think committer +1s are "binding" on CEPs given our
previous "how we do things" conversation, but either way, +1)...
One interesting perf test to think about would be the difference
between negotiated auth with MutualTlsAuthenticator and
PasswordAuthenticator and the combined
MutualTlsWithPasswordFallbackAuthenticator, as I think it'll
provide a pretty good indication of the (hopefully negligible)
performance difference when negotiating.
Also, because I was curious about your last comment...
MutualTlsWithPasswordFallbackAuthenticator _derives_ from
PasswordAuthenticator, so *instanceof
PasswordAuthenticator* checks return *true* for instances of it,
and therefore (assuming you're talking about
supportedOptions/alterableOptions in
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L167-L172),
it should work fine. Not that that helps _you_ solve your problem,
but at least the existing classes should work.
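A tiny sketch of the point about derived classes, using Python stand-ins for the Java classes (the real check lives in CassandraRoleManager and is not reproduced here). The helper name is hypothetical, and MutualTlsAuthenticator is assumed not to derive from PasswordAuthenticator.

```python
class PasswordAuthenticator:
    """Stand-in for the Java class of the same name."""


class MutualTlsWithPasswordFallbackAuthenticator(PasswordAuthenticator):
    """Stand-in: derives from PasswordAuthenticator, as described above."""


class MutualTlsAuthenticator:
    """Stand-in: assumed here to be unrelated to PasswordAuthenticator."""


def supports_password_options(authenticator: object) -> bool:
    # Python's isinstance, like Java's instanceof, is true for subclasses too,
    # so the fallback authenticator passes a PasswordAuthenticator check.
    return isinstance(authenticator, PasswordAuthenticator)
```

A check of this shape still assumes one authenticator per node, which is the part that would need rethinking once different client sessions can use different authenticators.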
Thanks for putting the CEP together and working on the implementation!
Doug
On Jul 3, 2025, at 2:38 PM, Joel Shepherd <sheph...@amazon.com> wrote:
Thanks Andy - My hope/expectation is this would significantly
reduce the amount of friction involved when either implementing
or migrating to a new authenticator. I suspect it will benefit
more complex environments too, when a single authenticator isn't
ideal for all clients.
At this point, the stickiest bits I've run into have involved
logic that switches on "the" authenticator class, because of the
hardcoded dependency on a specific authn implementation, and
working out how to sanely extend the logic once it's possible for
a single node to be using different authenticators for different
client sessions. Solvable but will take some refactoring and
might also generate debate about what the right behaviors are in
that scenario. An example is the RoleManager which currently
creates additional role attributes if the Password Authenticator
is in use ... which, now that you mention it, I wonder if that's
already broken for the
MutualTlsWithPasswordFallbackAuthenticator. Hmm.
Anyway, thanks for the feedback -- Joel.
On 7/3/2025 7:52 AM, Tolbert, Andy wrote:
Hi Joel,
+1 (nb), I think this is a really good idea and well fleshed out
CEP!
The capability to allow the server to support multiple
authenticators would be very useful. CEP-34 added a
'MutualTlsWithPasswordFallbackAuthenticator' for simultaneously
supporting both mTLS and Password authentication, primarily as a
means to introduce mTLS auth without breaking password auth and
also a possible gradual migration to mTLS, but this only works
for combining these two particular authenticators, and also
creates another authenticator to support.
The migration to any new auth strategy would likely involve the
need to simultaneously support an existing and new auth
provider. I think the approach you describe is well described
and should meet this need assuming users use a driver that
supports it.
In terms of the protocol, utilizing the capabilities of the
existing OPTIONS, STARTUP and SUPPORTED messages to communicate
what authenticators are supported/should be used is pretty
clever as it shouldn't require a protocol version uprev, and
hopefully wouldn't be too complicated for a driver to implement.
Thanks,
Andy
On Mon, Jun 30, 2025 at 11:44 AM Joel Shepherd
<sheph...@amazon.com> wrote:
Erm ... and here's the CEP:
https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-50%3A+Authentication+Negotiation
(Thanks for the heads up, Abe ...)
-- Joel.
On 6/30/2025 9:37 AM, Joel Shepherd wrote:
> Hello - We would like to propose CEP-50: Authentication Negotiation
> for adoption by the community: <link> .
>
> This CEP proposes minor changes to the initial handshake protocol
> (OPTIONS, SUPPORTED and STARTUP messages) to enable a client to inform
> the node of the authenticators supported by the client, and changes in
> the node's authentication-related areas to enable it to pick its
> preferred authenticator for each client connection. The CEP
> explains why this approach is proposed, instead of implementing a
> "negotiating authenticator".
>
> Authentication negotiation will make it easier and safer for
> administrators to migrate clusters to stronger authentication
> mechanisms (including switching on authentication for a cluster that
> has been using "allow-all" authentication) without downtime, and to
> support environments where different clients prefer different
> authentication mechanisms (e.g., username and password for ad-hoc
> cqlsh access, mutual TLS for programmatic access, etc.), without
> having to pick a single "lowest common denominator" authenticator for
> all. Additionally, the proposed changes are intended to be backwards
> compatible for both clients and nodes.
>
> Thanks in advance for your time and feedback. Please keep the
> discussion on this mailing list thread.
>
> Thanks! -- Joel.