Hi,

First, thanks everyone for considering this. I did not expect such a big 
discussion form this, for me it was not such a big thing and a think 
CASSANDRA-19534 is a very good improvement.

If I have to recompile I would also update the code so I'm not sure I see much 
benefit with compile time compatibility. What made me react and raise the issue 
was the complete surprise that this would fail when I upgraded my test cluster 
to 4.1.6. My expectation was that a change like this would have been discussed 
or at least mentioned on the ML or Slack but I can't remember seeing anything. 
A note in the NEWS-file would also have made aware, I wouldn't have been super 
happy but I would have know what to expect and what I had to do.

ecaudit is one thing we do that use the QueryHandler interface but for our 
internal we also use have a few implementations for query tracing/logging and 
prioritize requests. I would say that the QueryHandler interface together with 
the custom payload feature in the native protocol is a powerful combination and 
I would not be surprised if this is used more then you might expect.

-----Original Message-----
From: J. D. Jordan 
<jeremiah.jor...@gmail.com<mailto:%22j.%20d.%20jordan%22%20%3cjeremiah.jor...@gmail.com%3e>>
Reply-To: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
To: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
Subject: Re: [VOTE] Release Apache Cassandra 4.1.6
Date: Thu, 01 Aug 2024 08:26:42 -0500

@Tommy do you think?  You brought the issue up, I am assuming because you found 
the issue while trying to test ecaudit against the proposed release and it 
broke the integration?
As an active consumer of the interface what are your thoughts?

On Aug 1, 2024, at 8:17 AM, Alex Petrov <al...@coffeenco.de> wrote:


> If we have a path that resolves the issue and also maintains full 
> compatibility for this (semi- / reluctantly-accessible) interface, that would 
> seem ideal. Interested to learn more about the drawbacks to that approach.

My thinking here was that people who might have a binary dependency on this 
interface have to recompile their code, they may as well change 2 lines by 
adding a call to from the new method with `requestTime.startedAtNanos()`. I am 
not in a strong opposition to merging it though. If there is general agreement 
that this is the best way, let's do this: I do not see any drawbacks in terms 
of performance or otherwise.

If we decide to move forward with, it, the patch is up [1].

[1] https://issues.apache.org/jira/browse/CASSANDRA-19811

On Wed, Jul 31, 2024, at 11:24 PM, C. Scott Andreas wrote:
Sorry to veer off from a vote in a vote thread.

@Alex, can you say more about this statement:

> "I think I would prefer to not introduce the change I have proposed (the one 
> that would bring back non-binary compatibility)."

If we have a path that resolves the issue and also maintains full compatibility 
for this (semi- / reluctantly-accessible) interface, that would seem ideal. 
Interested to learn more about the drawbacks to that approach.

Regarding the value of C-19534 I'm happy to attest to the fact that it 
addresses severe metastable failure modes in clusters under heavy traffic on 
the verge of tipping. Jon Haddad's independent testing validated this as 
discussed on the ticket as well: 
https://issues.apache.org/jira/browse/CASSANDRA-19534

Last, @Tommy this is a great catch and I'm glad you raised it. Thanks for 
watching so closely and appreciate you bringing it to everyone's attention.

– Scott

On Jul 31, 2024, at 1:05 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:


+1 to proceeding with a simple upgrade note in NEWS

On Wed, Jul 31, 2024 at 12:50 PM Josh McKenzie 
<jmcken...@apache.org<mailto:jmcken...@apache.org>> wrote:

Unfortunately, I can not immediately see a good way to provide the critical 
bugfix of 
CASSANDRA-19534<https://issues.apache.org/jira/browse/CASSANDRA-19534>, 
affecting all Cassandra users, without making at least some change in this API.

I personally think that this method is very tightly coupled to the 
implementation to expose it via -D. If anyone using it could provide some 
context about why it is an important part of API, it would be give some useful 
context.
Nobody stepping up to engage on the technical piece of this? Unless / until 
somebody does, Alex' argument holds the most weight as the expert with what's 
going on IMO.

The question we're facing is - when we find a defect that requires a change in 
a public facing API, which of the following 2 is more important:

  1.  Keeping the API stable
  2.  Having the defect resolved

Obviously this will be case-by-case. What CASSANDRA-19534 addresses:
When a node is under pressure, hundreds of thousands of requests can show up in 
the native transport queue, and it looks like it can take way longer to timeout 
than is configured.
...
After stopping the load test altogether, it took nearly a minute before the 
requests were no longer queued.
I believe our priority here should be having this defect resolved.

On Tue, Jul 30, 2024, at 1:43 PM, Jordan West wrote:
I would make the case that loss of availability / significant performance 
issue, regardless of the amount of time it has existed for, is worth fixing on 
the branches that are widely deployed by the community. Especially when weighed 
against a loosely defined public interface issue.

The queuing issue has been a persistent problem (like you said 10 years) and I 
regularly (approx once every 1-2 weeks) have to tell my customers “we either 
have to wait for Cassandra to clear the queues or do a rolling restart to fix 
it” both which come at a cost during an incident where a client overloaded the 
DB and the impact is severe or business impacting. Especially for customers 
doing LWTs or using non-standard RFs which are also more prevalent in my 
experience than an external implementation of QueryHandler.

While not data loss, I would argue this is a critical bug and if we did find a 
data loss issue dormant for 10 years (which has happened in the past) we would 
fix it as soon as it was found and a patch was made available on all actively 
maintained versions.

Jordan

On Tue, Jul 30, 2024 at 10:30 Jeff Jirsa 
<jji...@gmail.com<mailto:jji...@gmail.com>> wrote:

It’s a 10 year old flaw in an 18 month old branch. Why does it need to go into 
4.1, it’s not a regression and it clearly breaks compatibility?




On Jul 30, 2024, at 8:52 AM, Jon Haddad 
<j...@jonhaddad.com<mailto:j...@jonhaddad.com>> wrote:

This patch fixes a long standing issue that's the root cause of availability 
failures.  Even though folks can specify a custom query handler with the -D 
flag, the number of users impacted by this is going to be incredibly small.  On 
the other hand, the fix helps every single user of 4.1 that puts too much 
pressure on the cluster, which happens fairly regularly.

My POV is that it's a fairly weak argument that this is a public interface, but 
I don't consider it worth debating whether it is or not, because even if it is, 
this improves stability of the database for all users, so it's worth going in.  
Let's not be dogmatic about fixes that help 99% of users because an incredibly 
small number that actually implement a custom query handler will need to make a 
trivial update in order to use the latest 4.1.6 dependency.

Jon



On Tue, Jul 30, 2024 at 8:09 AM J. D. Jordan 
<jeremiah.jor...@gmail.com<mailto:jeremiah.jor...@gmail.com>> wrote:

Given we allow a pluggable query handler implementation to be specified for the 
server with a -D during startup. So I would consider the query handler one of 
our public interfaces.

On Jul 30, 2024, at 9:35 AM, Alex Petrov 
<al...@coffeenco.de<mailto:al...@coffeenco.de>> wrote:

Hi Tommy,

Thank you for spotting this and bringing this to community's attention.

I believe our primary interfaces are native and internode protocol, and CLI 
tools. Most interfaces are used to to abstract implementations internally. Few 
interfaces, such as DataType, Partitioner, and Triggers can be depended upon by 
external tools using Cassandra as a library. There is no official way to plug 
in a QueryHandler, so I did not consider it to be a part of our public API.

From [1]:

> These considerations are especially important for public APIs, including CQL, 
> virtual tables, JMX, yaml, system properties, etc. Any planned additions must 
> be carefully considered in the context of any existing APIs. Where possible 
> the approach of any existing API should be followed.

Maybe we should have an exhaustive list of public APIs, and explicitly mention 
that native and internode protocols are included, alongside with nodetool 
command API and output, but also which classes/interfaces specifically should 
be evolved with care.

Thank you,
--Alex

[1] https://cassandra.apache.org/_/development/index.html

On Tue, Jul 30, 2024, at 10:56 AM, Tommy Stendahl via dev wrote:
Hi,

There is a change in the QueryHandler interface introduced by 
https://issues.apache.org/jira/browse/CASSANDRA-19534

Do we allow changes such changes between 4.1.5 and 4.1.6?
CASSANDRA-19534 looks like a very good change so maybe there is an exception in 
this case?

/Tommy

-----Original Message-----
From: Brandon Williams 
<brandonwilli...@apache.org<mailto:brandon%20williams%20%3cbrandonwilli...@apache.org%3e>>
Reply-To: dev@cassandra.apache.org<mailto:dev@cassandra.apache.org>
To: dev <dev@cassandra.apache.org<mailto:dev%20%3c...@cassandra.apache.org%3e>>
Subject: [VOTE] Release Apache Cassandra 4.1.6
Date: Mon, 29 Jul 2024 09:36:04 -0500


Proposing the test build of Cassandra 4.1.6 for release.



sha1: b662744af59f3a3dfbfeb7314e29fecb93abfd80


Git:

<https://github.com/apache/cassandra/tree/4.1.6-tentative>

https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Ftree%2F4.1.6-tentative&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055937277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=BWaJmvRTXvrMh%2FFBRzt%2FOost%2Bn6xAkgePP2ObtmTnbY%3D&reserved=0



Maven Artifacts:

<https://repository.apache.org/content/repositories/orgapachecassandra-1339/org/apache/cassandra/cassandra-all/4.1.6/>

https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachecassandra-1339%2Forg%2Fapache%2Fcassandra%2Fcassandra-all%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055947610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=2baa1fUTwQqDpPtFAdv%2FFU6sqax3LSkKEm%2FUdbcHsbE%3D&reserved=0




The Source and Build Artifacts, and the Debian and RPM packages and


repositories, are available here:

<https://dist.apache.org/repos/dist/dev/cassandra/4.1.6/>

https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fcassandra%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055951106%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=9FUMT0F7c%2B0y7NbvgN9fQrSNgNO2YGfKMwk9ajy2MKA%3D&reserved=0




The vote will be open for 72 hours (longer if needed). Everyone who


has tested the build is invited to vote. Votes by PMC members are


considered binding. A vote passes if there are at least three binding


+1s and no -1's.



[1]: CHANGES.txt:

<https://github.com/apache/cassandra/blob/4.1.6-tentative/CHANGES.txt>

https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FCHANGES.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055954173%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3u1LazTB3GixsR7MEwxT%2ByqMrnwHjBL72r8Vy0C1HhE%3D&reserved=0



[2]: NEWS.txt:

<https://github.com/apache/cassandra/blob/4.1.6-tentative/NEWS.txt>

https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FNEWS.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055957376%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4TROx5HB5vJuLTYNoAqMx2A3%2FUUtZ3Edr6aa4JVvHEA%3D&reserved=0





Kind Regards,


Brandon





Reply via email to