So where does this leave us with this release?  Get Alex's patch
committed and reroll, reroll with a NEWS entry, or... ?

Kind Regards,
Brandon

On Thu, Aug 1, 2024 at 9:04 AM Tommy Stendahl via dev
<dev@cassandra.apache.org> wrote:
>
> Hi,
>
> First, thanks everyone for considering this. I did not expect such a big 
> discussion form this, for me it was not such a big thing and a think 
> CASSANDRA-19534 is a very good improvement.
>
> If I have to recompile I would also update the code so I'm not sure I see 
> much benefit with compile time compatibility. What made me react and raise 
> the issue was the complete surprise that this would fail when I upgraded my 
> test cluster to 4.1.6. My expectation was that a change like this would have 
> been discussed or at least mentioned on the ML or Slack but I can't remember 
> seeing anything. A note in the NEWS-file would also have made aware, I 
> wouldn't have been super happy but I would have know what to expect and what 
> I had to do.
>
> ecaudit is one thing we do that use the QueryHandler interface but for our 
> internal we also use have a few implementations for query tracing/logging and 
> prioritize requests. I would say that the QueryHandler interface together 
> with the custom payload feature in the native protocol is a powerful 
> combination and I would not be surprised if this is used more then you might 
> expect.
>
> -----Original Message-----
> From: J. D. Jordan <jeremiah.jor...@gmail.com>
> Reply-To: dev@cassandra.apache.org
> To: dev@cassandra.apache.org
> Subject: Re: [VOTE] Release Apache Cassandra 4.1.6
> Date: Thu, 01 Aug 2024 08:26:42 -0500
>
> @Tommy do you think?  You brought the issue up, I am assuming because you 
> found the issue while trying to test ecaudit against the proposed release and 
> it broke the integration?
> As an active consumer of the interface what are your thoughts?
>
> On Aug 1, 2024, at 8:17 AM, Alex Petrov <al...@coffeenco.de> wrote:
>
> 
> > If we have a path that resolves the issue and also maintains full 
> > compatibility for this (semi- / reluctantly-accessible) interface, that 
> > would seem ideal. Interested to learn more about the drawbacks to that 
> > approach.
>
> My thinking here was that people who might have a binary dependency on this 
> interface have to recompile their code, they may as well change 2 lines by 
> adding a call to from the new method with `requestTime.startedAtNanos()`. I 
> am not in a strong opposition to merging it though. If there is general 
> agreement that this is the best way, let's do this: I do not see any 
> drawbacks in terms of performance or otherwise.
>
> If we decide to move forward with, it, the patch is up [1].
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-19811
>
> On Wed, Jul 31, 2024, at 11:24 PM, C. Scott Andreas wrote:
>
> Sorry to veer off from a vote in a vote thread.
>
> @Alex, can you say more about this statement:
>
> > "I think I would prefer to not introduce the change I have proposed (the 
> > one that would bring back non-binary compatibility)."
>
> If we have a path that resolves the issue and also maintains full 
> compatibility for this (semi- / reluctantly-accessible) interface, that would 
> seem ideal. Interested to learn more about the drawbacks to that approach.
>
> Regarding the value of C-19534 I'm happy to attest to the fact that it 
> addresses severe metastable failure modes in clusters under heavy traffic on 
> the verge of tipping. Jon Haddad's independent testing validated this as 
> discussed on the ticket as well: 
> https://issues.apache.org/jira/browse/CASSANDRA-19534
>
> Last, @Tommy this is a great catch and I'm glad you raised it. Thanks for 
> watching so closely and appreciate you bringing it to everyone's attention.
>
> – Scott
>
> On Jul 31, 2024, at 1:05 PM, Caleb Rackliffe <calebrackli...@gmail.com> wrote:
>
>
> +1 to proceeding with a simple upgrade note in NEWS
>
> On Wed, Jul 31, 2024 at 12:50 PM Josh McKenzie <jmcken...@apache.org> wrote:
>
>
> Unfortunately, I can not immediately see a good way to provide the critical 
> bugfix of CASSANDRA-19534, affecting all Cassandra users, without making at 
> least some change in this API.
>
> I personally think that this method is very tightly coupled to the 
> implementation to expose it via -D. If anyone using it could provide some 
> context about why it is an important part of API, it would be give some 
> useful context.
>
> Nobody stepping up to engage on the technical piece of this? Unless / until 
> somebody does, Alex' argument holds the most weight as the expert with what's 
> going on IMO.
>
> The question we're facing is - when we find a defect that requires a change 
> in a public facing API, which of the following 2 is more important:
>
> Keeping the API stable
> Having the defect resolved
>
> Obviously this will be case-by-case. What CASSANDRA-19534 addresses:
>
> When a node is under pressure, hundreds of thousands of requests can show up 
> in the native transport queue, and it looks like it can take way longer to 
> timeout than is configured.
> ...
> After stopping the load test altogether, it took nearly a minute before the 
> requests were no longer queued.
>
> I believe our priority here should be having this defect resolved.
>
> On Tue, Jul 30, 2024, at 1:43 PM, Jordan West wrote:
>
> I would make the case that loss of availability / significant performance 
> issue, regardless of the amount of time it has existed for, is worth fixing 
> on the branches that are widely deployed by the community. Especially when 
> weighed against a loosely defined public interface issue.
>
> The queuing issue has been a persistent problem (like you said 10 years) and 
> I regularly (approx once every 1-2 weeks) have to tell my customers “we 
> either have to wait for Cassandra to clear the queues or do a rolling restart 
> to fix it” both which come at a cost during an incident where a client 
> overloaded the DB and the impact is severe or business impacting. Especially 
> for customers doing LWTs or using non-standard RFs which are also more 
> prevalent in my experience than an external implementation of QueryHandler.
>
> While not data loss, I would argue this is a critical bug and if we did find 
> a data loss issue dormant for 10 years (which has happened in the past) we 
> would fix it as soon as it was found and a patch was made available on all 
> actively maintained versions.
>
> Jordan
>
> On Tue, Jul 30, 2024 at 10:30 Jeff Jirsa <jji...@gmail.com> wrote:
>
>
> It’s a 10 year old flaw in an 18 month old branch. Why does it need to go 
> into 4.1, it’s not a regression and it clearly breaks compatibility?
>
>
>
>
> On Jul 30, 2024, at 8:52 AM, Jon Haddad <j...@jonhaddad.com> wrote:
>
> This patch fixes a long standing issue that's the root cause of availability 
> failures.  Even though folks can specify a custom query handler with the -D 
> flag, the number of users impacted by this is going to be incredibly small.  
> On the other hand, the fix helps every single user of 4.1 that puts too much 
> pressure on the cluster, which happens fairly regularly.
>
> My POV is that it's a fairly weak argument that this is a public interface, 
> but I don't consider it worth debating whether it is or not, because even if 
> it is, this improves stability of the database for all users, so it's worth 
> going in.  Let's not be dogmatic about fixes that help 99% of users because 
> an incredibly small number that actually implement a custom query handler 
> will need to make a trivial update in order to use the latest 4.1.6 
> dependency.
>
> Jon
>
>
>
> On Tue, Jul 30, 2024 at 8:09 AM J. D. Jordan <jeremiah.jor...@gmail.com> 
> wrote:
>
>
> Given we allow a pluggable query handler implementation to be specified for 
> the server with a -D during startup. So I would consider the query handler 
> one of our public interfaces.
>
> On Jul 30, 2024, at 9:35 AM, Alex Petrov <al...@coffeenco.de> wrote:
>
> 
> Hi Tommy,
>
> Thank you for spotting this and bringing this to community's attention.
>
> I believe our primary interfaces are native and internode protocol, and CLI 
> tools. Most interfaces are used to to abstract implementations internally. 
> Few interfaces, such as DataType, Partitioner, and Triggers can be depended 
> upon by external tools using Cassandra as a library. There is no official way 
> to plug in a QueryHandler, so I did not consider it to be a part of our 
> public API.
>
> From [1]:
>
> > These considerations are especially important for public APIs, including 
> > CQL, virtual tables, JMX, yaml, system properties, etc. Any planned 
> > additions must be carefully considered in the context of any existing APIs. 
> > Where possible the approach of any existing API should be followed.
>
> Maybe we should have an exhaustive list of public APIs, and explicitly 
> mention that native and internode protocols are included, alongside with 
> nodetool command API and output, but also which classes/interfaces 
> specifically should be evolved with care.
>
> Thank you,
> --Alex
>
> [1] https://cassandra.apache.org/_/development/index.html
>
> On Tue, Jul 30, 2024, at 10:56 AM, Tommy Stendahl via dev wrote:
>
> Hi,
>
> There is a change in the QueryHandler interface introduced by 
> https://issues.apache.org/jira/browse/CASSANDRA-19534
>
> Do we allow changes such changes between 4.1.5 and 4.1.6?
> CASSANDRA-19534 looks like a very good change so maybe there is an exception 
> in this case?
>
> /Tommy
>
> -----Original Message-----
> From: Brandon Williams <brandonwilli...@apache.org>
> Reply-To: dev@cassandra.apache.org
> To: dev <dev@cassandra.apache.org>
> Subject: [VOTE] Release Apache Cassandra 4.1.6
> Date: Mon, 29 Jul 2024 09:36:04 -0500
>
> Proposing the test build of Cassandra 4.1.6 for release.
>
>
>
> sha1: b662744af59f3a3dfbfeb7314e29fecb93abfd80
>
>
> Git:
>
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Ftree%2F4.1.6-tentative&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055937277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=BWaJmvRTXvrMh%2FFBRzt%2FOost%2Bn6xAkgePP2ObtmTnbY%3D&reserved=0
>
>
>
> Maven Artifacts:
>
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachecassandra-1339%2Forg%2Fapache%2Fcassandra%2Fcassandra-all%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055947610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=2baa1fUTwQqDpPtFAdv%2FFU6sqax3LSkKEm%2FUdbcHsbE%3D&reserved=0
>
>
>
>
> The Source and Build Artifacts, and the Debian and RPM packages and
>
>
> repositories, are available here:
>
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fcassandra%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055951106%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=9FUMT0F7c%2B0y7NbvgN9fQrSNgNO2YGfKMwk9ajy2MKA%3D&reserved=0
>
>
>
>
> The vote will be open for 72 hours (longer if needed). Everyone who
>
>
> has tested the build is invited to vote. Votes by PMC members are
>
>
> considered binding. A vote passes if there are at least three binding
>
>
> +1s and no -1's.
>
>
>
> [1]: CHANGES.txt:
>
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FCHANGES.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055954173%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3u1LazTB3GixsR7MEwxT%2ByqMrnwHjBL72r8Vy0C1HhE%3D&reserved=0
>
>
>
> [2]: NEWS.txt:
>
>
> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FNEWS.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055957376%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4TROx5HB5vJuLTYNoAqMx2A3%2FUUtZ3Edr6aa4JVvHEA%3D&reserved=0
>
>
>
>
>
> Kind Regards,
>
>
> Brandon
>
>
>
>
>
>

Reply via email to