I have added this note and will reroll: https://github.com/apache/cassandra/commit/790de1079811278a2b431c2ced7c7ea02d290a25
Kind Regards, Brandon On Mon, Aug 12, 2024 at 2:48 PM Jon Haddad <j...@jonhaddad.com> wrote: > > My preference is a NEWS.txt update and we reroll. > > On Fri, Aug 2, 2024 at 3:11 PM Brandon Williams <dri...@gmail.com> wrote: >> >> So where does this leave us with this release? Get Alex's patch >> committed and reroll, reroll with a NEWS entry, or... ? >> >> Kind Regards, >> Brandon >> >> On Thu, Aug 1, 2024 at 9:04 AM Tommy Stendahl via dev >> <dev@cassandra.apache.org> wrote: >> > >> > Hi, >> > >> > First, thanks everyone for considering this. I did not expect such a big >> > discussion form this, for me it was not such a big thing and a think >> > CASSANDRA-19534 is a very good improvement. >> > >> > If I have to recompile I would also update the code so I'm not sure I see >> > much benefit with compile time compatibility. What made me react and raise >> > the issue was the complete surprise that this would fail when I upgraded >> > my test cluster to 4.1.6. My expectation was that a change like this would >> > have been discussed or at least mentioned on the ML or Slack but I can't >> > remember seeing anything. A note in the NEWS-file would also have made >> > aware, I wouldn't have been super happy but I would have know what to >> > expect and what I had to do. >> > >> > ecaudit is one thing we do that use the QueryHandler interface but for our >> > internal we also use have a few implementations for query tracing/logging >> > and prioritize requests. I would say that the QueryHandler interface >> > together with the custom payload feature in the native protocol is a >> > powerful combination and I would not be surprised if this is used more >> > then you might expect. >> > >> > -----Original Message----- >> > From: J. D. Jordan <jeremiah.jor...@gmail.com> >> > Reply-To: dev@cassandra.apache.org >> > To: dev@cassandra.apache.org >> > Subject: Re: [VOTE] Release Apache Cassandra 4.1.6 >> > Date: Thu, 01 Aug 2024 08:26:42 -0500 >> > >> > @Tommy do you think? You brought the issue up, I am assuming because you >> > found the issue while trying to test ecaudit against the proposed release >> > and it broke the integration? >> > As an active consumer of the interface what are your thoughts? >> > >> > On Aug 1, 2024, at 8:17 AM, Alex Petrov <al...@coffeenco.de> wrote: >> > >> > >> > > If we have a path that resolves the issue and also maintains full >> > > compatibility for this (semi- / reluctantly-accessible) interface, that >> > > would seem ideal. Interested to learn more about the drawbacks to that >> > > approach. >> > >> > My thinking here was that people who might have a binary dependency on >> > this interface have to recompile their code, they may as well change 2 >> > lines by adding a call to from the new method with >> > `requestTime.startedAtNanos()`. I am not in a strong opposition to merging >> > it though. If there is general agreement that this is the best way, let's >> > do this: I do not see any drawbacks in terms of performance or otherwise. >> > >> > If we decide to move forward with, it, the patch is up [1]. >> > >> > [1] https://issues.apache.org/jira/browse/CASSANDRA-19811 >> > >> > On Wed, Jul 31, 2024, at 11:24 PM, C. Scott Andreas wrote: >> > >> > Sorry to veer off from a vote in a vote thread. >> > >> > @Alex, can you say more about this statement: >> > >> > > "I think I would prefer to not introduce the change I have proposed (the >> > > one that would bring back non-binary compatibility)." >> > >> > If we have a path that resolves the issue and also maintains full >> > compatibility for this (semi- / reluctantly-accessible) interface, that >> > would seem ideal. Interested to learn more about the drawbacks to that >> > approach. >> > >> > Regarding the value of C-19534 I'm happy to attest to the fact that it >> > addresses severe metastable failure modes in clusters under heavy traffic >> > on the verge of tipping. Jon Haddad's independent testing validated this >> > as discussed on the ticket as well: >> > https://issues.apache.org/jira/browse/CASSANDRA-19534 >> > >> > Last, @Tommy this is a great catch and I'm glad you raised it. Thanks for >> > watching so closely and appreciate you bringing it to everyone's attention. >> > >> > – Scott >> > >> > On Jul 31, 2024, at 1:05 PM, Caleb Rackliffe <calebrackli...@gmail.com> >> > wrote: >> > >> > >> > +1 to proceeding with a simple upgrade note in NEWS >> > >> > On Wed, Jul 31, 2024 at 12:50 PM Josh McKenzie <jmcken...@apache.org> >> > wrote: >> > >> > >> > Unfortunately, I can not immediately see a good way to provide the >> > critical bugfix of CASSANDRA-19534, affecting all Cassandra users, without >> > making at least some change in this API. >> > >> > I personally think that this method is very tightly coupled to the >> > implementation to expose it via -D. If anyone using it could provide some >> > context about why it is an important part of API, it would be give some >> > useful context. >> > >> > Nobody stepping up to engage on the technical piece of this? Unless / >> > until somebody does, Alex' argument holds the most weight as the expert >> > with what's going on IMO. >> > >> > The question we're facing is - when we find a defect that requires a >> > change in a public facing API, which of the following 2 is more important: >> > >> > Keeping the API stable >> > Having the defect resolved >> > >> > Obviously this will be case-by-case. What CASSANDRA-19534 addresses: >> > >> > When a node is under pressure, hundreds of thousands of requests can show >> > up in the native transport queue, and it looks like it can take way longer >> > to timeout than is configured. >> > ... >> > After stopping the load test altogether, it took nearly a minute before >> > the requests were no longer queued. >> > >> > I believe our priority here should be having this defect resolved. >> > >> > On Tue, Jul 30, 2024, at 1:43 PM, Jordan West wrote: >> > >> > I would make the case that loss of availability / significant performance >> > issue, regardless of the amount of time it has existed for, is worth >> > fixing on the branches that are widely deployed by the community. >> > Especially when weighed against a loosely defined public interface issue. >> > >> > The queuing issue has been a persistent problem (like you said 10 years) >> > and I regularly (approx once every 1-2 weeks) have to tell my customers >> > “we either have to wait for Cassandra to clear the queues or do a rolling >> > restart to fix it” both which come at a cost during an incident where a >> > client overloaded the DB and the impact is severe or business impacting. >> > Especially for customers doing LWTs or using non-standard RFs which are >> > also more prevalent in my experience than an external implementation of >> > QueryHandler. >> > >> > While not data loss, I would argue this is a critical bug and if we did >> > find a data loss issue dormant for 10 years (which has happened in the >> > past) we would fix it as soon as it was found and a patch was made >> > available on all actively maintained versions. >> > >> > Jordan >> > >> > On Tue, Jul 30, 2024 at 10:30 Jeff Jirsa <jji...@gmail.com> wrote: >> > >> > >> > It’s a 10 year old flaw in an 18 month old branch. Why does it need to go >> > into 4.1, it’s not a regression and it clearly breaks compatibility? >> > >> > >> > >> > >> > On Jul 30, 2024, at 8:52 AM, Jon Haddad <j...@jonhaddad.com> wrote: >> > >> > This patch fixes a long standing issue that's the root cause of >> > availability failures. Even though folks can specify a custom query >> > handler with the -D flag, the number of users impacted by this is going to >> > be incredibly small. On the other hand, the fix helps every single user >> > of 4.1 that puts too much pressure on the cluster, which happens fairly >> > regularly. >> > >> > My POV is that it's a fairly weak argument that this is a public >> > interface, but I don't consider it worth debating whether it is or not, >> > because even if it is, this improves stability of the database for all >> > users, so it's worth going in. Let's not be dogmatic about fixes that >> > help 99% of users because an incredibly small number that actually >> > implement a custom query handler will need to make a trivial update in >> > order to use the latest 4.1.6 dependency. >> > >> > Jon >> > >> > >> > >> > On Tue, Jul 30, 2024 at 8:09 AM J. D. Jordan <jeremiah.jor...@gmail.com> >> > wrote: >> > >> > >> > Given we allow a pluggable query handler implementation to be specified >> > for the server with a -D during startup. So I would consider the query >> > handler one of our public interfaces. >> > >> > On Jul 30, 2024, at 9:35 AM, Alex Petrov <al...@coffeenco.de> wrote: >> > >> > >> > Hi Tommy, >> > >> > Thank you for spotting this and bringing this to community's attention. >> > >> > I believe our primary interfaces are native and internode protocol, and >> > CLI tools. Most interfaces are used to to abstract implementations >> > internally. Few interfaces, such as DataType, Partitioner, and Triggers >> > can be depended upon by external tools using Cassandra as a library. There >> > is no official way to plug in a QueryHandler, so I did not consider it to >> > be a part of our public API. >> > >> > From [1]: >> > >> > > These considerations are especially important for public APIs, including >> > > CQL, virtual tables, JMX, yaml, system properties, etc. Any planned >> > > additions must be carefully considered in the context of any existing >> > > APIs. Where possible the approach of any existing API should be followed. >> > >> > Maybe we should have an exhaustive list of public APIs, and explicitly >> > mention that native and internode protocols are included, alongside with >> > nodetool command API and output, but also which classes/interfaces >> > specifically should be evolved with care. >> > >> > Thank you, >> > --Alex >> > >> > [1] https://cassandra.apache.org/_/development/index.html >> > >> > On Tue, Jul 30, 2024, at 10:56 AM, Tommy Stendahl via dev wrote: >> > >> > Hi, >> > >> > There is a change in the QueryHandler interface introduced by >> > https://issues.apache.org/jira/browse/CASSANDRA-19534 >> > >> > Do we allow changes such changes between 4.1.5 and 4.1.6? >> > CASSANDRA-19534 looks like a very good change so maybe there is an >> > exception in this case? >> > >> > /Tommy >> > >> > -----Original Message----- >> > From: Brandon Williams <brandonwilli...@apache.org> >> > Reply-To: dev@cassandra.apache.org >> > To: dev <dev@cassandra.apache.org> >> > Subject: [VOTE] Release Apache Cassandra 4.1.6 >> > Date: Mon, 29 Jul 2024 09:36:04 -0500 >> > >> > Proposing the test build of Cassandra 4.1.6 for release. >> > >> > >> > >> > sha1: b662744af59f3a3dfbfeb7314e29fecb93abfd80 >> > >> > >> > Git: >> > >> > >> > https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Ftree%2F4.1.6-tentative&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055937277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=BWaJmvRTXvrMh%2FFBRzt%2FOost%2Bn6xAkgePP2ObtmTnbY%3D&reserved=0 >> > >> > >> > >> > Maven Artifacts: >> > >> > >> > https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachecassandra-1339%2Forg%2Fapache%2Fcassandra%2Fcassandra-all%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055947610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=2baa1fUTwQqDpPtFAdv%2FFU6sqax3LSkKEm%2FUdbcHsbE%3D&reserved=0 >> > >> > >> > >> > >> > The Source and Build Artifacts, and the Debian and RPM packages and >> > >> > >> > repositories, are available here: >> > >> > >> > https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fcassandra%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055951106%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=9FUMT0F7c%2B0y7NbvgN9fQrSNgNO2YGfKMwk9ajy2MKA%3D&reserved=0 >> > >> > >> > >> > >> > The vote will be open for 72 hours (longer if needed). Everyone who >> > >> > >> > has tested the build is invited to vote. Votes by PMC members are >> > >> > >> > considered binding. A vote passes if there are at least three binding >> > >> > >> > +1s and no -1's. >> > >> > >> > >> > [1]: CHANGES.txt: >> > >> > >> > https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FCHANGES.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055954173%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3u1LazTB3GixsR7MEwxT%2ByqMrnwHjBL72r8Vy0C1HhE%3D&reserved=0 >> > >> > >> > >> > [2]: NEWS.txt: >> > >> > >> > https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FNEWS.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055957376%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4TROx5HB5vJuLTYNoAqMx2A3%2FUUtZ3Edr6aa4JVvHEA%3D&reserved=0 >> > >> > >> > >> > >> > >> > Kind Regards, >> > >> > >> > Brandon >> > >> > >> > >> > >> > >> >