> Unfortunately, I can not immediately see a good way to provide the critical > bugfix of CASSANDRA-19534 > <https://issues.apache.org/jira/browse/CASSANDRA-19534>, affecting all > Cassandra users, without making at least some change in this API. > > I personally think that this method is very tightly coupled to the > implementation to expose it via -D. If anyone using it could provide some > context about why it is an important part of API, it would be give some > useful context. Nobody stepping up to engage on the technical piece of this? Unless / until somebody does, Alex' argument holds the most weight as the expert with what's going on IMO.
The question we're facing is - when we find a defect that requires a change in a public facing API, which of the following 2 is more important: 1. Keeping the API stable 2. Having the defect resolved Obviously this will be case-by-case. What CASSANDRA-19534 addresses: > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. > ... > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. I believe our priority here should be having this defect resolved. On Tue, Jul 30, 2024, at 1:43 PM, Jordan West wrote: > I would make the case that loss of availability / significant performance > issue, regardless of the amount of time it has existed for, is worth fixing > on the branches that are widely deployed by the community. Especially when > weighed against a loosely defined public interface issue. > > The queuing issue has been a persistent problem (like you said 10 years) and > I regularly (approx once every 1-2 weeks) have to tell my customers “we > either have to wait for Cassandra to clear the queues or do a rolling restart > to fix it” both which come at a cost during an incident where a client > overloaded the DB and the impact is severe or business impacting. Especially > for customers doing LWTs or using non-standard RFs which are also more > prevalent in my experience than an external implementation of QueryHandler. > > While not data loss, I would argue this is a critical bug and if we did find > a data loss issue dormant for 10 years (which has happened in the past) we > would fix it as soon as it was found and a patch was made available on all > actively maintained versions. > > Jordan > > On Tue, Jul 30, 2024 at 10:30 Jeff Jirsa <jji...@gmail.com> wrote: >> >> It’s a 10 year old flaw in an 18 month old branch. Why does it need to go >> into 4.1, it’s not a regression and it clearly breaks compatibility? >> >> >> >> >>> On Jul 30, 2024, at 8:52 AM, Jon Haddad <j...@jonhaddad.com> wrote: >>> >>> This patch fixes a long standing issue that's the root cause of >>> availability failures. Even though folks can specify a custom query >>> handler with the -D flag, the number of users impacted by this is going to >>> be incredibly small. On the other hand, the fix helps every single user of >>> 4.1 that puts too much pressure on the cluster, which happens fairly >>> regularly. >>> >>> My POV is that it's a fairly weak argument that this is a public interface, >>> but I don't consider it worth debating whether it is or not, because even >>> if it is, this improves stability of the database for all users, so it's >>> worth going in. Let's not be dogmatic about fixes that help 99% of users >>> because an incredibly small number that actually implement a custom query >>> handler will need to make a trivial update in order to use the latest 4.1.6 >>> dependency. >>> >>> Jon >>> >>> >>> >>> On Tue, Jul 30, 2024 at 8:09 AM J. D. Jordan <jeremiah.jor...@gmail.com> >>> wrote: >>>> >>>> Given we allow a pluggable query handler implementation to be specified >>>> for the server with a -D during startup. So I would consider the query >>>> handler one of our public interfaces. >>>> >>>>> On Jul 30, 2024, at 9:35 AM, Alex Petrov <al...@coffeenco.de> wrote: >>>>> >>>>> Hi Tommy, >>>>> >>>>> Thank you for spotting this and bringing this to community's attention. >>>>> >>>>> I believe our primary interfaces are native and internode protocol, and >>>>> CLI tools. Most interfaces are used to to abstract implementations >>>>> internally. Few interfaces, such as DataType, Partitioner, and Triggers >>>>> can be depended upon by external tools using Cassandra as a library. >>>>> There is no official way to plug in a QueryHandler, so I did not consider >>>>> it to be a part of our public API. >>>>> >>>>> From [1]: >>>>> >>>>> > These considerations are especially important for public APIs, >>>>> > including CQL, virtual tables, JMX, yaml, system properties, etc. Any >>>>> > planned additions must be carefully considered in the context of any >>>>> > existing APIs. Where possible the approach of any existing API should >>>>> > be followed. >>>>> >>>>> Maybe we should have an exhaustive list of public APIs, and explicitly >>>>> mention that native and internode protocols are included, alongside with >>>>> nodetool command API and output, but also which classes/interfaces >>>>> specifically should be evolved with care. >>>>> >>>>> Thank you, >>>>> --Alex >>>>> >>>>> [1] https://cassandra.apache.org/_/development/index.html >>>>> >>>>> On Tue, Jul 30, 2024, at 10:56 AM, Tommy Stendahl via dev wrote: >>>>>> Hi, >>>>>> >>>>>> There is a change in the QueryHandler interface introduced by >>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19534 >>>>>> >>>>>> Do we allow changes such changes between 4.1.5 and 4.1.6? >>>>>> CASSANDRA-19534 looks like a very good change so maybe there is an >>>>>> exception in this case? >>>>>> >>>>>> /Tommy >>>>>> >>>>>> -----Original Message----- >>>>>> *From*: Brandon Williams <brandonwilli...@apache.org >>>>>> <mailto:brandon%20williams%20%3cbrandonwilli...@apache.org%3e>> >>>>>> *Reply-To*: dev@cassandra.apache.org >>>>>> *To*: dev <dev@cassandra.apache.org >>>>>> <mailto:dev%20%3c...@cassandra.apache.org%3e>> >>>>>> *Subject*: [VOTE] Release Apache Cassandra 4.1.6 >>>>>> *Date*: Mon, 29 Jul 2024 09:36:04 -0500 >>>>>> >>>>>> Proposing the test build of Cassandra 4.1.6 for release. >>>>>> >>>>>> sha1: b662744af59f3a3dfbfeb7314e29fecb93abfd80 >>>>>> Git: >>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Ftree%2F4.1.6-tentative&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055937277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=BWaJmvRTXvrMh%2FFBRzt%2FOost%2Bn6xAkgePP2ObtmTnbY%3D&reserved=0 >>>>>> >>>>>> Maven Artifacts: >>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachecassandra-1339%2Forg%2Fapache%2Fcassandra%2Fcassandra-all%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055947610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=2baa1fUTwQqDpPtFAdv%2FFU6sqax3LSkKEm%2FUdbcHsbE%3D&reserved=0 >>>>>> >>>>>> >>>>>> The Source and Build Artifacts, and the Debian and RPM packages and >>>>>> repositories, are available here: >>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fcassandra%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055951106%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=9FUMT0F7c%2B0y7NbvgN9fQrSNgNO2YGfKMwk9ajy2MKA%3D&reserved=0 >>>>>> >>>>>> >>>>>> The vote will be open for 72 hours (longer if needed). Everyone who >>>>>> has tested the build is invited to vote. Votes by PMC members are >>>>>> considered binding. A vote passes if there are at least three binding >>>>>> +1s and no -1's. >>>>>> >>>>>> [1]: CHANGES.txt: >>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FCHANGES.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055954173%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3u1LazTB3GixsR7MEwxT%2ByqMrnwHjBL72r8Vy0C1HhE%3D&reserved=0 >>>>>> >>>>>> [2]: NEWS.txt: >>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FNEWS.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055957376%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4TROx5HB5vJuLTYNoAqMx2A3%2FUUtZ3Edr6aa4JVvHEA%3D&reserved=0 >>>>>> >>>>>> >>>>>> >>>>>> Kind Regards, >>>>>> Brandon >>>>>