> If we have a path that resolves the issue and also maintains full 
> compatibility for this (semi- / reluctantly-accessible) interface, that would 
> seem ideal. Interested to learn more about the drawbacks to that approach.

My thinking here was that people who might have a binary dependency on this 
interface have to recompile their code, they may as well change 2 lines by 
adding a call to from the new method with `requestTime.startedAtNanos()`. I am 
not in a strong opposition to merging it though. If there is general agreement 
that this is the best way, let's do this: I do not see any drawbacks in terms 
of performance or otherwise. 

If we decide to move forward with, it, the patch is up [1]. 

[1] https://issues.apache.org/jira/browse/CASSANDRA-19811

On Wed, Jul 31, 2024, at 11:24 PM, C. Scott Andreas wrote:
> Sorry to veer off from a vote in a vote thread.
> 
> @Alex, can you say more about this statement:
> 
> > "I think I would prefer to not introduce the change I have proposed (the 
> > one that would bring back non-binary compatibility)."
> 
> If we have a path that resolves the issue and also maintains full 
> compatibility for this (semi- / reluctantly-accessible) interface, that would 
> seem ideal. Interested to learn more about the drawbacks to that approach.
> 
> Regarding the value of C-19534 I'm happy to attest to the fact that it 
> addresses severe metastable failure modes in clusters under heavy traffic on 
> the verge of tipping. Jon Haddad's independent testing validated this as 
> discussed on the ticket as well: 
> https://issues.apache.org/jira/browse/CASSANDRA-19534
> 
> Last, @Tommy this is a great catch and I'm glad you raised it. Thanks for 
> watching so closely and appreciate you bringing it to everyone's attention.
> 
> – Scott
> 
>> On Jul 31, 2024, at 1:05 PM, Caleb Rackliffe <calebrackli...@gmail.com> 
>> wrote:
>> 
>> 
>> +1 to proceeding with a simple upgrade note in NEWS
>> 
>> On Wed, Jul 31, 2024 at 12:50 PM Josh McKenzie <jmcken...@apache.org> wrote:
>>> __
>>>> Unfortunately, I can not immediately see a good way to provide the 
>>>> critical bugfix of CASSANDRA-19534 
>>>> <https://issues.apache.org/jira/browse/CASSANDRA-19534>, affecting all 
>>>> Cassandra users, without making at least some change in this API. 
>>>> 
>>>> I personally think that this method is very tightly coupled to the 
>>>> implementation to expose it via -D. If anyone using it could provide some 
>>>> context about why it is an important part of API, it would be give some 
>>>> useful context.
>>> Nobody stepping up to engage on the technical piece of this? Unless / until 
>>> somebody does, Alex' argument holds the most weight as the expert with 
>>> what's going on IMO.
>>> 
>>> The question we're facing is - when we find a defect that requires a change 
>>> in a public facing API, which of the following 2 is more important:
>>>  1. Keeping the API stable
>>>  2. Having the defect resolved
>>> Obviously this will be case-by-case. What CASSANDRA-19534 addresses:
>>>> When a node is under pressure, hundreds of thousands of requests can show 
>>>> up in the native transport queue, and it looks like it can take way longer 
>>>> to timeout than is configured. 
>>>> ...
>>>> After stopping the load test altogether, it took nearly a minute before 
>>>> the requests were no longer queued.
>>> I believe our priority here should be having this defect resolved.
>>> 
>>> On Tue, Jul 30, 2024, at 1:43 PM, Jordan West wrote:
>>>> I would make the case that loss of availability / significant performance 
>>>> issue, regardless of the amount of time it has existed for, is worth 
>>>> fixing on the branches that are widely deployed by the community. 
>>>> Especially when weighed against a loosely defined public interface issue. 
>>>> 
>>>> The queuing issue has been a persistent problem (like you said 10 years) 
>>>> and I regularly (approx once every 1-2 weeks) have to tell my customers 
>>>> “we either have to wait for Cassandra to clear the queues or do a rolling 
>>>> restart to fix it” both which come at a cost during an incident where a 
>>>> client overloaded the DB and the impact is severe or business impacting. 
>>>> Especially for customers doing LWTs or using non-standard RFs which are 
>>>> also more prevalent in my experience than an external implementation of 
>>>> QueryHandler. 
>>>> 
>>>> While not data loss, I would argue this is a critical bug and if we did 
>>>> find a data loss issue dormant for 10 years (which has happened in the 
>>>> past) we would fix it as soon as it was found and a patch was made 
>>>> available on all actively maintained versions. 
>>>> 
>>>> Jordan 
>>>> 
>>>> On Tue, Jul 30, 2024 at 10:30 Jeff Jirsa <jji...@gmail.com> wrote:
>>>>> 
>>>>> It’s a 10 year old flaw in an 18 month old branch. Why does it need to go 
>>>>> into 4.1, it’s not a regression and it clearly breaks compatibility? 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 30, 2024, at 8:52 AM, Jon Haddad <j...@jonhaddad.com> wrote:
>>>>>> 
>>>>>> This patch fixes a long standing issue that's the root cause of 
>>>>>> availability failures.  Even though folks can specify a custom query 
>>>>>> handler with the -D flag, the number of users impacted by this is going 
>>>>>> to be incredibly small.  On the other hand, the fix helps every single 
>>>>>> user of 4.1 that puts too much pressure on the cluster, which happens 
>>>>>> fairly regularly.  
>>>>>> 
>>>>>> My POV is that it's a fairly weak argument that this is a public 
>>>>>> interface, but I don't consider it worth debating whether it is or not, 
>>>>>> because even if it is, this improves stability of the database for all 
>>>>>> users, so it's worth going in.  Let's not be dogmatic about fixes that 
>>>>>> help 99% of users because an incredibly small number that actually 
>>>>>> implement a custom query handler will need to make a trivial update in 
>>>>>> order to use the latest 4.1.6 dependency.
>>>>>> 
>>>>>> Jon
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 30, 2024 at 8:09 AM J. D. Jordan <jeremiah.jor...@gmail.com> 
>>>>>> wrote:
>>>>>>> 
>>>>>>> Given we allow a pluggable query handler implementation to be specified 
>>>>>>> for the server with a -D during startup. So I would consider the query 
>>>>>>> handler one of our public interfaces.
>>>>>>> 
>>>>>>>> On Jul 30, 2024, at 9:35 AM, Alex Petrov <al...@coffeenco.de> wrote:
>>>>>>>> 
>>>>>>>> Hi Tommy,
>>>>>>>> 
>>>>>>>> Thank you for spotting this and bringing this to community's attention.
>>>>>>>> 
>>>>>>>> I believe our primary interfaces are native and internode protocol, 
>>>>>>>> and CLI tools. Most interfaces are used to to abstract implementations 
>>>>>>>> internally. Few interfaces, such as DataType, Partitioner, and 
>>>>>>>> Triggers can be depended upon by external tools using Cassandra as a 
>>>>>>>> library. There is no official way to plug in a QueryHandler, so I did 
>>>>>>>> not consider it to be a part of our public API. 
>>>>>>>> 
>>>>>>>> From [1]: 
>>>>>>>> 
>>>>>>>> > These considerations are especially important for public APIs, 
>>>>>>>> > including CQL, virtual tables, JMX, yaml, system properties, etc. 
>>>>>>>> > Any planned additions must be carefully considered in the context of 
>>>>>>>> > any existing APIs. Where possible the approach of any existing API 
>>>>>>>> > should be followed. 
>>>>>>>> 
>>>>>>>> Maybe we should have an exhaustive list of public APIs, and explicitly 
>>>>>>>> mention that native and internode protocols are included, alongside 
>>>>>>>> with nodetool command API and output, but also which 
>>>>>>>> classes/interfaces specifically should be evolved with care.
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> --Alex
>>>>>>>> 
>>>>>>>> [1] https://cassandra.apache.org/_/development/index.html
>>>>>>>> 
>>>>>>>> On Tue, Jul 30, 2024, at 10:56 AM, Tommy Stendahl via dev wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> There is a change in the QueryHandler interface introduced by 
>>>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-19534
>>>>>>>>> 
>>>>>>>>> Do we allow changes such changes between 4.1.5 and 4.1.6?
>>>>>>>>> CASSANDRA-19534 looks like a very good change so maybe there is an 
>>>>>>>>> exception in this case?
>>>>>>>>> 
>>>>>>>>> /Tommy
>>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> *From*: Brandon Williams <brandonwilli...@apache.org 
>>>>>>>>> <mailto:brandon%20williams%20%3cbrandonwilli...@apache.org%3e>>
>>>>>>>>> *Reply-To*: dev@cassandra.apache.org
>>>>>>>>> *To*: dev <dev@cassandra.apache.org 
>>>>>>>>> <mailto:dev%20%3c...@cassandra.apache.org%3e>>
>>>>>>>>> *Subject*: [VOTE] Release Apache Cassandra 4.1.6
>>>>>>>>> *Date*: Mon, 29 Jul 2024 09:36:04 -0500
>>>>>>>>> 
>>>>>>>>> Proposing the test build of Cassandra 4.1.6 for release.
>>>>>>>>> 
>>>>>>>>> sha1: b662744af59f3a3dfbfeb7314e29fecb93abfd80
>>>>>>>>> Git: 
>>>>>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Ftree%2F4.1.6-tentative&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055937277%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=BWaJmvRTXvrMh%2FFBRzt%2FOost%2Bn6xAkgePP2ObtmTnbY%3D&reserved=0
>>>>>>>>> 
>>>>>>>>> Maven Artifacts:
>>>>>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Frepository.apache.org%2Fcontent%2Frepositories%2Forgapachecassandra-1339%2Forg%2Fapache%2Fcassandra%2Fcassandra-all%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055947610%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=2baa1fUTwQqDpPtFAdv%2FFU6sqax3LSkKEm%2FUdbcHsbE%3D&reserved=0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The Source and Build Artifacts, and the Debian and RPM packages and
>>>>>>>>> repositories, are available here:
>>>>>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdist.apache.org%2Frepos%2Fdist%2Fdev%2Fcassandra%2F4.1.6%2F&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055951106%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=9FUMT0F7c%2B0y7NbvgN9fQrSNgNO2YGfKMwk9ajy2MKA%3D&reserved=0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> The vote will be open for 72 hours (longer if needed). Everyone who
>>>>>>>>> has tested the build is invited to vote. Votes by PMC members are
>>>>>>>>> considered binding. A vote passes if there are at least three binding
>>>>>>>>> +1s and no -1's.
>>>>>>>>> 
>>>>>>>>> [1]: CHANGES.txt:
>>>>>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FCHANGES.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055954173%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3u1LazTB3GixsR7MEwxT%2ByqMrnwHjBL72r8Vy0C1HhE%3D&reserved=0
>>>>>>>>> 
>>>>>>>>> [2]: NEWS.txt: 
>>>>>>>>> https://eur02.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fapache%2Fcassandra%2Fblob%2F4.1.6-tentative%2FNEWS.txt&data=05%7C02%7Ctommy.stendahl%40ericsson.com%7C30a819344e48491e561908dcafdbddf4%7C92e84cebfbfd47abbe52080c6b87953f%7C0%7C0%7C638578606055957376%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=4TROx5HB5vJuLTYNoAqMx2A3%2FUUtZ3Edr6aa4JVvHEA%3D&reserved=0
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind Regards,
>>>>>>>>> Brandon
>>>>>>>> 
>>> 
> 

Reply via email to