Ah, didn't realize P4 is default; that makes sense.
So I should not even be trying to derive omens from that.
So I guess only the assignee would know whether the status is closer to
"I was going to work on that next week" or
"I totally forgot about that thing, and am about to forget about it again".
I'm quite sure he's on this list and will hopefully read the advocacy section 
of my email.

Um.  I feel awkward writing this paragraph because you know how OpenJDK works 
much better than I do, so it feels a bit silly to argue with you about it.  
But.  Um.
When you say "this is not the place to ask for fixes" ...
I was under the impression that "asking for fixes" actually does provide value, 
and not all of that value can be replaced by merely providing fixes.
In particular, asking for fixes gives maintainers a vague sense of how often 
people in the "real world" tend to run into an issue, which in turn informs how 
much "cost" is worth spending on addressing it.
(Where "cost" could mean things like "time" and also things like "this makes 
the code trickier and hence harder to maintain".)
In fact, I was under the impression that OpenJDK is slightly hostile to "big" 
fixes by "outsiders" because of the worry that there's now a big/complicated 
chunk of code that no one inside the project understands and yet the project is 
responsible for, and the original author might never be heard from again.

Anyway, thanks a bunch for responding; I was worried that no one would.

From: Philip Race <philip.r...@oracle.com>
Sent: Wednesday, May 22, 2024 11:54 AM
To: Yagnatinsky, Mark : IT (NYK) <mark.yagnatin...@barclays.com>; 
core-libs-dev@openjdk.org
Subject: Re: stack overflow in regex engine


P4 is the default JBS priority, so sometimes it just means no one figured out 
the true priority.
But in general P4 bugs could be open for years, or even never get fixed.
The priority is also partially an assessment of where it falls as a priority 
for the JDK developers.
A user of JDK may have an entirely different perspective.
And that's why there are vendors who provide support for JDK. They can also 
arrange the backports you need.
But that's not done here. Here is where you come to participate and contribute 
fixes, not ask for fixes.
So my suggestion is to raise it via your support channel to your particular 
vendor who provided your binary.

-phil
On 5/21/24 8:46 PM, mark.yagnatin...@barclays.com wrote:
(Sorry about my previous "do I need to subscribe?" email; in retrospect that 
was needless noise.)
The purpose of this email is twofold: first, to inquire about the status of a 
ticket filed a few years ago, and second, to point out some non-obvious reasons 
why it might be slightly more serious than it seems.
The ticket is this one:
https://bugs.openjdk.org/browse/JDK-8260866
(stack overflow in regex matching quantified alternation)
The priority is listed as P4, which I guess means something like "medium" (more 
important than P5, but less than P3).
It also has a specific person assigned, which seems vaguely encouraging, but no 
updates at all in the years since it's been created, which seems less 
encouraging.
It was seemingly never once discussed on this mailing list, not even when it 
was first filed.
As an outsider, I'm not quite sure how to interpret all these various omens and 
turn them into guesses about its eventual fate.
Will it remain unfixed for another decade or two?  Will it be fixed in a few 
months, but then never backported to old versions?  Something else?  No one 
knows?

That concludes the status inquiry.  Now on to the advocacy.  Some bugs are 
annoying, but once you hit them, you can work around them by changing your code 
so it does not trigger the bug.
Note the phrase "your code" above.  This is much more awkward to do if the bug 
is triggered by third-party code you got from Maven Central or something.
At that point your options are to either ask the third party library to work 
around it, or else fork the dependency (which is not well supported by 
mainstream build tools (or maybe I'm just using them wrong)).
In this case, regular expressions are so ubiquitous that the bug is quite 
plausibly more likely to be triggered by some third party dependency than by 
code you own.
That was the case for me today: after spending hours trying to track down a 
stack overflow error I found the offending regex in a third party library.
The good news is that for the kinds of inputs we need to handle, it is indeed 
easy to substitute a much simpler regex that would avoid the issue.
The bad news is that it's not my code, so I can't.  I could petition the 
maintainers of the library, but this is not great because:
First, maybe the version I'm on is no longer even supported, and newer 
versions are not compatible.
Second, it may take them a while to fix it.
Third, it is wasteful (and inelegant) to have workarounds slowly percolate 
throughout the Java ecosystem instead of fixing the problem at the root.

The other annoying thing here is that even when you have "enough" stack space 
to avoid crashing, using it may not be quite "free".
For instance, Project Loom's foundational premise seems to be that "most 
threads have oversized stacks; we can have more threads if we start off with 
small stacks and grow them only when needed".
This would be false when the thread in question uses a regex with quantified 
alternation.
(Since many Loom threads will be based on the same Runnable, it's a pretty safe 
bet that if one of them uses this feature, many will, so you can't assume it 
will "average out".)
There are other reasons besides Loom to be low on stack space; maybe you're 
using some crazy framework(s) that like(s) to have call stacks that are crazy 
deep.
Or maybe you're running with -Xss set pretty low.  Or you passed a small value 
for stack space to the Thread constructor.
Or maybe none of these things are true, but in most operating systems a thread 
stack costs "real" memory in proportion to its high-watermark, so even a SINGLE 
heavy regex in the lifetime of a thread is tantamount to a memory leak of 
hundreds of kilobytes.
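
For concreteness, here is a minimal sketch of the "small value for stack space 
passed to the Thread constructor" case. The names (SmallStackDemo, overflows, 
recurseForever) and the 256 KiB figure are made up for illustration. The 
four-argument Thread constructor is the real API, but its documentation says 
the stackSize argument is only a hint that some platforms may ignore, so 
treat the behaviour as platform-dependent.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class SmallStackDemo {
    // Hypothetical helper: runs the task on a thread created with the
    // requested stack size and reports whether the task ended with a
    // StackOverflowError. The stack size is a hint, not a guarantee.
    static boolean overflows(Runnable task, long stackBytes) {
        AtomicBoolean overflowed = new AtomicBoolean(false);
        Thread t = new Thread(null, () -> {
            try {
                task.run();
            } catch (StackOverflowError e) {
                overflowed.set(true);
            }
        }, "small-stack", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return overflowed.get();
    }

    // Deliberately unbounded recursion, used only to show the helper working.
    static void recurseForever() {
        recurseForever();
    }

    public static void main(String[] args) {
        long stackBytes = 256 * 1024; // illustrative value, not a recommendation
        System.out.println("trivial task overflowed: "
                + overflows(() -> { }, stackBytes));
        System.out.println("unbounded recursion overflowed: "
                + overflows(SmallStackDemo::recurseForever, stackBytes));
    }
}
```

Passing a Runnable that performs a regex match instead of the recursion gives 
a way to check whether a given pattern and input fit into a given stack budget.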

Practicalities aside, I don't like it when code consumes "surprising" types of 
resources, or surprising amounts of them.
For instance, you wouldn't expect a sorting function to spawn threads behind 
your back, unless it was called "parallel sort" or something like that.
You wouldn't expect it to allocate multi-gigabyte arrays, nor to perform I/O.
Similarly, most functions need only O(1) stack space, so this tends to be the 
default assumption unless the docs explicitly call out "this thing might throw 
stack overflows at you so make sure you have plenty of stack space".
Some need a bit more... for instance, I would not be surprised if a regex needed 
stack space in proportion to the depth of the parse tree of the regex.
But stack space in proportion to the length of the string being matched is the 
kind of thing that I'd hope gets called out in those @implNotes thingies, or 
better yet fixed.
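
One way to probe whether stack use grows with input length is sketched below. 
It is an assumption-laden illustration, not a statement about any particular 
JDK build: the pattern (?:a|b)+ is just a small stand-in for "quantified 
alternation" (the construct named in the ticket title), and the class and 
method names and the length cap are invented here. It needs Java 11 or later 
for String.repeat. The probe doubles the input length until a match throws 
StackOverflowError, and returns -1 if no overflow is seen up to the cap.

```java
import java.util.regex.Pattern;

class RegexStackProbe {
    // Stand-in pattern: a quantified alternation over two literals.
    static final Pattern ALT = Pattern.compile("(?:a|b)+");

    // True if matching an input of roughly the given length (alternating
    // 'a' and 'b') throws StackOverflowError on the current thread.
    static boolean matchOverflows(int length) {
        String input = "ab".repeat(length / 2);
        try {
            ALT.matcher(input).matches();
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    // Doubles the length from 16 up to the cap; returns the first length
    // that overflowed, or -1 if none did.
    static int firstOverflowingLength(int cap) {
        for (int n = 16; n <= cap; n *= 2) {
            if (matchOverflows(n)) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The result depends on the JDK version, -Xss and platform.
        System.out.println("first overflowing length: "
                + firstOverflowingLength(1 << 22));
    }
}
```

If the reported length roughly halves when -Xss is halved, that is consistent 
with stack use being linear in the input length.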

Even people who know that regex matching can sometimes take exponential time 
may naively assume that regex matching would not consume O(n) stack space, 
where n is the input length.
What's worse, not only does it indeed consume stack space linear in the length 
of the input, but the constant hidden by the O() notation is itself pretty 
scary.
For instance, consider the regex that caused my troubles today:
https://github.com/apache/camel/blob/main/core/camel-support/src/main/java/org/apache/camel/support/ObjectHelper.java#L63
After getting rid of extra escaping and also double-escaping caused by Java not 
having "raw" strings, we're left with this:
,(?!(?:[^(,]|[^)],[^)])+\))
(I find the above hard to read; the regex I would have replaced it with, if it 
had been "our" code, is simply a single comma.)
Anyway, I tried creating a Scanner with the delimiter above and looping through 
all the tokens in the string that originally caused the crash.
I thought that perhaps it would work, since I had a simple example that does 
everything in main, but it also crashed.
Then I decided to play an alternating game where I trimmed the string until it 
stopped crashing, then lowered Xss by 64k and repeated.
Eventually, I got it crashing with a call stack well over 500 calls deep on a 
string fewer than 128 characters long.
(The string was not hand-crafted; it was simply a prefix of the original string 
that caused the first crash I tracked down.)
The string in question had a mere five tokens, which is to say that it had just 
four commas.
It had no open or close parentheses, so the entire negative lookahead assertion 
served as a giant no-op, at least when it wasn't crashing.
(Technically, the stack usage is linear in the length of the input AFTER the 
first comma, but the first comma was pretty early.)
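
A rough sketch of the experiment described above (a reconstruction, not the 
original test program). The delimiter is the regex quoted earlier, re-escaped 
for a Java string literal; the class name, the method name and the sample 
input are placeholders, since the string that actually crashed is not 
reproduced in this email. To follow the same procedure, run it with a lowered 
-Xss and pass progressively shorter inputs as the first argument.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class CamelDelimiterRepro {
    // The delimiter regex from above: a comma that is not inside parentheses.
    static final Pattern DELIM =
            Pattern.compile(",(?!(?:[^(,]|[^)],[^)])+\\))");

    // Loops through all tokens, as in the experiment. Returns the token
    // count, or -1 if the regex engine ran out of stack along the way.
    static int countTokens(String input) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(DELIM);
            int count = 0;
            while (scanner.hasNext()) {
                scanner.next();
                count++;
            }
            return count;
        } catch (StackOverflowError e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Placeholder input: five tokens, four commas, no parentheses.
        String input = args.length > 0 ? args[0] : "a,b,c,d,e";
        System.out.println("tokens (or -1 on stack overflow): "
                + countTokens(input));
    }
}
```

With a default stack and a short input this should simply count the tokens; 
the interesting runs are the ones with a small stack and a longer tail after 
the first comma.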

Sorry if this email is poorly organized; I've already spent way too many hours 
on it (not even counting the debugging that prompted it) and I need to get some 
sleep now.

If anyone actually reads all or most of this, thank you.
Mark.

P.S. if anyone actually responds, thank you even more.

This message is for information purposes only. It is not a recommendation, 
advice, offer or solicitation to buy or sell a product or service, nor an 
official confirmation of any transaction. It is directed at persons who are 
professionals and is intended for the recipient(s) only. It is not directed at 
retail customers. This message is subject to the terms at: 
https://www.ib.barclays/disclosures/web-and-email-disclaimer.html.

For important disclosures, please see: 
https://www.ib.barclays/disclosures/sales-and-trading-disclaimer.html regarding 
marketing commentary from Barclays Sales and/or Trading desks, who are active 
market participants; 
https://www.ib.barclays/disclosures/barclays-global-markets-disclosures.html 
regarding our standard terms for Barclays Investment Bank where we trade with 
you in principal-to-principal wholesale markets transactions; and in respect to 
Barclays Research, including disclosures relating to specific issuers, see: 
http://publicresearch.barclays.com.
__________________________________________________________________________________
If you are incorporated or operating in Australia, read these important 
disclosures: 
https://www.ib.barclays/disclosures/important-disclosures-asia-pacific.html.
__________________________________________________________________________________
For more details about how we use personal information, see our privacy notice: 
https://www.ib.barclays/disclosures/personal-information-use.html.
__________________________________________________________________________________


