On Thu, 18 Jan 2024 20:08:27 GMT, Justin Lu <j...@openjdk.org> wrote:
>> Hi @justin-curtis-lu, >> >> Thanks a lot for taking a look, I am glad for any other set of eyes on this >> tricky stuff! >> >>> As it stands, it would be inconsistent to have the closing bracket quoted >>> and the opening bracket not quoted. >> >> You're right - the problem exists with both `{` and `}`, as is shown here >> (unmodified jshell): >> >> >> $ jshell >> | Welcome to JShell -- Version 17.0.9 >> | For an introduction type: /help intro >> >> jshell> import java.text.*; >> >> jshell> new MessageFormat("Test: {0,number,foo'{'#.00}"); >> $2 ==> java.text.MessageFormat@951c5b58 >> >> jshell> $2.toPattern() >> $3 ==> "Test: {0,number,foo{#.00}" >> >> jshell> new MessageFormat($3) >> | Exception java.lang.IllegalArgumentException: Unmatched braces in the >> pattern. >> | at MessageFormat.applyPattern (MessageFormat.java:521) >> | at MessageFormat.<init> (MessageFormat.java:371) >> | at (#4:1) >> >> >> I've been missing the forest for the trees a bit and I think the fix can be >> simpler now. >> >> For the record, here is my latest understanding of what's going on... >> >> 1. When `MessageFormat.toPattern()` constructs the combined pattern string, >> it concatenates the plain text bits, suitably escaped, and the pattern >> strings from each subformat. >> 1. The subformat strings are already escaped, in the sense that you can take >> (for example) a `ChoiceFormat` format string and use it as-is to recreate >> that same `ChoiceFormat`, but they are escaped _only for their own purposes_. >> 1. The original example is an example of where this matters - `ChoiceFormat` >> needs to escape `#`, `|`, etc., but doesn't need to escape `{` or `}` - but >> a `MessageFormat` format string does need to escape those. >> 1. Therefore, the correct fix is for `MessageFormat.toPattern()` to modify >> the subformat strings by quoting any _unquoted_ `{` or `}` characters while >> leaving any already-quoted characters alone. >> >> Updated in a9d78c76f5f. I've also updated the test so it does more thorough >> round-trip checks. > > Hi @archiecobbs , I think the recent commit is good progress. > > First off I think this change will need a CSR as there are behavioral > concerns, although minimal. Please let me know if you have access on JBS to > file one, otherwise I can file one for you. > >> Therefore, the correct fix is for MessageFormat.toPattern() to modify the >> subformat strings by quoting any unquoted {or } characters while leaving any >> already-quoted characters alone. > > Yes, I think this behavior allows for the String returned by `toPattern()` to > create a MessageFormat that can format equivalently to the original > MessageFormat. Although to clarify, the original String pattern will not be > guaranteed to be equivalent to the one returned by `toPattern()` as we are > adding quotes to all brackets in the subformatPattern, regardless if they > were quoted in the original String. I think the former is more important > here, so the latter is okay. > > For DecimalFormat and SimpleDateFormat subformats, adding quotes to brackets > found in the subformatPattern is always right, those subformats can not be > used to create recursive format behavior. Thus you would **never** except a > nested ArgumentIndex in the subformatPattern (ex: `"{0,number,{1}foo0.0"}`), > and can confidently escape all brackets found for these subFormats (which you > do). > > To me, ChoiceFormat is where there is some concern. Consider the following, > > `new MessageFormat("{0,choice,0# {1} {2} {3} }”)` > > With this change, invoking `toPattern()` on the created MessageFormat would > return > > `"{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' }"`. > > This would technically be incorrect. One would think instead of allowing 3 > nested elements, we are now printing the literal `{1} {2} {3}` since the > brackets are escaped. But, this is not actually the case. Escaping brackets > with a choice subFormat does not function properly. This is due to the fact > that a recursive messageFormat is created, but the pattern passed has already > lost the quotes. > > This means that `new MessageFormat("{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' > }")`. > > is still equivalent to `new MessageFormat("{0,choice,0# {1} {2} {3} }”).` > > So the behavior of quoting all brackets would still guarantee _the String > returned by `toPattern()` to create a MessageFormat that can format > equivalently to the original MessageFormat_ but only because the current > behavior of formatting with a choice subFormat is technically wrong. I think > this is okay, since this incorrect behavior is long-standing, but a CSR would > be good to ad... Hi @justin-curtis-lu, Thanks for your comments. > First off I think this change will need a CSR as there are behavioral > concerns, although minimal. Please let me know if you have access on JBS to > file one, otherwise I can file one for you. No problem, I'll add one. > To me, ChoiceFormat is where there is some concern. Consider the following, OK I think I understand what you're saying. The first goal here is to ensure the round trip operation `MessageFormat` → Pattern string from `toPattern()` → `MessageFormat` takes you back to where you started. The patch achieves this so we're good so far. I think we agree on this. > This would technically be incorrect. I'm not quite understanding what you meant there (and therefore slightly worried)... In my mind, a key observation here is that the following two things are not equal and therefore shouldn't be conflated: * A `ChoiceFormat` pattern string (e.g., what you get from invoking `ChoiceFormat.toPattern()`) * The `XXX` that you see inside a `MessageFormat` pattern string as part of a `{0,choice,XXX}` subformat. Those are not the same, even modulo quoting. As you point out, `MessageFormat` has this quirky logic ([here](https://github.com/openjdk/jdk/blob/81df265e41d393cdde87729e091dd465934071fd/src/java.base/share/classes/java/text/MessageFormat.java#L1327-L1333)) in which, after it evaluates a `ChoiceFormat` subformat, if the returned string contains a `{` character, then that string is magically reinterpreted as a `MessageFormat` pattern string and evaluated again, otherwise it's left alone. So the `XXX` is not a "`ChoiceFormat` pattern string" but something else, so it can't be "incorrect" for not looking like a normal `ChoiceFormat` pattern string. Maybe it's just a manner of speaking? In any case I want to be careful I'm not missing something in what you're saying. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1899206551