On Thu, 18 Jan 2024 20:08:27 GMT, Justin Lu <j...@openjdk.org> wrote:

>> Hi @justin-curtis-lu,
>> 
>> Thanks a lot for taking a look, I am glad for any other set of eyes on this 
>> tricky stuff!
>> 
>>> As it stands, it would be inconsistent to have the closing bracket quoted 
>>> and the opening bracket not quoted.
>> 
>> You're right - the problem exists with both `{` and `}`, as is shown here 
>> (unmodified jshell):
>> 
>> 
>> $ jshell
>> |  Welcome to JShell -- Version 17.0.9
>> |  For an introduction type: /help intro
>> 
>> jshell> import java.text.*;
>> 
>> jshell> new MessageFormat("Test: {0,number,foo'{'#.00}");
>> $2 ==> java.text.MessageFormat@951c5b58
>> 
>> jshell> $2.toPattern()
>> $3 ==> "Test: {0,number,foo{#.00}"
>> 
>> jshell> new MessageFormat($3)
>> |  Exception java.lang.IllegalArgumentException: Unmatched braces in the 
>> pattern.
>> |        at MessageFormat.applyPattern (MessageFormat.java:521)
>> |        at MessageFormat.<init> (MessageFormat.java:371)
>> |        at (#4:1)
>> 
>> 
>> I've been missing the forest for the trees a bit and I think the fix can be 
>> simpler now.
>> 
>> For the record, here is my latest understanding of what's going on...
>> 
>> 1. When `MessageFormat.toPattern()` constructs the combined pattern string, 
>> it concatenates the plain text bits, suitably escaped, and the pattern 
>> strings from each subformat.
>> 1. The subformat strings are already escaped, in the sense that you can take 
>> (for example) a `ChoiceFormat` format string and use it as-is to recreate 
>> that same `ChoiceFormat`, but they are escaped _only for their own purposes_.
>> 1. The original example is an example of where this matters - `ChoiceFormat` 
>> needs to escape `#`, `|`, etc., but doesn't need to escape `{` or `}` - but 
>> a `MessageFormat` format string does need to escape those.
>> 1. Therefore, the correct fix is for `MessageFormat.toPattern()` to modify 
>> the subformat strings by quoting any _unquoted_ `{` or `}` characters while 
>> leaving any already-quoted characters alone.
>> 
>> Updated in a9d78c76f5f. I've also updated the test so it does more thorough 
>> round-trip checks.
>
> Hi @archiecobbs , I think the recent commit is good progress.
> 
> First off I think this change will need a CSR as there are behavioral 
> concerns, although minimal. Please let me know if you have access on JBS to 
> file one, otherwise I can file one for you.
> 
>> Therefore, the correct fix is for MessageFormat.toPattern() to modify the 
>> subformat strings by quoting any unquoted {or } characters while leaving any 
>> already-quoted characters alone.
> 
> Yes, I think this behavior allows for the String returned by `toPattern()` to 
> create a MessageFormat that can format equivalently to the original 
> MessageFormat. Although to clarify, the original String pattern will not be 
> guaranteed to be equivalent to the one returned by `toPattern()` as we are 
> adding quotes to all brackets in the subformatPattern, regardless if they 
> were quoted in the original String. I think the former is more important 
> here, so the latter is okay.
> 
> For DecimalFormat and SimpleDateFormat subformats, adding quotes to brackets 
> found in the subformatPattern is always right, those subformats can not be 
> used to create recursive format behavior. Thus you would **never** except a 
> nested ArgumentIndex in the subformatPattern (ex: `"{0,number,{1}foo0.0"}`), 
> and can confidently escape all brackets found for these subFormats (which you 
> do).
> 
> To me, ChoiceFormat is where there is some concern. Consider the following,
> 
> `new MessageFormat("{0,choice,0# {1} {2} {3} }”)`
> 
> With this change, invoking `toPattern()` on the created MessageFormat would 
> return 
> 
> `"{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' }"`.
> 
> This would technically be incorrect. One would think instead of allowing 3 
> nested elements, we are now printing the literal `{1} {2} {3}` since the 
> brackets are escaped. But, this is not actually the case. Escaping brackets 
> with a choice subFormat does not function properly. This is due to the fact 
> that a recursive messageFormat is created, but the pattern passed has already 
> lost the quotes.
> 
> This means that `new MessageFormat("{0,choice,0.0# '{'1'}' '{'2'}' '{'3'}' 
> }")`.
> 
> is still equivalent to `new MessageFormat("{0,choice,0# {1} {2} {3} }”).`
> 
> So the behavior of quoting all brackets would still guarantee _the String 
> returned by `toPattern()` to create a MessageFormat that can format 
> equivalently to the original MessageFormat_ but only because the current 
> behavior of formatting with a choice subFormat is technically wrong. I think 
> this is okay, since this incorrect behavior is long-standing, but a CSR would 
> be good to ad...

Hi @justin-curtis-lu,

Thanks for your comments.

> First off I think this change will need a CSR as there are behavioral 
> concerns, although minimal. Please let me know if you have access on JBS to 
> file one, otherwise I can file one for you.

No problem, I'll add one.

> To me, ChoiceFormat is where there is some concern. Consider the following,

OK I think I understand what you're saying.

The first goal here is to ensure the round trip operation `MessageFormat` → 
Pattern string from `toPattern()` → `MessageFormat` takes you back to where you 
started.

The patch achieves this so we're good so far. I think we agree on this.

> This would technically be incorrect.

I'm not quite understanding what you meant there (and therefore slightly 
worried)...

In my mind, a key observation here is that the following two things are not 
equal and therefore shouldn't be conflated:
* A `ChoiceFormat` pattern string (e.g., what you get from invoking 
`ChoiceFormat.toPattern()`)
* The `XXX` that you see inside a `MessageFormat` pattern string as part of a 
`{0,choice,XXX}` subformat.

Those are not the same, even modulo quoting. As you point out, `MessageFormat` 
has this quirky logic 
([here](https://github.com/openjdk/jdk/blob/81df265e41d393cdde87729e091dd465934071fd/src/java.base/share/classes/java/text/MessageFormat.java#L1327-L1333))
 in which, after it evaluates a `ChoiceFormat` subformat, if the returned 
string contains a `{` character, then that string is magically reinterpreted as 
a `MessageFormat` pattern string and evaluated again, otherwise it's left alone.

So the `XXX` is not a "`ChoiceFormat` pattern string" but something else, so it 
can't be "incorrect" for not looking like a normal `ChoiceFormat` pattern 
string.

Maybe it's just a manner of speaking? In any case I want to be careful I'm not 
missing something in what you're saying.

Thanks.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17416#issuecomment-1899206551

Reply via email to