> Please consider this fix to ensure that going from `MessageFormat` to pattern > string via `toPattern()` and then back via `new MessageFormat()` results in a > format that is equivalent to the original. > > The quoting and escaping rules for `MessageFormat` pattern strings are really > tricky. I admit not completely understanding them. At a high level, they work > like this: The normal way one would "nest" strings containing special > characters is with straightforward recursive escaping like with the `bash` > command line. For example, if you want to echo `a "quoted string" example` > then you enter `echo "a "quoted string" example"`. With this scheme it's > always the "outer" layer's job to (un)escape special characters as needed. > That is, the echo command never sees the backslash characters. > > In contrast, with `MessageFormat` and friends, nested subformat pattern > strings are always provided "pre-escaped". So to build an "outer" string > (e.g., for `ChoiceFormat`) the "inner" subformat pattern strings are more or > less just concatenated, and then only the `ChoiceFormat` option separator > characters (e.g., `<`, `#`, `|`, etc.) are escaped. > > The "pre-escape" escaping algorithm escapes `{` characters, because `{` > indicates the beginning of a format argument. However, it doesn't escape `}` > characters. This is OK because the format string parser treats any "extra" > closing braces (where "extra" means not matching an opening brace) as plain > characters. > > So far, so good... at least, until a format string containing an extra > closing brace is nested inside a larger format string, where the extra > closing brace, which was previously "extra", can now suddenly match an > opening brace in the outer pattern containing it, thus truncating it by > "stealing" the match from some subsequent closing brace. > > An example is the `MessageFormat` string `"{0,choice,0.0#option A: > {1}|1.0#option B: {1}'}'}"`. Note the second option format string has a > trailing closing brace in plain text. If you create a `MessageFormat` with > this string, you see a trailing `}` only with the second option. > > However, if you then invoke `toPattern()`, the result is > `"{0,choice,0.0#option A: {1}|1.0#option B: {1}}}"`. Oops, now because the > "extra" closing brace is no longer quoted, it matches the opening brace at > the beginning of the string, and the following closing brace, which was the > previous match, is now just plain text in the outer `MessageFormat` string. > > As a result, invoking `f.format(new Object{} { 0, 5 })` will retur...
Archie Cobbs has updated the pull request incrementally with four additional commits since the last revision: - Add a couple more pattern test cases suggested in review. - Update unit test @summary to better reflect what it tests. - Use curly braces consistently in patch. - Update copyright year in modified unit test files. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17416/files - new: https://git.openjdk.org/jdk/pull/17416/files/76e3d6ef..59e88126 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17416&range=06-07 Stats: 13 lines in 4 files changed: 6 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17416.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17416/head:pull/17416 PR: https://git.openjdk.org/jdk/pull/17416