Re: RFR: 8291660: Grapheme support in BreakIterator [v4]

Stuart Marks Wed, 07 Sep 2022 16:24:35 -0700

On Fri, 26 Aug 2022 21:48:14 GMT, Naoto Sato <na...@openjdk.org> wrote:


>> This is to enhance the character break analysis in `java.text.BreakIterator` 
>> to conform to the extended grapheme cluster boundaries defined in 
>> https://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundaries. A 
>> corresponding CSR has also been drafted, as there will be behavioral changes 
>> with this modification.
>
> Naoto Sato has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Changed the paragraph to @implSpec

src/java.base/share/classes/jdk/internal/util/regex/Grapheme.java line 47:

> 45:      */
> 46:     public static int nextBoundary(CharSequence src, int off, int limit) {
> 47:         Objects.checkFromToIndex(0, limit - off, src.length());

Is this right? The old code's use of `checkFromToIndex` seems to be the right 
way to check that `off` and `limit` form a valid from-to range within `[0, 
src.length())`. The new code subtracts `off` from both arguments, so it 
validates only the length of the range, and the arithmetic seems to allow some 
errors. For example, depending on the value of `limit`, this might permit `off` 
to be a small negative number.
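
To make the concern concrete, here is a minimal sketch comparing the two 
checks. It assumes the old code called `Objects.checkFromToIndex(off, limit, 
src.length())`, which is my reading of the diff rather than something quoted 
above; the class and method names are made up for illustration.

```java
import java.util.Objects;

public class RangeCheckDemo {
    // The check as quoted at line 47: only the range length (limit - off) is validated.
    public static boolean lengthOnlyCheckPasses(int off, int limit, int length) {
        try {
            Objects.checkFromToIndex(0, limit - off, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    // Assumed form of the old check: validates 0 <= off <= limit <= length.
    public static boolean fromToCheckPasses(int off, int limit, int length) {
        try {
            Objects.checkFromToIndex(off, limit, length);
            return true;
        } catch (IndexOutOfBoundsException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        int length = "hello".length(); // 5

        // Negative offset: limit - off == 4, which is <= 5, so the length-only check passes.
        System.out.println(lengthOnlyCheckPasses(-1, 3, length)); // true
        System.out.println(fromToCheckPasses(-1, 3, length));     // false

        // Limit past the end: limit - off == 3, which is <= 5, so it passes again.
        System.out.println(lengthOnlyCheckPasses(4, 7, length));  // true
        System.out.println(fromToCheckPasses(4, 7, length));      // false
    }
}
```

In both cases the length-only form accepts arguments that the from-to form 
rejects, so a later `src.charAt(...)` could fail with a less helpful exception.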

src/java.base/share/classes/sun/util/locale/provider/BreakIteratorProviderImpl.java line 135:

> 133:     public BreakIterator getCharacterInstance(Locale locale) {
> 134:         return new GraphemeBreakIterator();
> 135:     }

It looks like there is some kind of table here. Since CHARACTER_INDEX is no 
longer used, does that mean there is now dead code for the CHARACTER break 
iterator class, and dead resources for CharacterData and CharacterDictionary? 
Should these be removed? Or maybe this is all present in each locale or 
something, and should be cleaned up later?

-------------

PR: https://git.openjdk.org/jdk/pull/9991
