Hi Claes,

Looks good to me. Thanks for catching this so quickly!

Naoto

On 5/31/19 5:13 PM, Claes Redestad wrote:
Hi,

recent Unicode 12.1 updates caused a noticeable regression to Mac OS X build times. Quoting Naoto:

"The regression was caused by the call to Grapheme.nextBoundary() in the NFCCharProperty.match() method, which got slower with the fix to JDK-8221431 / JDK-8222978 (Unicode 12.1 / Grapheme 12.0 support). The purpose of issuing nextBoundary() is to detect whether or not to call the (much more heavyweight) Normalizer.normalize(). Since this fast check does not require fully fledged boundary detection, including stateful segmentation checks such as Emoji sequences, simply checking the break possibility between two code points as before should suffice. The suggested fix is to bring back the isBoundary(cp1, cp2) method from the previous revision of Grapheme.java, and issue it only from the NFCCharProperty.match() method for the fast check."

Bug: https://bugs.openjdk.java.net/browse/JDK-8225061
Webrev: http://cr.openjdk.java.net/~redestad/8225061/open.01/

While narrowing this down, I created a couple of microbenchmarks and experimented with a sequence of optimizations that got the regression from using the heavier nextBoundary() check down from about 300x to just about 2x as costly as before JDK-8221431. Reverting to isBoundary bypasses these improvements in some micros, but they still help a lot in other cases that have taken a hit from making the grapheme logic more complete/correct, so I'd like to leave them in.

Testing: tier1-3, verified a 300x speedup in the complex Pattern.CANON_EQ micro, and a 2x speedup on the simpler Grapheme/\b{g} micro.

Thanks!

/Claes
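
For anyone who wants to poke at the two regex features mentioned above, here is a minimal sketch using only the public java.util.regex API. It is not one of the microbenchmarks from the webrev and measures nothing; the class and method names (GraphemeSketch, canonEqMatches, splitGraphemes) are made up for illustration. It assumes a JDK 9 or later runtime, where \b{g} is supported.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the webrev or the JDK.
class GraphemeSketch {
    // 'a' followed by U+030A COMBINING RING ABOVE; canonically equivalent to U+00E5.
    static final String DECOMPOSED = "a\u030A";
    static final String COMPOSED = "\u00E5";

    // Matching with Pattern.CANON_EQ, the flag exercised by the more complex micro.
    static boolean canonEqMatches() {
        return Pattern.compile(DECOMPOSED, Pattern.CANON_EQ)
                      .matcher(COMPOSED)
                      .matches();
    }

    // Splitting on \b{g}, the grapheme cluster boundary matcher.
    static String[] splitGraphemes(String s) {
        return Pattern.compile("\\b{g}").split(s);
    }

    public static void main(String[] args) {
        System.out.println("CANON_EQ match: " + canonEqMatches());
        System.out.println("clusters: " + splitGraphemes(DECOMPOSED + "b").length);
    }
}
```

The combining mark stays attached to its base character when splitting on \b{g}, which is the kind of boundary decision the grapheme logic has to make for every pair of code points.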