Or use the extended grapheme cluster boundary "\\b{g}" instead of "\\b". It correctly finds emoji sequences such as 👨‍👩‍👧‍👧, whereas "\\b", even with the Unicode option, won't.
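The minimal Java sketch below shows the difference. The class and method names are illustrative and do not come from the thread. Keep in mind that \b{g} marks extended grapheme cluster boundaries, not word boundaries. Around an emoji, \b finds no word/non-word transition, while \b{g} does find a cluster boundary. Inside a cluster, such as a base letter followed by a combining mark, there is no \b{g} boundary at all.

```java
import java.util.regex.Pattern;

public class GraphemeBoundaryDemo {
    // The family emoji from the message: MAN, WOMAN, GIRL, GIRL,
    // joined by U+200D ZERO WIDTH JOINER (written as escapes to keep the source ASCII).
    public static final String FAMILY =
            "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC67";

    // True if the regex matches anywhere in the text.
    public static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "Family: " + FAMILY + "!";
        // \b needs a word/non-word transition. The emoji, the space and '!' are all
        // non-word characters, even in (?U) mode, so there is no \b around the emoji.
        System.out.println(found("(?U)\\b" + FAMILY + "\\b", text));   // false
        // \b{g} only needs grapheme cluster boundaries, which exist on both sides.
        System.out.println(found("\\b{g}" + FAMILY + "\\b{g}", text)); // true
        // No cluster boundary exists between 'o' and U+0308 COMBINING DIAERESIS,
        // so a bare "o" bounded by \b{g} does not match inside the decomposed word.
        System.out.println(found("\\b{g}o\\b{g}", "o\u0308kas"));      // false
    }
}
```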

HTH,
Naoto

On 12/15/23 11:29 AM, Stefan Norberg wrote:
Thanks Raffaello,
Ah, thanks! I just found https://bugs.openjdk.org/browse/JDK-8264160 in the JDK 19 release notes.
Have a great weekend!

/Stefan

On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti <raffaello.giulie...@oracle.com> wrote:

    By default, a word boundary only considers ASCII letters and digits
    (plus the underscore, i.e. \w = [a-zA-Z_0-9]). See "Predefined
    character classes" in the java.util.regex.Pattern documentation.
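Since the JDK 19 change (JDK-8264160), \b is defined in terms of \w, so the ASCII-only default is easiest to see on \w itself. The sketch below is a minimal illustration. The class and method names are hypothetical, and 'ö' is written as \u00F6 so that the source stays ASCII.

```java
import java.util.regex.Pattern;

public class WordCharDemo {
    // True if the whole input is exactly one \w character under the given flags.
    public static boolean isWordChar(String s, int flags) {
        return Pattern.compile("\\w", flags).matcher(s).matches();
    }

    public static void main(String[] args) {
        String oUmlaut = "\u00F6"; // U+00F6 LATIN SMALL LETTER O WITH DIAERESIS
        // Default mode: \w is [a-zA-Z_0-9].
        System.out.println(isWordChar("o", 0));     // true
        System.out.println(isWordChar("_", 0));     // true
        System.out.println(isWordChar(oUmlaut, 0)); // false
        // With UNICODE_CHARACTER_CLASS, \w follows Unicode properties, so 'o with diaeresis' counts.
        System.out.println(isWordChar(oUmlaut, Pattern.UNICODE_CHARACTER_CLASS)); // true
    }
}
```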

    To add Unicode support, you have a choice between adding a flag as a
    2nd argument to the compile() method

    Pattern p = Pattern.compile("(\\b" + word + "\\b)",
            Pattern.UNICODE_CHARACTER_CLASS);

    or adding a flag in the regex pattern, as documented in "Special
    constructs (named-capturing and non-capturing)"

    Pattern p = Pattern.compile("(?U)(\\b" + word + "\\b)");
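Both variants can be checked side by side without JUnit using the small self-contained sketch of Stefan's scenario below. The class and method names are hypothetical. Pattern.quote() is an extra precaution that neither snippet above uses: it stops regex metacharacters in the searched word from being interpreted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeWordBoundaryDemo {
    // Collects every match of the pattern in the text.
    public static List<String> findWord(Pattern p, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "En sak som \u00F6kas och sedan minskas. Bra va!";
        // Quote the word so it is matched literally (an addition, not in the thread).
        String word = Pattern.quote("\u00F6kas");

        // Variant 1: flag passed as the 2nd argument to compile().
        Pattern viaFlag = Pattern.compile("(\\b" + word + "\\b)",
                Pattern.UNICODE_CHARACTER_CLASS);
        // Variant 2: embedded flag expression (?U) at the start of the regex.
        Pattern viaEmbedded = Pattern.compile("(?U)(\\b" + word + "\\b)");

        System.out.println(findWord(viaFlag, text).size());     // 1
        System.out.println(findWord(viaEmbedded, text).size()); // 1
    }
}
```

With either flag the program should report one match on JDK 17 as well as on 19 and later, because the flag makes \w, and therefore \b, Unicode-aware.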


    Greetings
    Raffaello


    On 2023-12-15 20:07, Stefan Norberg wrote:
     > The following test passes on JDK 17 but fails on the last assertion
     > on 19.0.2 and 21.0.1. Bug or feature?
     >
     > import org.junit.jupiter.api.Assertions;
     > import org.junit.jupiter.api.Test;
     >
     > import java.util.ArrayList;
     > import java.util.regex.Matcher;
     > import java.util.regex.Pattern;
     >
     > /**
     >  * Test passes on JDK 17 but fails on JDK 19 and 21.
     >  *
     >  * The combination of a \b "word boundary" and a Unicode character
     >  * doesn't seem to work in 19 and 21.
     >  */
     > public class UnicodeTest {
     >     @Test
     >     public void testRegexp() throws Exception {
     >         var text = "En sak som ökas och sedan minskas. Bra va!";
     >         var word = "ökas";
     >         Assertions.assertTrue(text.contains(word));
     >
     >         Pattern p = Pattern.compile("(\\b" + word + "\\b)");
     >         Matcher m = p.matcher(text);
     >         var matches = new ArrayList<String>();
     >
     >         while (m.find()) {
     >             String matchString = m.group();
     >             System.out.println(matchString);
     >             matches.add(matchString);
     >         }
     >         Assertions.assertEquals(1, matches.size());
     >     }
     > }
     >
     > openjdk version "21.0.1" 2023-10-17 LTS
     > OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)
     > OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS, mixed mode, sharing)
     >
     > System Version: macOS 14.2 (23C64)
     > Kernel Version: Darwin 23.2.0
     >
     > Thanks!
     >
     > /Stefan
