Or use the extended grapheme cluster boundary "\\b{g}" instead of "\\b".
This will correctly search emoji sequences such as 👨👩👧👧, while
"\\b" with Unicode option won't.
HTH,
Naoto
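To make the "\\b{g}" suggestion above concrete, here is a minimal sketch. The class name GraphemeBoundaryDemo and the helper found() are made up for illustration, and the second sample string is not from the thread. One caveat worth hedging: a grapheme cluster boundary is not a word boundary, so as far as I can tell the pattern also matches inside a longer word.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original thread.
class GraphemeBoundaryDemo {

    // Returns true if the regex matches anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // The letter o with diaeresis is written as a Unicode escape so the
        // source file encoding does not matter.
        String text = "En sak som \u00f6kas och sedan minskas. Bra va!";
        String regex = "\\b{g}\u00f6kas\\b{g}";

        // The word from the thread is found without any Unicode flag.
        System.out.println(found(regex, text));

        // Caveat: every user-perceived character is its own grapheme
        // cluster, so the pattern also matches inside a longer word.
        System.out.println(found(regex, "f\u00f6r\u00f6kas"));

        // There is no boundary between a base letter and a combining mark
        // (here e followed by a combining acute accent).
        System.out.println(found("e\\b{g}", "e\u0301"));
    }
}
```

If whole-word matching is the goal, this suggests "\\b{g}" is a complement to the Unicode-aware "\\b" rather than a drop-in replacement.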
On 12/15/23 11:29 AM, Stefan Norberg wrote:
Thanks Raffaello,
Ah, thanks! Found https://bugs.openjdk.org/browse/JDK-8264160 in the
release notes for 19 just now.
Have a great weekend!
/Stefan
On Fri, Dec 15, 2023 at 8:24 PM Raffaello Giulietti
<raffaello.giulie...@oracle.com> wrote:
By default, a word boundary only considers ASCII letters and digits. See
"Predefined character classes" in the documentation.

To add Unicode support, you have a choice between adding a flag as a 2nd
argument to the compile() method

    Pattern p = Pattern.compile("(\\b" + word + "\\b)",
            Pattern.UNICODE_CHARACTER_CLASS);

or adding a flag in the regex pattern, as documented in "Special constructs
(named-capturing and non-capturing)"

    Pattern p = Pattern.compile("(?U)(\\b" + word + "\\b)");

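For anyone who wants to try both variants side by side, here is a small self-contained sketch. The class name UnicodeWordBoundaryDemo and its helper methods are made up for illustration; the text and word are the ones from the test quoted below. The flag-free result is printed but deliberately not asserted, since it depends on the JDK version.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original thread.
class UnicodeWordBoundaryDemo {

    // Counts how many times the pattern matches in the text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Variant 1: pass the flag as the 2nd argument to compile().
    static int countWithFlag(String word, String text) {
        Pattern p = Pattern.compile("(\\b" + word + "\\b)",
                Pattern.UNICODE_CHARACTER_CLASS);
        return count(p, text);
    }

    // Variant 2: embed the flag in the regex itself with (?U).
    static int countWithInlineFlag(String word, String text) {
        Pattern p = Pattern.compile("(?U)(\\b" + word + "\\b)");
        return count(p, text);
    }

    public static void main(String[] args) {
        // The letter o with diaeresis is written as a Unicode escape so the
        // source file encoding does not matter.
        String text = "En sak som \u00f6kas och sedan minskas. Bra va!";
        String word = "\u00f6kas";

        // No flag: the result depends on the JDK version (the thread
        // reports a match on 17 and none on 19 and 21).
        Pattern plain = Pattern.compile("(\\b" + word + "\\b)");
        System.out.println("default: " + count(plain, text));

        // Both Unicode-aware variants should find the word exactly once.
        System.out.println("flag:    " + countWithFlag(word, text));
        System.out.println("inline:  " + countWithInlineFlag(word, text));
    }
}
```

The word is concatenated into the regex verbatim, as in the original test. If it could contain regex metacharacters, wrapping it in Pattern.quote(word) would probably be the safer choice.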
Greetings
Raffaello
On 2023-12-15 20:07, Stefan Norberg wrote:
> The following test works in 17 but fails in 19.0.2 and 21.0.1 on the
> last assertion. Bug or feature?
>
> import org.junit.jupiter.api.Assertions;
> import org.junit.jupiter.api.Test;
>
> import java.util.ArrayList;
> import java.util.regex.Matcher;
> import java.util.regex.Pattern;
>
> /**
>  * Test passes in JDK 17 but fails in JDK 19, 21.
>  *
>  * The combination of a \b "word boundary" and a unicode char doesn't
>  * seem to work in 19, 21.
>  */
> public class UnicodeTest {
>     @Test
>     public void testRegexp() throws Exception {
>         var text = "En sak som ökas och sedan minskas. Bra va!";
>         var word = "ökas";
>         Assertions.assertTrue(text.contains(word));
>
>         Pattern p = Pattern.compile("(\\b" + word + "\\b)");
>         Matcher m = p.matcher(text);
>         var matches = new ArrayList<>();
>
>         while (m.find()) {
>             String matchString = m.group();
>             System.out.println(matchString);
>             matches.add(matchString);
>         }
>         Assertions.assertEquals(1, matches.size());
>     }
> }
>
>
>
> openjdk version "21.0.1" 2023-10-17 LTS
>
> OpenJDK Runtime Environment Corretto-21.0.1.12.1 (build 21.0.1+12-LTS)
>
> OpenJDK 64-Bit Server VM Corretto-21.0.1.12.1 (build 21.0.1+12-LTS,
> mixed mode, sharing)
>
>
> System Version: macOS 14.2 (23C64)
>
> Kernel Version: Darwin 23.2.0
>
>
> Thanks!
>
>
> /Stefan
>