Hi, I'm looking for help adding stopword lists for Asian languages in Unicode. I have comprehensive word lists, but my attempts to build a regex pattern from them and test it have failed: the pattern either does not match at all or skips tokens from the newly added stopword list.
I generated the regex pattern with the following code:

    Regexp::Assemble->new->add(@words)->reduce(0)->as_string

I then converted the result to UTF-8 hex. Are there any tools that make it easier to create these regex patterns?

Thank you,
Jimmy
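
P.S. To make the problem concrete, here is a minimal sketch of the kind of round-trip check involved: build a pattern from the list, then confirm that every word in the list is matched by it. The words are placeholders rather than the real list, the helper names build_pattern and unmatched_words are made up for illustration, and it uses a plain quotemeta alternation instead of Regexp::Assemble so that it runs with core Perl only. It works on decoded characters and does not include the UTF-8 hex conversion step.

```perl
use strict;
use warnings;
use utf8;                             # this source file contains UTF-8 literals
binmode(STDOUT, ':encoding(UTF-8)');  # print characters, not raw bytes

# Placeholder stopwords (hypothetical; substitute the real list).
my @words = ('的', 'これ', 'それ', '그리고');

# Build a plain alternation. Longest entries go first so that a short
# word cannot shadow a longer one that starts with the same characters.
sub build_pattern {
    my @sorted = sort { length($b) <=> length($a) || $a cmp $b } @_;
    my $alternation = join '|', map { quotemeta } @sorted;
    return qr/(?:$alternation)/;
}

# Return every word that the pattern fails to match in full.
sub unmatched_words {
    my ($pattern, @list) = @_;
    return grep { $_ !~ /\A$pattern\z/ } @list;
}

my $pattern = build_pattern(@words);
my @missed  = unmatched_words($pattern, @words);

if (@missed) {
    print "not matched: @missed\n";
} else {
    print "all ", scalar(@words), " words matched\n";
}
```

If this check passes on the decoded list but the hex-converted pattern still fails, that would suggest the mismatch is introduced at the conversion step (bytes versus characters) rather than by the word list itself, though that is a guess on my part.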