Bayes Stopword

Jimmy Thu, 28 Dec 2023 04:00:18 -0800

Hi,

I'm looking for help adding stopwords for Asian languages, encoded in
Unicode, to Bayes. I have comprehensive word lists, but my attempts to
generate a regex pattern from them and test it have been unsuccessful: the
pattern either fails to match or skips tokens from the newly added
stopword list.


I created the regex pattern using the following code:

Regexp::Assemble->new->add(@words)->reduce(0)->as_string

Afterward, I converted it to UTF-8 hex.
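
A minimal sketch of the kind of pipeline described above, under stated
assumptions rather than as the exact code that was run. It assumes tokens
are compared as raw UTF-8 bytes, so each word is encoded to bytes first and
only then escaped. The sample words and the helper name
word_to_hex_pattern are hypothetical, and a plain alternation stands in for
Regexp::Assemble so that only core modules are needed.

```perl
use strict;
use warnings;
use utf8;                  # the word literals below are UTF-8 source text
use Encode qw(encode);

# Hypothetical sample words; substitute the real stopword list.
my @words = ('的', 'これ', 'それ');

# Turn one word into a run of \xHH escapes, one per UTF-8 byte.
sub word_to_hex_pattern {
    my ($word) = @_;
    my $bytes = encode('UTF-8', $word);
    return join '', map { sprintf('\\x%02x', ord($_)) } split //, $bytes;
}

# Sort for a stable result, then join the escaped words into an alternation.
my $pattern = join '|', map { word_to_hex_pattern($_) } sort @words;

# Anchor the pattern so that it only matches a whole token.
my $re = qr/^(?:$pattern)$/;

print "$pattern\n";
```

Encoding before escaping matters because a character class or shared prefix
computed on decoded characters no longer lines up once each character
becomes several bytes. If Regexp::Assemble is used, feeding it the already
encoded byte strings would be one way to keep the reduced pattern consistent
with the hex form.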

Are there any tools available that make it easier to create these regex
patterns?

Thank you,
Jimmy
