Could you share a config line and a sample you are using?

 Giovanni

On 12/28/23 16:26, Jimmy wrote:
Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why it is not being skipped. I suspect that if words are not separated by spaces, longer words may not match those patterns.

Jimmy

On Thu, Dec 28, 2023 at 10:13 PM <giova...@paclan.it> wrote:

"spamassassin -D bayes" will tell you, you should see a line like:
bayes: skipped token 'from' because it's in stopword list for language 'en'

 Giovanni

On 12/28/23 15:45, Jimmy wrote:
> The pattern has successfully passed the test script, but it needs to check whether Bayes learning will identify and possibly exclude the word from matching this pattern.
>
> Thank you.
>
> On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it> wrote:
>
> On 12/28/23 12:59, Jimmy wrote:
> > Hi,
> >
> > I'm seeking assistance in incorporating a stopword for Asian languages in Unicode. Although I possess comprehensive word lists, my attempts to generate a regex pattern and test it have been unsuccessful; the pattern fails to match or skips tokens in the newly added stopword list.
> >
> > I created the regex pattern using the following code:
> >
> > Regexp::Assemble->new->add(@words)->reduce(0)->as_string
> >
> > Afterward, I converted it to UTF-8 hex.
> >
> > I'm wondering if there are any tools available to facilitate the creation of these regex patterns.
> >
> I have used Regexp::Trie to create Bayes stopwords in the past, code is similar to:
> -----------------------------------------------------------------------------------------------------------
> use strict;
> use warnings;
>
> use Encode;
> use Regexp::Trie;
>
> # Read one stopword per line from STDIN and add it to the trie.
> my @input = <STDIN>;
> my $rt = Regexp::Trie->new;
> for my $w ( @input ) {
>   chomp($w);
>   $rt->add($w);
> }
> my $regexp = $rt->regexp;
>
> # Print the pattern, escaping each byte that does not decode as UTF-8 on its own as \xNN.
> my @reg = split //, $regexp;
> for my $c ( @reg ) {
>   my $char = $c;
>   my $test;
>   eval "\$test = decode( 'utf8', \$c, Encode::FB_CROAK )";
>   if( $@ ) {
>     print '\x' . sprintf("%x", ord($c));
>   } else {
>     print $char;
>   }
> }
> -----------------------------------------------------------------------------------------------------------
>
>   Giovanni
>
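
For readers who want to try the idea without installing Regexp::Trie, here is a minimal sketch that uses only core modules. It swaps the trie for a plain longest-first alternation, so the result is less compact than what Regexp::Trie or Regexp::Assemble would produce. The sample words and the helper names `escape_word` and `build_pattern` are hypothetical, and anchoring the pattern to the whole token is an assumption about how the stopword check behaves, not something confirmed from Bayes.pm.

```perl
use strict;
use warnings;
use utf8;                      # the sample words below are UTF-8 source text
use Encode qw(encode);

# Hypothetical sample stopwords; replace with the real word list.
my @words = ('の', 'これ', 'です');

# Encode one word to UTF-8 bytes, writing every non-ASCII byte as \xNN
# and quoting ASCII regex metacharacters.
sub escape_word {
    my ($word) = @_;
    my $bytes = encode('UTF-8', $word);
    return join '', map {
        ord($_) > 0x7f ? sprintf('\x%02x', ord($_)) : quotemeta($_)
    } split //, $bytes;
}

# Join the escaped words into one alternation, longest words first.
sub build_pattern {
    my @alts = map { escape_word($_) }
               sort { length($b) <=> length($a) || $a cmp $b } @_;
    return '(?:' . join('|', @alts) . ')';
}

my $pattern = build_pattern(@words);
print "$pattern\n";

# Assumption: the check is applied to raw UTF-8 bytes and must match
# the whole token, so the pattern is anchored at both ends.
my $re = qr/^$pattern$/;
for my $token ('これ', 'これです') {
    my $bytes = encode('UTF-8', $token);
    printf "%s: %s\n", $bytes, ($bytes =~ $re ? 'match' : 'no match');
}
```

Under that anchoring assumption, the single stopword matches but the token made of two stopwords written together without a space does not. That would be consistent with the suspicion above about words that are not separated by spaces, provided the plugin really does match whole tokens.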