Re: Bayes Stopword

giovanni Thu, 28 Dec 2023 07:59:28 -0800

Could you share the config line and a sample you are using?
 Giovanni


On 12/28/23 16:26, Jimmy wrote:

Yes, I have done that, and I am also editing Plugin/Bayes.pm to investigate why 
it is not being skipped. I suspect that if words are not separated by spaces, 
longer words may not match those patterns.

Jimmy
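
A minimal sketch of the suspicion above, not taken from Plugin/Bayes.pm. It assumes the stopword regex is tested against a whole token with anchors, and that text written without spaces reaches that test as one long token. The pattern and the is_stopword helper are hypothetical; the bytes are the UTF-8 encoding of a single Thai word.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stopword pattern: U+0E41 U+0E25 U+0E30 written as UTF-8 bytes.
my $pattern = '(?:\xe0\xb9\x81\xe0\xb8\xa5\xe0\xb8\xb0)';

# ASSUMPTION: the stopword check is anchored to the whole token.
sub is_stopword {
    my ($token) = @_;
    return $token =~ /\A$pattern\z/ ? 1 : 0;
}

# The stopword on its own, as raw UTF-8 bytes.
my $word = encode( 'UTF-8', "\x{0E41}\x{0E25}\x{0E30}" );
# The same word with a second word glued on, no space in between.
my $run = $word . encode( 'UTF-8', "\x{0E02}\x{0E2D}\x{0E07}" );

printf "single word:     %d\n", is_stopword($word);   # prints 1
printf "unsegmented run: %d\n", is_stopword($run);    # prints 0
```

If the check really is anchored this way, a stopword only gets skipped when the tokenizer hands it over as a separate token, which is what the debug output would show.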

On Thu, Dec 28, 2023 at 10:13 PM <giova...@paclan.it> wrote:

    "spamassassin -D bayes" will tell you, you should see a line like:
    bayes: skipped token 'from' because it's in stopword list for language 'en'

       Giovanni

    On 12/28/23 15:45, Jimmy wrote:
     > The pattern has passed the test script, but I still need to check whether
     > Bayes learning will recognize and skip the words that match this pattern.
     >
     > Thank you.
     >
     >
     > On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it> wrote:
     >
     >     On 12/28/23 12:59, Jimmy wrote:
     >      > Hi,
     >      >
     >      > I'm seeking assistance in adding stopwords for Asian languages in
     >      > Unicode. Although I have comprehensive word lists, my attempts to
     >      > generate a regex pattern and test it have been unsuccessful: the
     >      > pattern fails to match, so tokens in the newly added stopword list
     >      > are not skipped.
     >      >
     >      > I created the regex pattern using the following code:
     >      >
     >      > Regexp::Assemble->new->add(@words)->reduce(0)->as_string
     >      >
     >      > Afterward, I converted it to UTF-8 hex.
     >      >
     >      > I'm wondering if there are any tools available to facilitate the
     >      > creation of these regex patterns.
     >      >
     >     I have used Regexp::Trie to create Bayes stopwords in the past; the
     >     code is similar to:
     >     -----------------------------------------------------------------------------------------------------------
     >     use strict;
     >     use warnings;
     >
     >     use Encode;
     >     use Regexp::Trie;
     >
     >     my @input = <STDIN>;
     >     my $rt = Regexp::Trie->new;
     >     for my $w ( @input ) {
     >         chomp($w);
     >         $rt->add($w);
     >     }
     >     my $regexp = $rt->regexp;
     >     my @reg = split //, $regexp;
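     >     for my $c ( @reg ) {
     >         # $c is a single byte; a lone byte >= 0x80 is not valid UTF-8,
     >         # so decode() croaks on it and the byte is hex-escaped below.
     >         my $char = $c;    # copy, since decode() may modify its source
     >         my $ok = eval { decode( 'utf8', $c, Encode::FB_CROAK ); 1 };
     >         if( !$ok ) {
     >           print '\x' . sprintf("%02x", ord($char));
     >         } else {
     >           print $char;
     >         }
     >     }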
     >     -----------------------------------------------------------------------------------------------------------
     >
     >        Giovanni
     >
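
For cross-checking the output of a script like the one quoted above, a sketch using only core modules can build the same kind of \xNN-escaped alternation and confirm that it matches the raw UTF-8 bytes of a token. The helper name words_to_hex_pattern and the two sample words are hypothetical, and the result is a plain alternation rather than the trie-optimized regex that Regexp::Trie or Regexp::Assemble would produce.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: join words (Perl character strings) into one
# alternation, writing every non-ASCII byte as a \xNN escape.
sub words_to_hex_pattern {
    my @words = @_;
    my @alts;
    # Longest first, so that a shorter word cannot shadow a longer one
    # when the pattern is used without anchors.
    for my $w ( sort { length($b) <=> length($a) || $a cmp $b } @words ) {
        my $bytes = encode( 'UTF-8', $w );
        push @alts, join '', map {
            ord($_) > 0x7e ? sprintf( '\\x%02x', ord($_) ) : quotemeta($_)
        } split //, $bytes;
    }
    return '(?:' . join( '|', @alts ) . ')';
}

# Two sample Thai words given as code points, so the source stays ASCII.
my @words   = ( "\x{0E41}\x{0E25}\x{0E30}", "\x{0E02}\x{0E2D}\x{0E07}" );
my $pattern = words_to_hex_pattern(@words);
print "$pattern\n";

# The pattern is tested against bytes, not decoded characters.
my $token = encode( 'UTF-8', $words[0] );
print "matches\n" if $token =~ /\A$pattern\z/;
```

Testing against encoded bytes rather than decoded characters matters here: a pattern made of \xNN byte escapes will not match a string that Perl holds as decoded Unicode characters.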
