"spamassassin -D bayes" will tell you, you should see a line like:
bayes: skipped token 'from' because it's in stopword list for language 'en'
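
If the debug output is long, a small filter can pull out only those lines. The following is a minimal sketch, assuming the debug line format is exactly as in the sample above; `skipped_tokens` is a hypothetical helper name, and the second sample line is a made-up placeholder.

```perl
use strict;
use warnings;

# Return [token, language] pairs for every "skipped token" debug line.
# Assumes the line format matches the sample quoted above.
sub skipped_tokens {
    my @lines = @_;
    my @found;
    for my $line (@lines) {
        if ( $line =~ /bayes: skipped token '(.*)' because it's in stopword list for language '([^']+)'/ ) {
            push @found, [ $1, $2 ];
        }
    }
    return @found;
}

# Sample input: the first line is the one quoted above,
# the second is a placeholder that should be ignored.
my @sample = (
    "bayes: skipped token 'from' because it's in stopword list for language 'en'",
    "unrelated line",
);
print "$_->[0] ($_->[1])\n" for skipped_tokens(@sample);
```

In practice the lines would come from the "spamassassin -D bayes" run itself rather than from a sample array; debug output normally goes to stderr, so it has to be redirected before it can be filtered.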

 Giovanni

On 12/28/23 15:45, Jimmy wrote:
The pattern has successfully passed the test script, but I still need to 
check whether Bayes learning will identify, and possibly exclude, the words 
matching this pattern.

Thank you.


On Thu, Dec 28, 2023 at 9:22 PM <giova...@paclan.it> wrote:

    On 12/28/23 12:59, Jimmy wrote:
     > Hi,
     >
     > I'm seeking assistance in incorporating a stopword list for Asian
     > languages in Unicode. Although I possess comprehensive word lists, my
     > attempts to generate a regex pattern and test it have been
     > unsuccessful; the pattern fails to match or skip tokens in the newly
     > added stopword list.
     >
     > I created the regex pattern using the following code:
     >
     > Regexp::Assemble->new->add(@words)->reduce(0)->as_string
     >
     > Afterward, I converted it to UTF-8 hex.
     >
     > I'm wondering if there are any tools available to facilitate the
     > creation of these regex patterns.
     >
    I have used Regexp::Trie to create Bayes stopwords in the past; the code
    is similar to:
    
-----------------------------------------------------------------------------------------------------------
    use strict;
    use warnings;

    use Encode;
    use Regexp::Trie;

    my @input = <STDIN>;
    my $rt = Regexp::Trie->new;
    for my $w ( @input ) {
        chomp($w);
        $rt->add($w);
    }
    my $regexp = $rt->regexp;
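    # Walk the pattern one byte at a time: a byte that does not decode as
    # UTF-8 on its own (i.e. non-ASCII) is printed as hex, the rest verbatim.
    for my $c ( split //, $regexp ) {
        my $copy = $c;    # decode() may modify its argument when a CHECK value is given
        if ( eval { decode( 'utf8', $copy, Encode::FB_CROAK ); 1 } ) {
          print $c;
        } else {
          print 'x' . sprintf("%x", ord($c));
        }
    }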
    
-----------------------------------------------------------------------------------------------------------
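
    To check by hand that a hex-escaped pattern matches the UTF-8 bytes of a
    word, something like the following may help. It is only a sketch:
    to_hex_escapes and is_stopword are hypothetical helper names, the sample
    words are arbitrary, and it does not show how SpamAssassin itself applies
    its stopword lists.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Turn a Perl character string into \xNN escapes of its UTF-8 bytes.
sub to_hex_escapes {
    my ($word) = @_;
    my $bytes = encode( 'UTF-8', $word );
    return join '', map { sprintf '\\x%02x', ord($_) } split //, $bytes;
}

# Check a UTF-8 encoded token (a byte string) against a hex-escaped pattern.
sub is_stopword {
    my ( $pattern, $token_bytes ) = @_;
    return $token_bytes =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

# Sample words, chosen only for illustration (U+306E and U+3053 U+308C).
my @words   = ( "\x{306e}", "\x{3053}\x{308c}" );
my $pattern = join '|', map { to_hex_escapes($_) } @words;

print "$pattern\n";
print is_stopword( $pattern, encode( 'UTF-8', "\x{306e}" ) )
    ? "match\n"
    : "no match\n";
```

    The token is encoded to bytes before matching because the pattern
    describes bytes, not characters; matching a decoded character string
    against it would fail.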

       Giovanni

