Hello,

  I'm trying to use the HashBL plugin with the check_hashbl_bodyre function.

With that function I would like to match UTF-8 patterns, such as

        '([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)'

I'm in particular interested in accented characters, such as /[àèìòù]/.

With Perl, if I try:

```
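use strict;
use warnings;
use utf8;                              # the source below contains literal UTF-8
use open ':std', ':encoding(UTF-8)';   # encode STDIN/STDOUT/STDERR as UTF-8

my $txt = ' musica è ciao ciao.';
my $re  = '([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)';

# Print the first captured word pair, if any
if ($txt =~ /$re/s) {
    print "Match: $1\n";
}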
```

then $txt matches, as expected.
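
To narrow this down, here is a minimal sketch (plain Perl, not taken from the plugin) that runs the same regex on the decoded string and on its UTF-8 encoded bytes. It assumes, without my having verified it, that SpamAssassin may hand the body to the rule as UTF-8 octets rather than decoded characters. The helper names first_capture and is_single_letter are only for illustration.

```perl
use strict;
use warnings;
use utf8;                      # string literals below are decoded characters
use Encode qw(encode);

# Same pattern as in the rule, compiled once.
my $re = qr/([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)/;

# Hypothetical helper: return the first captured word pair, or undef.
sub first_capture {
    my ($str) = @_;
    return $str =~ $re ? $1 : undef;
}

# Hypothetical helper: true if the whole string is exactly one \p{L} character.
sub is_single_letter {
    my ($str) = @_;
    return $str =~ /\A\p{L}\z/ ? 1 : 0;
}

my $chars = ' musica è ciao ciao.';     # decoded text: "è" is one character
my $bytes = encode('UTF-8', $chars);    # same text as octets: "è" is 0xC3 0xA8

# The pattern matches both forms, but the captured strings differ in length.
printf "chars: %d\n", length(first_capture($chars));   # prints "chars: 8"
printf "bytes: %d\n", length(first_capture($bytes));   # prints "bytes: 9"

# \p{L} sees one letter in the decoded form, but two separate octets otherwise.
printf "decoded: %d\n", is_single_letter('è');                    # prints "decoded: 1"
printf "encoded: %d\n", is_single_letter(encode('UTF-8', 'è'));   # prints "encoded: 0"
```

So in plain Perl the pattern matches either form, but the captured text is not the same, and any hash computed from it would presumably differ as well.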


With SpamAssassin, I built my own DNSBL of hashes and wrote the following rule:

body HASHBL_MY_SPAM1 eval:check_hashbl_bodyre('spamhash.example.com', 'sha1/max=10/shuffle', '([\p{L}\p{M}\d\S]+[\ \t]+[\p{L}\p{M}\d\S]+)', '^127\.0\.0\.2')

This rule doesn't match the above $txt when it appears in the body of a mail.

To match the string ' musica è ciao ciao.' in the body of the mail as expected, I have to change the regex as follows:

body HASHBL_MY_SPAM1 eval:check_hashbl_bodyre('spamhash.example.com', 'sha1/max=10/shuffle', '([\p{L}\p{M}\d\Sàèìòù]+[\ \t]+[\p{L}\p{M}\d\Sàèìòù]+)', '^127\.0\.0\.2')


So I have to add the accented characters literally.
I can't understand why. Are there any limitations in the HashBL plugin with UTF-8?
Maybe I have misunderstood something.

Thank you very much for any hints.

Kind Regards
Marco
