$ cat RMScaa8wVRnMfwqlQ0RxAzDjYGmIumlp1wlA8QNr8z.eml | sa-learn --spam
Learned tokens from 0 message(s) (1 message(s) examined)

sa-learn does indeed work from stdin.

- Lucas
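
Combining the stdin behaviour shown above with `gzip -dcf` (the same thing `zcat -f` does), a compressed Maildir folder could be fed to a learner one message at a time. This is only a rough sketch: `learn_each` is a made-up helper name, and it assumes GNU gzip, where `-f` together with `-c` copies non-gzip input through unchanged.

```shell
#!/bin/sh
# Sketch only: learn_each is a hypothetical helper, not part of SpamAssassin.
# Usage: learn_each DIR COMMAND [ARGS...]
# Runs COMMAND once per regular file in DIR, with the (decompressed)
# message on its standard input.
learn_each() {
    dir=$1
    shift
    for msg in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty folder.
        [ -f "$msg" ] || continue
        # Assumes GNU gzip: -d decompresses, -c writes to stdout, and -f
        # lets data that is not gzip-compressed pass through as-is.
        gzip -dcf -- "$msg" | "$@"
    done
}
```

Something like `learn_each ~/Maildir/.Junk/cur sa-learn --spam` would then be the call (the folder path is just an example). The same helper could wrap `spamassassin -r`. It starts the command once per message, so it is probably slower than pointing sa-learn at a whole directory.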

From: Clive Jacques <westriverp...@gmail.com>
Date: Friday, 21 May 2021 at 21.41
To: "users@spamassassin.apache.org" <users@spamassassin.apache.org>
Subject: Re: spamassassin and *compressed* Maildir

I have a mail folder that I put false negatives in (i.e., spam which ends up in 
my inbox) and another for false positives (ham that ends up in my spam folder). 
 Each night I run sa-learn on each folder (sa-learn will munch on entire 
Maildirs) and also feed each message to spamassassin -r to report it.  So using 
zcat or gunzip -c will work for spamassassin -r, but not for sa-learn.

Unless sa-learn can munch on stdin as well as files....

-CJ

On Fri, May 21, 2021 at 3:28 PM Lucas Rolff <lu...@lucasrolff.com> wrote:
You can do `zcat -f` or `gunzip -c -f` and avoid having to have .gz extension, 
that way you can skip the rename step

Best Regards,
Lucas Rolff

From: Clive Jacques <westriverp...@gmail.com>
Date: Friday, 21 May 2021 at 21.04
To: "users@spamassassin.apache.org" <users@spamassassin.apache.org>
Subject: Re: spamassassin and *compressed* Maildir

That's confirmed.  sa-learn doesn't like compressed files.  I don't know if it 
will dine on compressed files with the correct extension (i.e., .gz).  
Unfortunately, when using compression with Maildir format, Dovecot doesn't seem 
to like to use extensions.  So, I copied the directory to a temporary location, 
decompressed the files and then set sa-learn on them.  Even getting gunzip to 
operate on the files was a pain because it only wants files with the .gz 
extension (so I had to rename all 6,000 of them first - using a utility like 
'rename').  I then did the same thing with about 9,000 hams.
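
The copy-and-decompress step described above might be scripted without any renaming, roughly as follows. This is a sketch under assumptions: `decompress_copy` is a hypothetical name, and it relies on GNU gzip, where `-c` skips the suffix check and `-f` passes uncompressed messages through untouched.

```shell
#!/bin/sh
# Sketch only: decompress_copy is a hypothetical helper name.
# Usage: decompress_copy SRC_DIR DST_DIR
# Writes a decompressed copy of every regular file in SRC_DIR to DST_DIR,
# keeping the original file names; the source folder is not modified.
decompress_copy() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for msg in "$src"/*; do
        [ -f "$msg" ] || continue
        # ${msg##*/} strips the directory part, leaving the file name.
        # Assumes GNU gzip: -dcf decompresses gzip data to stdout and
        # copies anything that is not gzip data unchanged.
        gzip -dcf -- "$msg" > "$dst/${msg##*/}" || return 1
    done
}
```

After a call such as `decompress_copy ~/Maildir/.Junk/cur /tmp/junk-plain` (both paths are examples), `sa-learn --spam /tmp/junk-plain` can work on the plain copies, and the temporary directory can be removed afterwards.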

There was much good news.  Learning proceeded at about the same pace, but syncing 
the journal to the database was much faster.  Maybe the tokens were smaller?  I 
verified that it seemed to work with --dump magic.

Then, all by itself, Spamassassin's bayes filtering was instantly much better.  
Stuff that was tripping BAYES_00 was suddenly popping BAYES_99.

Now, I just need to update my nightly learning/reporting script.

Still, a very nice result.

On Fri, May 21, 2021 at 11:30 AM Henrik K <h...@hege.li> wrote:
On Fri, May 21, 2021 at 10:54:54AM -0400, Clive Jacques wrote:
> Do spamassassin or sa-learn understand compressed files or compressed Maildir?

I believe sa-learn will automatically decompress if the files have .gz or
.bz2 extension, but yes Maildir files without extension will not work.

Should be easy to detect compressed Maildir files, perhaps file an enhancement
request in Bugzilla.
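
As an illustration of why detection should be easy: a gzip stream always begins with the two bytes 0x1f 0x8b, so a file can be checked without looking at its name. The sketch below is not how sa-learn does anything; `is_gzip` is a hypothetical helper and it covers only gzip, not bzip2 or other formats.

```shell
#!/bin/sh
# Sketch only: is_gzip is a hypothetical helper, not SpamAssassin code.
# Usage: is_gzip FILE
# Succeeds (exit status 0) if FILE starts with the gzip magic bytes 1f 8b.
is_gzip() {
    # od prints the first two bytes as hex; tr removes the whitespace.
    magic=$(od -An -tx1 -N2 < "$1" | tr -d ' \t\n')
    [ "$magic" = "1f8b" ]
}
```

A script could call `is_gzip "$file"` per message and decompress only the files for which it succeeds.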
