That's confirmed: sa-learn doesn't like compressed files. I don't know whether it will dine on compressed files that have the correct extension (i.e., .gz). Unfortunately, when compression is used with the Maildir format, Dovecot doesn't seem to add extensions. So I copied the directory to a temporary location, decompressed the files, and then set sa-learn on them. Even getting gunzip to operate on the files was a pain, because it only wants files with the .gz extension (so I had to rename all 6,000 of them first, using a utility like 'rename'). I then did the same thing with about 9,000 hams.
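For anyone who wants to skip the rename step, here is a rough Python sketch of the copy-and-decompress pass described above. It is not what I actually ran. It assumes the messages are gzip-compressed (it checks the two gzip magic bytes rather than the file name, so no .gz extension is needed) and it leaves anything else untouched, including bz2 or other formats. The function name and both directory arguments are made up for illustration.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def decompress_maildir_copy(src_dir, dest_dir):
    """Copy each regular file from src_dir into dest_dir.

    Files that start with the gzip magic bytes are decompressed on the way;
    everything else is copied unchanged. The originals are never modified.
    Returns a tuple (number decompressed, number copied as-is).
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    decompressed = 0
    copied = 0
    for path in sorted(src.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories; this sketch is not recursive

        # Sniff the content instead of trusting the file name.
        with open(path, "rb") as fh:
            is_gzip = fh.read(2) == GZIP_MAGIC

        target = dest / path.name
        if is_gzip:
            with gzip.open(path, "rb") as fin, open(target, "wb") as fout:
                shutil.copyfileobj(fin, fout)
            decompressed += 1
        else:
            shutil.copyfile(path, target)
            copied += 1
    return decompressed, copied
```

Point it at a copy of the Maildir's cur directory and a scratch directory, then run sa-learn --spam (or --ham) on the scratch directory as usual.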
There was much good news. Learning proceeded at about the same pace, but syncing the journal to the database was *much* faster. Maybe the tokens were smaller? I verified that it seemed to work with --dump magic. Then, all by itself, SpamAssassin's Bayes filtering was instantly much better. Stuff that was tripping BAYES_00 was suddenly popping BAYES_99. Now I just need to update my nightly learning/reporting script. Still, a very nice result.

On Fri, May 21, 2021 at 11:30 AM Henrik K <h...@hege.li> wrote:

> On Fri, May 21, 2021 at 10:54:54AM -0400, Clive Jacques wrote:
> > Do spamassassin or sa-learn understand compressed files or compressed
> > Maildir?
>
> I believe sa-learn will automatically decompress if the files have .gz or
> .bz2 extension, but yes Maildir files without extension will not work.
>
> Should be easy to detect compressed Maildir files, perhaps file enhancement
> request in bugzilla.
>