On 31.10.17 01:35, David Gessel wrote:
amavisd-new-2.11.0_2,1
I'm finding the command /usr/local/bin/sa-learn --spam --showdots
/mail/blackrosetech.com/gessel/.Junk/{cur,new} is taking a while to

If you use amavis, you must train amavis' Bayes database
(/var/lib/amavis/.spamassassin/ here), not your own.
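
One way to make sure sa-learn writes to that database is to run it as the
amavis account. A minimal sketch, assuming the account is named "amavis" and
that it can read your Maildir (both are assumptions; the user name and home
directory differ between systems). The learn_cmd helper is hypothetical and
only prints the command so you can check it before running it:

```shell
#!/bin/sh
# Sketch only. ASSUMPTION: the scanner account is called "amavis"; other
# installs may use a different user name and home directory.
AMAVIS_USER="${AMAVIS_USER:-amavis}"

# Print (do not run) the sa-learn command for a class (spam|ham) and folders.
# sudo -H sets HOME to the target user's home, so sa-learn uses that
# user's ~/.spamassassin instead of yours.
learn_cmd() {
    class="$1"; shift
    printf 'sudo -u %s -H sa-learn --%s --showdots' "$AMAVIS_USER" "$class"
    for dir in "$@"; do
        printf ' %s' "$dir"
    done
    printf '\n'
}

learn_cmd spam /mail/blackrosetech.com/gessel/.Junk/cur /mail/blackrosetech.com/gessel/.Junk/new
```

sa-learn also has a --dbpath option if you would rather point it at the
database explicitly than switch users; check the man page for the exact path
form it expects.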

complete...  by a while I mean it has been running for 3 days.  The folder
has a few months of spam in it, 4760 "conversations" according to
Thunderbird, which is roughly the message count since spam doesn't tend to
thread deeply.

You don't need to train on all the spam you have. After initial training on,
let's say, 200-500 pieces of (different types of) logged spam, it may be
enough to train only on spam that does not hit BAYES_99.

It's much more important to train on ham, since SA must know the DIFFERENCES
between ham and spam - otherwise all mail will of course look like spam.
(Also, SA's Bayes rules won't hit until you have trained enough ham.)

It's much worse to have FP than FN - train as ham every ham message that
does not hit BAYES_00.
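
The selective training described above can be sketched as a small filter that
lists only the messages whose headers do not already show a given Bayes rule.
This assumes the scanner records the matched rule names in the message headers
(for example in X-Spam-Status), which depends on your configuration. The
function names are hypothetical:

```shell
#!/bin/sh
# Sketch: find messages that still need training.
# ASSUMPTION: matched rule names (BAYES_99, BAYES_00, ...) appear somewhere
# in the message headers, e.g. in an X-Spam-Status line.

# Succeeds (exit 0) when the header block of file $1 lacks rule name $2.
lacks_rule() {
    # sed prints lines up to the first empty line (the headers), then quits.
    ! sed '/^$/q' "$1" | grep -q "$2"
}

# Usage: files_lacking RULE DIR...
# Prints every regular file in the given directories without RULE in its headers.
files_lacking() {
    rule="$1"; shift
    for dir in "$@"; do
        for msg in "$dir"/*; do
            [ -f "$msg" ] || continue
            if lacks_rule "$msg" "$rule"; then
                printf '%s\n' "$msg"
            fi
        done
    done
}
```

For spam you would call files_lacking BAYES_99 on the .Junk/cur and .Junk/new
folders and pass the resulting list to sa-learn --spam; for ham, call
files_lacking BAYES_00 on a folder of known-good mail and pass the list to
sa-learn --ham.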

I trained my DB years ago and rarely need to retrain now.

I was trying to track progress and...
# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0       1646          0  non-token data: nspam
0.000          0          0          0  non-token data: nham

but then 24 hours later...

# sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          0          0  non-token data: nham

Are you sure someone did not back up your spam DB?
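
To track the counters over time it may help to log just the numbers. A minimal
sketch that extracts one counter from "sa-learn --dump magic" output on stdin,
relying only on the column layout shown in the output quoted above (count in
the third field, name in the last). The magic_count name is hypothetical:

```shell
#!/bin/sh
# Sketch: print the value of one "non-token data" counter (e.g. nspam, nham)
# from "sa-learn --dump magic" output supplied on stdin.
magic_count() {
    awk -v name="$1" '$NF == name { print $3 }'
}
```

Used as: sa-learn --dump magic | magic_count nspam. It is worth running it as
the same user each time, since a different user reads a different database.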

Two issues:

1) sa-learn seems really, really slow.  Slow enough that spam sometimes
comes in faster.  This seems far slower than the benchmark results suggest
is within the range of normal.  I'm sure I'm doing something really wrong,
but not sure what.

2)  what happened to my hard won spam tokens?


I know --no-sync should speed up the process and if the task ever completes
(or can be killed) I'll test that for speed on a smaller collection.

--no-sync only helps if you have "bayes_learn_to_journal 1" - it's 0 by
default. Try turning it on.
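
A rough sketch of what a journalled run could look like: learn each folder
with --no-sync, then sync once at the end. The journal_plan helper is
hypothetical and only prints the commands; the folder names in the example
call are placeholders:

```shell
#!/bin/sh
# Sketch of a journalled training run. It only prints the commands.
# Prerequisite (a config setting, not shell), in the local.cf that
# sa-learn actually reads:
#   bayes_learn_to_journal 1

# Usage: journal_plan CLASS DIR...   (CLASS is spam or ham)
# Prints one --no-sync learn per folder, then a single sync at the end.
journal_plan() {
    class="$1"; shift
    for dir in "$@"; do
        printf 'sa-learn --no-sync --%s %s\n' "$class" "$dir"
    done
    printf 'sa-learn --sync\n'
}

journal_plan spam .Junk/cur .Junk/new
```

Whether this is noticeably faster on your setup is something to measure on a
small folder first.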

Would something like specifying the mailbox format also help?

Only if you use mbox format.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows found: (R)emove, (E)rase, (D)elete
