FreeBSD 10.3-RELEASE FreeBSD 10.3-RELEASE #0 r322073: Sat Aug 5 01:44:09 PDT 2017
spamassassin-3.4.1_10
amavisd-new-2.11.0_2,1

I'm finding that the command

    /usr/local/bin/sa-learn --spam --showdots /mail/blackrosetech.com/gessel/.Junk/{cur,new}

is taking a while to complete. By "a while" I mean it has been running for 3 days. The folder has a few months of spam in it, 4760 "conversations" according to Thunderbird, which is roughly the message count since spam doesn't tend to thread deeply. I was trying to track progress and...

    # sa-learn --dump magic
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 1646 0 non-token data: nspam
    0.000 0 0 0 non-token data: nham
    0.000 0 114841 0 non-token data: ntokens
    0.000 0 1438503364 0 non-token data: oldest atime
    0.000 0 1508955277 0 non-token data: newest atime
    0.000 0 1508964658 0 non-token data: last journal sync atime
    0.000 0 0 0 non-token data: last expiry atime
    0.000 0 0 0 non-token data: last expire atime delta
    0.000 0 0 0 non-token data: last expire reduction count

.... about an hour later....

    # sa-learn --dump magic
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 1690 0 non-token data: nspam
    0.000 0 0 0 non-token data: nham
    0.000 0 114841 0 non-token data: ntokens
    0.000 0 1438503364 0 non-token data: oldest atime
    0.000 0 1508955277 0 non-token data: newest atime
    0.000 0 1508964658 0 non-token data: last journal sync atime
    0.000 0 0 0 non-token data: last expiry atime
    0.000 0 0 0 non-token data: last expire atime delta
    0.000 0 0 0 non-token data: last expire reduction count

but then 24 hours later...

    # sa-learn --dump magic
    0.000 0 3 0 non-token data: bayes db version
    0.000 0 0 0 non-token data: nspam
    0.000 0 0 0 non-token data: nham
    0.000 0 133661 0 non-token data: ntokens
    0.000 0 1438503364 0 non-token data: oldest atime
    0.000 0 1508955277 0 non-token data: newest atime
    0.000 0 1508964658 0 non-token data: last journal sync atime
    0.000 0 0 0 non-token data: last expiry atime
    0.000 0 0 0 non-token data: last expire atime delta
    0.000 0 0 0 non-token data: last expire reduction count

Two issues:

1) sa-learn seems really, really slow.
Slow enough that spam sometimes comes in faster than it gets learned. This seems far slower than the benchmark results suggest is within the range of normal. I'm sure I'm doing something really wrong, but I'm not sure what.

2) What happened to my hard-won spam tokens? In the last dump nspam is back to 0, even though ntokens went up.

I know --no-sync should speed up the process, and if the task ever completes (or can be killed) I'll test that for speed on a smaller collection. Would something like specifying the mailbox format also help?
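
To put a rough number on "slow", the first two nspam readings can be turned into a learning rate and a time-to-finish estimate. This is only a sketch: the helper names parse_dump_magic and hours_remaining are made up for illustration, "about an hour" is treated as exactly 1.0 hours, Thunderbird's 4760 is taken as the message count, and the rate is assumed constant. It only makes sense while nspam is actually increasing, which the third dump shows is not guaranteed.

```python
import re

def parse_dump_magic(text):
    """Return {label: value} for the 'non-token data' rows of `sa-learn --dump magic`."""
    counters = {}
    for line in text.splitlines():
        # Rows look like: "0.000 0 1646 0 non-token data: nspam"
        # The third column carries the value; the label follows the colon.
        m = re.match(
            r"\s*\S+\s+\S+\s+(\d+)\s+\S+\s+non-token data:\s*(.+?)\s*$", line
        )
        if m:
            counters[m.group(2)] = int(m.group(1))
    return counters

def hours_remaining(nspam_before, nspam_after, hours_between, total_messages):
    """Estimate hours left, assuming a constant learning rate (an assumption)."""
    rate = (nspam_after - nspam_before) / hours_between  # messages per hour
    if rate <= 0:
        return None  # counter stalled or reset; no meaningful estimate
    return (total_messages - nspam_after) / rate

# The two nspam rows quoted in the post, roughly an hour apart.
first = parse_dump_magic("0.000 0 1646 0 non-token data: nspam")
second = parse_dump_magic("0.000 0 1690 0 non-token data: nspam")

# 4760 is Thunderbird's conversation count, used here as the message total.
eta = hours_remaining(first["nspam"], second["nspam"], 1.0, 4760)
print("messages learned in the interval:", second["nspam"] - first["nspam"])
print("estimated hours remaining:", eta)
```

Under those assumptions the interval covers 44 messages, so the remaining messages would take on the order of several more days at that pace.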