Dear experts, I have a question about the performance of spam/ham learning. I store spam in a mail folder accessible via IMAP, and I want to feed it into Bayes. For this, I do:
fetchmail -asnp IMAP --folder autolearn --user $username \
    -m "formail -s | spamassassin -d >> /tmp/x" $mailserver

# now learn as user
formail </tmp/x -s spamc -u $user -L spam
# now feed to bayes
formail </tmp/x -n 3 -s spamc -u $user -C report
# we could also do this: spamassassin -r --mbox

Question 1: Do I need to call spamc twice, once with "-L spam" and once with "-C report"? Do I understand correctly that -L trains my Bayes database, while -C reports to SpamCop etc.?

Question 2: Is calling spamassassin better than spamc for such an mbox?

Question 3, my main question: The fetchmail command above takes *ages* -- when I call it as shown it runs for *hours*, whereas replacing the "-m" parameter with "cat >>/tmp/x" finishes in 7 minutes. I can see spamassassin using 100% CPU. Why is it so extremely slow and CPU-consuming just to remove any existing markup? I would like to remove existing markup, and I also need the resulting mbox for other things. Is there a way to make it fast enough to be usable?

mfg zmi
-- 
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
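For what it's worth, a possible restructuring of the pipeline above -- a sketch, not tested against a live mail setup, assuming the same $username/$mailserver variables and that the mbox lands in /tmp/x. The idea is that "spamassassin -d" pays the full Perl interpreter startup cost once per message when run as the fetchmail MDA, so fetching raw first and batch-processing afterwards may avoid most of the overhead; sa-learn reads an mbox directly and, per its documentation, removes existing SpamAssassin markup itself before learning:

```shell
# 1. Fetch raw messages without per-message spamassassin (this was the
#    path that took 7 minutes instead of hours):
fetchmail -asnp IMAP --folder autolearn --user "$username" \
    -m "cat >> /tmp/x" "$mailserver"

# 2. Train Bayes straight from the mbox: sa-learn starts Perl once for
#    the whole run, and is documented to strip existing SpamAssassin
#    markup from messages before learning them:
sa-learn --spam --mbox /tmp/x

# 3. Only if a markup-free mbox is still needed for other tools: strip
#    markup per message with formail. This still starts spamassassin
#    once per mail, but without the fetchmail round-trip in between:
formail -s spamassassin -d < /tmp/x > /tmp/x.clean
```

Whether step 3 is acceptable depends on how many messages are in the folder; if the clean mbox is not strictly required, steps 1 and 2 alone should cover the learning part.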