On Sunday 17 May 2009 Michael Monnerie wrote:

To clarify my posting, here are some additions:

> Question 1:
> Do I need to call spamc twice, once with "-L spam" and once with "-C
> report"? Do I understand correctly that -L trains my bayes, while -C
> reports to spamcop etc.?
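
A minimal sketch of what handling both steps per message might look like, assuming spamc is installed and talking to a running spamd. The function name learn_and_report and the SPAMC override variable are hypothetical; only the "-L spam" and "-C report" options come from the question above. It still makes two separate spamc calls per message; it only lets a single pass over the mbox drive both.

```shell
# Sketch only: feed one message (read from stdin) to spamc twice.
# SPAMC is a hypothetical override so the command can be swapped out;
# it defaults to the real spamc binary.
learn_and_report() {
    msg=$(mktemp) || return 1
    cat > "$msg"                            # buffer the message: stdin can only be read once
    "${SPAMC:-spamc}" -L spam   < "$msg"    # train Bayes with this message as spam
    "${SPAMC:-spamc}" -C report < "$msg"    # report it to the collaborative databases
    rm -f "$msg"
}
```

Saved as a script that ends with a call to learn_and_report, this could serve as the program after "formail -s", so the mbox only has to be split once.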

The spamc man page says of the -C parameter: "Report or revoke a message to one of the configured collaborative filtering databases". Which one, if I use spamcop, dcc, pyzor and razor?

> Question 2:
> Is calling spamassassin better than spamc for such a mbox?

It seems not, at least from a performance perspective:

# time spamassassin -r --mbox $mbox_with_markups_existing
1017 message(s) examined.

real    39m9.567s
user    1m48.670s
sys     2m53.980s

# time formail <$mbox_with_markups_existing -n 3 -s spamc -L spam

real    3m11.299s
user    0m0.270s
sys     0m3.540s

So about 36 minutes saved. Put differently, spamassassin took roughly 12 times as long as spamc.

If I use a copy of the spam folder with every markup stripped instead of the original, the file is 7070015 bytes instead of 13943173 (yes, we use a big markup). Then it only takes:

# time formail <$mbox_with_markups_removed -n 3 -s spamc -L spam

real    0m47.588s
user    0m0.080s
sys     0m0.960s

The reporting takes another 3 minutes:

# time formail <$mbox_with_markups_existing -n 3 -s spamc -C report

real    2m48.257s
user    0m0.290s
sys     0m4.010s

Why does spamc have no combined "-L spam -C report" mode? It could do both at once.

> Question 3, my main question:
> The fetchmail command is taking *ages*, when I call it like above it
> takes *hours*, replacing the "-m" parameter with "cat >>/tmp/x" takes
> 7 minutes. I can see spamassassin using 100% cpu. Why is it so
> extremely slow and CPU consuming just to remove any existing markups?
> I like to remove existing markups, and I need the resulting mbox
> format for other things as well. Is there a way to make it so fast
> that it's usable?

I know it takes such a long time because "formail -s|spamassassin -d" starts spamassassin for every single mail, which is a mountain of overhead. But there is no "spamc --remove-markups" mode, right? Is there a fast way to remove markups from thousands of collected e-mails?

Regards, zmi

-- 
// Michael Monnerie, Ing.BSc ----- http://it-management.at
// Tel: 0660 / 415 65 31 .network.your.ideas.
// PGP Key: "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38 500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net Key-ID: 1C1209B4
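
PS: The speed comparison can be rechecked from the wall-clock ("real") figures quoted in this message. A small sketch, with to_seconds and compare_runs as hypothetical helper names and assuming the "XmY.YYYs" format that the shell's time builtin prints:

```shell
# Convert a "39m9.567s" style duration to seconds.
to_seconds() {
    printf '%s\n' "$1" | LC_ALL=C awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the slow run took and the minutes saved.
compare_runs() {
    slow=$(to_seconds "$1")
    fast=$(to_seconds "$2")
    LC_ALL=C awk -v s="$slow" -v f="$fast" \
        'BEGIN { printf "ratio %.1f, saved %.0f min\n", s / f, (s - f) / 60 }'
}

# spamassassin -r --mbox versus formail ... spamc -L spam
compare_runs 39m9.567s 3m11.299s    # prints: ratio 12.3, saved 36 min
```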