On Sunday, 17 May 2009, Michael Monnerie wrote:

To clarify my posting, here are some additions:
> Question 1:
> Do I need to call spamc twice, once with "-L spam" and once with "-C
> report"? Do I understand correctly that -L trains my bayes, while -C
> reports to spamcop etc.?

The spamc man page says of the -C parameter: "Report or revoke a message
to one of the configured collaborative filtering databases". Which one
is that, if I use spamcop, dcc, pyzor and razor?

> Question 2:
> Is calling spamassassin better than spamc for such a mbox?

It seems not, at least from a performance perspective:
# time spamassassin -r --mbox $mbox_with_markups_existing
1017 message(s) examined.
real    39m9.567s
user    1m48.670s
sys     2m53.980s

# time formail <$mbox_with_markups_existing -n 3 -s spamc -L spam
real    3m11.299s
user    0m0.270s
sys     0m3.540s

So 36 minutes saved. Put differently, spamassassin took about 12 times
as long as spamc (39m10s vs. 3m11s). If I use not the original spam
folder but a copy with every markup stripped, the file is 7070015 bytes
instead of 13943173 (yes, we use a big markup). Then it only takes:

# time formail <$mbox_with_markups_removed -n 3 -s spamc -L spam
real    0m47.588s
user    0m0.080s
sys     0m0.960s
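
To double-check the ratios between these runs, here is a small sketch
that converts the "real" values printed by time into seconds with awk.
The helper names to_seconds and ratio are made up for this example and
are not part of any tool:

```shell
# Convert a `time`-style duration such as "39m9.567s" to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Ratio of two durations, rounded to one decimal place.
ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}

ratio 39m9.567s 3m11.299s    # spamassassin --mbox vs. spamc -L spam: prints 12.3
ratio 3m11.299s 0m47.588s    # markup present vs. markup stripped: prints 4.0
```

So stripping the markup first makes the spamc training run roughly four
times faster again.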

The reporting takes another 3 minutes:
# time formail <$mbox_with_markups_existing -n 3 -s spamc -C report
real    2m48.257s
user    0m0.290s
sys     0m4.010s

Why is there no way to pass both "-L spam" and "-C report" to spamc in
one call? Then it could do both at once.
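
As a possible workaround, a small wrapper could buffer each message and
call spamc twice, so that formail only has to walk the mbox once. This
is only a sketch: learn_and_report is a hypothetical helper name, and it
assumes spamd was started with --allow-tell so that -L and -C are
accepted:

```shell
# learn_and_report: hypothetical helper, not part of SpamAssassin.
# Reads one message on stdin and hands it to spamc twice:
# once to train Bayes, once to report it to the collaborative databases.
# Assumption: spamd runs with --allow-tell.
learn_and_report() {
    tmp=$(mktemp) || return 1
    cat >"$tmp"                  # buffer the message so it can be read twice
    spamc -L spam <"$tmp"        # train Bayes
    spamc -C report <"$tmp"      # report to the configured databases
    rm -f "$tmp"
}
```

Saved as a script that ends with a call to learn_and_report (say
./learn-and-report.sh, a name chosen here for illustration), it could be
run as "formail <$mbox -n 3 -s ./learn-and-report.sh". It still talks to
spamd twice per message, so it only saves the second pass over the mbox,
not the work done by spamd.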

> Question 3, my main question:
> The fetchmail command is taking *ages*, when I call it like above it
> takes *hours*, replacing the "-m" parameter with "cat >>/tmp/x" takes
> 7 minutes. I can see spamassassin using 100% cpu. Why is it so
> extremely slow and CPU consuming just to remove any existing markups?
> I like to remove existing markups, and I need the resulting mbox
> format for other things as well. Is there a way to make it so fast
> that it's usable?

I know it takes such a long time because "formail -s|spamassassin -d" 
calls spamassassin for every single mail, which is a mountain of 
overhead. But there is no "spamc --remove-markups" mode, right? Is there 
a fast way to remove markups from thousands of collected e-mails?

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc    -----      http://it-management.at
// Tel: 0660 / 415 65 31                      .network.your.ideas.
// PGP Key:         "curl -s http://zmi.at/zmi.asc | gpg --import"
// Fingerprint: AC19 F9D5 36ED CD8A EF38  500E CE14 91F7 1C12 09B4
// Keyserver: wwwkeys.eu.pgp.net                  Key-ID: 1C1209B4
