Re: learning from IMAP spam collection

Chris Sun, 17 May 2009 09:49:52 -0700

On Sun, 2009-05-17 at 09:42 +0200, Michael Monnerie wrote:
> Dear experts,
> 
> I have a question regarding spam/ham learning, regarding performance. I 
> store spam in a mail folder accessible via IMAP. Then I want to feed 
> this into bayes. For this, I do:
> 
> fetchmail -asnp IMAP --folder autolearn --user $username \
>   -m "formail -s | spamassassin -d >>/tmp/x" $mailserver
> # now learn as user
> formail </tmp/x -s spamc -u $user -L spam
> # now feed to bayes
> formail </tmp/x -n 3 -s spamc -u $user -C report
> # we could also do this:
> spamassassin -r --mbox 
> 
> Question 1:
> Do I need to call spamc twice, once with "-L spam" and once with "-C 
> report"? Do I understand correctly that -L trains my bayes, while -C 
> reports to spamcop etc.?
> 
> Question 2:
> Is calling spamassassin better than spamc for such a mbox?
> 
> Question 3, my main question:
> The fetchmail command is taking *ages*, when I call it like above it 
> takes *hours*, replacing the "-m" parameter with "cat >>/tmp/x" takes 7 
> minutes. I can see spamassassin using 100% cpu. Why is it so extremely 
> slow and CPU consuming just to remove any existing markups?
> I like to remove existing markups, and I need the resulting mbox format 
> for other things as well. Is there a way to make it so fast that it's 
> usable?
> 
> mfg zmi
Here's a script I've been using for years now on my IMAP folders. It
works great. I've left some of the information in so you can see how
it's formatted. It reports to Razor, Pyzor, DCC and, if set up, to
SpamCop.


http://pastebin.com/m39ad4cf9
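
For anyone who cannot reach the link, here is a minimal sketch of the
learn-then-report steps from the question. It is not the script linked
above. It assumes the messages are already in an mbox file, that it runs
as the user whose Bayes database should be trained, and that sa-learn
and spamassassin are on the PATH. The function name learn_and_report and
the SA_LEARN / SPAMASSASSIN override variables are made up for this
example. It hands the whole mbox to one process per step instead of one
process per message, which may help with the run time in question 3, but
that is an assumption and not something measured here.

```shell
#!/bin/sh
# Sketch only: train Bayes, then report, from a single mbox file.
# SA_LEARN and SPAMASSASSIN are hypothetical override variables so the
# real commands can be swapped out, for example for a dry run.
SA_LEARN="${SA_LEARN:-sa-learn}"
SPAMASSASSIN="${SPAMASSASSIN:-spamassassin}"

learn_and_report() {
    mbox="$1"

    # Stop early if the mbox file is missing or empty.
    [ -s "$mbox" ] || { echo "no messages in $mbox" >&2; return 1; }

    # Step 1: train the Bayes database, treating every message as spam.
    "$SA_LEARN" --spam --mbox "$mbox" || return 1

    # Step 2: report the same messages to whichever collaborative
    # services (Razor, Pyzor, DCC, SpamCop) are configured locally.
    "$SPAMASSASSIN" -r --mbox < "$mbox"
}
```

Usage would be something like "learn_and_report /tmp/x" after the
fetchmail step has written /tmp/x. Setting SA_LEARN=echo and
SPAMASSASSIN=echo first prints the options that would be passed without
touching the Bayes database or reporting anything.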

-- 
KeyID 0xE372A7DA98E6705C
