Re: New rule for HTML spam, using comments?

Ben Johnson Tue, 18 Jun 2013 10:58:47 -0700


On 6/18/2013 1:18 PM, Amir 'CG' Caspi wrote:
> At 8:58 AM -0400 06/18/2013, Ben Johnson wrote:
>> a.) You are copying/pasting the body of the email, but not the headers.
> 
> No, I am copying the headers... however, I am using Eudora (ancient, I
> know) as a mail client, and it's possible the headers are not properly
> formatted.  For example, for SpamCop I have to use their "workaround"
> script.  I don't know what exactly is malformed, though.


For the sake of troubleshooting, can you try accessing the mail by some
other means, e.g., opening the file directly from the filesystem?
Doesn't mbox store mail as plain text, with all of a folder's messages
concatenated into one file? (Kris already beat me to this suggestion.)
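
One way to test whether a mailbox file really is standard mbox is to
parse it outside of the mail client. The following is only a rough
sketch using Python's standard "mailbox" module. The sample mailbox
and the summarize_mbox helper are made up for illustration, and I am
assuming (not asserting) that Eudora's files can be read this way. If
the header count comes back as 0, or From/Subject come back as None,
the headers probably aren't where a parser expects them.

```python
import mailbox
import os
import tempfile


def summarize_mbox(path):
    """Return (From, Subject, header count) for each message in an mbox file."""
    # create=False makes a wrong path raise instead of silently creating a file.
    box = mailbox.mbox(path, create=False)
    try:
        return [(m["From"], m["Subject"], len(m.keys())) for m in box]
    finally:
        box.close()


# Hypothetical two-message mailbox, used only to exercise the helper.
SAMPLE = (
    "From alice@example.com Tue Jun 18 10:58:47 2013\n"
    "From: alice@example.com\n"
    "Subject: first\n"
    "\n"
    "body one\n"
    "\n"
    "From bob@example.com Tue Jun 18 11:00:00 2013\n"
    "From: bob@example.com\n"
    "Subject: second\n"
    "\n"
    "body two\n"
)

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    with open(sample_path, "w", newline="\n") as fh:
        fh.write(SAMPLE)
    for sender, subject, count in summarize_mbox(sample_path):
        print(sender, subject, count)
```

To try it on a real mailbox, call summarize_mbox() with the path to a
copy of the file rather than the live one.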

> I should admit at this point that much of my sa-learn has been on
> Eudora's mboxes, by the way.  That is, I would take the Eudora mbox and
> sa-learn on that.  Eudora is supposed to use standard mbox format, but
> I'm wondering if maybe it's not so standard after all...

How would anything ever hit a Bayes rule higher than BAYES_00 if this
were the problem? Didn't you report a BAYES_99 hit in one of your
tests?

> Either way, I am _trying_ to copy the entire message.  Not sure what is
> misformatted there.  If you take a look at my two pasted examples (links
> below for convenience), those are direct copy/paste from Eudora's "raw
> source" view.  Any idea what is malformed?  Do I need an extra newline
> between the header and body, or something more complicated?
> 
> http://pastebin.com/HD0rNdxU
> http://pastebin.com/Zswg77Ds

How are you feeding the messages to sa-learn? Are you not just passing
the email file, e.g., /var/vmail/example.com/...? Why copy from Eudora
and paste into a temporary file when you can just point sa-learn
straight to the message on disk?
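
If you do end up pasting into a temporary file, a quick sanity check
on the header block can answer the "extra newline" question. This is
only a sketch under my own assumptions: check_raw_message is a
hypothetical helper, it applies a simplified reading of the header
syntax (field name, colon, optional folded continuation lines, then
one blank line before the body), and it says nothing about how
sa-learn itself parses the file.

```python
import re

# Field name: printable ASCII except space and colon, followed by a colon.
HEADER_LINE = re.compile(r"^[!-9;-~]+:")


def check_raw_message(text):
    """Return a list of problems found in the header block of a raw message."""
    problems = []
    lines = text.replace("\r\n", "\n").rstrip("\n").split("\n")
    if lines and lines[0].startswith("From "):
        lines = lines[1:]  # skip an mbox separator line, if present
    if "" in lines:
        header_lines = lines[: lines.index("")]
    else:
        problems.append("no blank line between headers and body")
        header_lines = lines
    if not header_lines:
        problems.append("no header lines before the first blank line")
    for number, line in enumerate(header_lines, start=1):
        if line[:1] in (" ", "\t"):
            if number == 1:
                problems.append("first line is a continuation line")
            continue  # folded continuation of the previous header
        if not HEADER_LINE.match(line):
            problems.append(f"line {number} is not a header field: {line!r}")
    return problems


# Hypothetical inputs: one well-formed message, one missing the separator.
GOOD = "Received: from a\n\tby b\nSubject: hi\n\nbody text\n"
BAD = "Subject: hi\nbody text\n"

print(check_raw_message(GOOD))
print(check_raw_message(BAD))
```

An empty list means the header block at least looks sane. Anything
else points at the line to inspect in the pasted copy.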

>> b.) You are running Bayes as two different users when you perform your
> 
> No, I have been careful about that.  You saw that I pasted the maillog
> entries... notice that spamd runs as setuid.  I made sure the same
> userid was in the logs, and in my command.

I had missed that detail; looks okay.

>> Have a look at the thread I cited and see if anything jumps out at you.
> 
> Will do, but unfortunately, I don't think the problem is as clear cut as
> (b) ... maybe it's (a) though, in which case I wonder if I have to
> modify my Eudora mboxes before learning on them.

Do you retain your training corpus? This may be one of those instances
in which the best way to debug the problem is to wipe and retrain Bayes.
Of course, that can be a nightmare if you don't retain the messages that
you've trained as ham and spam.

> Thanks.
> 
>                         -- Amir
