From: "Magnus Holmgren" <[EMAIL PROTECTED]>
On Sunday 15 October 2006 16:55, Magnus Holmgren took the opportunity to say:
Indeed, when I did "spamassassin -D bayes < testmessage" the debug output
reported learning from a different "@sa_generated" message ID
than "sa-learn -D bayes --forget" said it was trying to forget (but didn't
find). AFAICT from reading the source, get_msgid() in
Mail::SpamAssassin::Bayes is used in both cases. So why does it make up
different IDs?
Apparently, when sa-learn reads a message from stdin, for some reason the
entire header, and possibly even the empty line separating it from the body,
disappears. Or at least $msg->get_header("Date") and
$msg->get_header("Received") in get_msgid() in Bayes.pm return undef or ''.
When I give sa-learn a filename it works. Also, learning via the TELL spamd
method works, as does spamassassin -r with filename as well as stdin.
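
To illustrate why missing headers would change a generated ID, here is a minimal Python sketch. It is not SpamAssassin's actual code: it assumes only what the message above describes, namely that the "@sa_generated" ID is derived from the Date header, the Received header and the body. The function name, the SHA-1 digest, the "None" placeholder and the sample header values are all hypothetical choices made for the illustration.

```python
import hashlib


def generated_msgid(date, last_received, body):
    """Hypothetical stand-in for an '@sa_generated' message ID.

    Assumption: the ID is a digest over Date, the last Received header
    and the body. Missing or empty headers fall back to a placeholder.
    """
    date = date or "None"
    last_received = last_received or "None"
    material = "\0".join([date, last_received, body])
    digest = hashlib.sha1(material.encode("utf-8")).hexdigest()
    return digest + "@sa_generated"


body = "Buy cheap watches now!\n"

# Same message, once with its headers visible and once with them lost.
id_with_headers = generated_msgid(
    "Sun, 15 Oct 2006 16:55:00 +0200",           # made-up Date value
    "from mail.example.org by mx.example.net",   # made-up Received value
    body,
)
id_without_headers = generated_msgid(None, "", body)

print(id_with_headers != id_without_headers)  # True: the two IDs differ
```

Under these assumptions, a learn run that sees the headers and a forget run that does not would compute different IDs for the same message, which would match the "didn't find" symptom described above.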
Magnus, either you have horridly hashed up your SA setup or you are
learning differently than you think.
First, if you have fed a message through SpamAssassin and it has
encapsulated the spam as an attachment, the resultant message will
have a different message ID. I am not sure which message ID gets
reported at the place you are looking. (It appears you are messing
with the source. That's not a good idea until you are sure what the
program is doing. But I'm sure you know that already.)
You do not give adequate information about how you are running sa-learn
for anybody to make useful guesses about how to help you. So I
hesitate to guess at things like "testmessage" being a pile of spam
messages rolled into one mbox file and fed to sa-learn without the
--mbox flag, or that you did not read the sa-learn man page, or even
that you are feeding the message to be learned to sa-learn through
"stdin."
These two lines provably work properly for me when I learn a packet
of spam messages in mbox format.
sa-learn --ham --showdots --mbox ~/mail/ham
sa-learn --spam --showdots --mbox ~/mail/spam
Your prior message indicated you were fussing with something like
"autolearn=ham" or "autolearn=spam". Those are simply informative
tags in the message markup. They are not instructions for sa-learn.
You do not want to change the message file in ANY way. Do not strip
off the SA markup. The sa-learn tool is smart enough to do that for
you. Take the raw spamassassin marked up message, feed it to sa-learn
with the proper --ham or --spam marking on it. Feed it in by filename,
which is all sa-learn understands. Give sa-learn a hint about the
mailbox format. It's designed to read masses of messages so you do not
need to feed them one at a time, although that works, too.
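
The advice above (hand sa-learn a filename, the right --ham or --spam marking, and a mailbox-format hint) can be sketched as a small Python helper. This is only an illustration: the helper name and the example path are hypothetical, and it uses just the flags shown in the two command lines earlier in this message. It builds the command without running it.

```python
import shlex


def build_sa_learn_command(kind, path, mbox=True):
    """Build (but do not run) an sa-learn command line.

    kind: "ham" or "spam"; path: mailbox or message file to learn from.
    Only flags that appear in this message are used.
    """
    if kind not in ("ham", "spam"):
        raise ValueError("kind must be 'ham' or 'spam'")
    command = ["sa-learn", "--" + kind, "--showdots"]
    if mbox:
        # Hint that the file is an mbox holding many messages.
        command.append("--mbox")
    command.append(path)
    return command


# Hypothetical mailbox path, for illustration only.
print(shlex.join(build_sa_learn_command("spam", "mail/spam")))
```

The resulting list can be passed to subprocess.run() on a machine where sa-learn is installed.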
{^_^}