Hi,

Please use plain text rather than HTML, and in particular avoid that
really bad indentation style of quoting.

It doesn't seem to be possible with gmail directly any longer, so I've
set up thunderbird for this. Maybe it is, but I couldn't find it after
clicking around in the obvious places.

X-Spam-MyReport: Tokens: new, 47; hammy, 7; neutral, 54; spammy, 16.

Isn't that sufficient for auto-learning this message as spam?
      ^^^^
That's clearly referring to the _TOKEN_ data in the custom header, is it
not?
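
(For context, a header like that is typically produced by an add_header
line in the SpamAssassin configuration. I'm only guessing at the exact
line used here, but something like

   add_header all MyReport Tokens: _TOKENSUMMARY_

would produce exactly that kind of "Tokens: new, ...; hammy, ...;
neutral, ...; spammy, ..." line.)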

Yes. Burning the candle at both ends. Really overworked.

Sorry to hear. Nonetheless, did you take the time to really understand
my explanations? It seems you sometimes didn't in the past, and I am not
happy to waste my time on other people's problems if they aren't
following thoroughly.

Yes, always. It may not be immediately, but the time you give up to do this is not lost on me. My brain sometimes goes faster than I can explain myself properly. I make too many assumptions about what people understand about me, my abilities, and my comprehension of a topic.

Learning is not limited to new tokens. All tokens are learned,
regardless of their current (h|sp)ammyness.

Still, the number of (new) tokens is not a condition for auto-learning.
That header shows some more or less nice information, but in this
context it is absolutely irrelevant.

I understood "new" to mean the tokens that have not been seen before, and
would be learned if the other conditions were met.

Well, yes. So what?

Did you understand that the number of previously unseen tokens has
absolutely nothing to do with auto-learning?

Yes, that was a mistake.

Did you understand that all tokens are learned, regardless of whether
they have been seen before?

That doesn't really matter from a user perspective, though, right? I
mean, if tokens that have already been learned are learned again, the
net result is zero.

This whole part is entirely unrelated to auto-learning and your original
question.

Yes, I see that, and much of it comes down to not explaining myself
properly originally. I really only meant to tie it in with the tokens
that would be learned had it been determined that auto-learning would
take place.

I understand now that all tokens are always learned anyway.
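
(Unrelated to the auto-learn decision itself, but if you want to watch
the database grow as tokens are learned, sa-learn can dump its
counters; something like

   sa-learn --dump magic

prints the nspam, nham and ntokens totals, among other things.)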

As I have mentioned before in this thread: It is NOT the message's
reported total score that must exceed the threshold. The auto-learning
discriminator uses an internally calculated score using the respective
non-Bayes scoreset.
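
For reference, assuming the stock AutoLearnThreshold plugin is in use,
the thresholds that internal score is compared against are configured
with something like the following in local.cf (the values shown are
just the usual defaults, not necessarily what this installation uses):

   bayes_auto_learn 1
   bayes_auto_learn_threshold_nonspam 0.1
   bayes_auto_learn_threshold_spam 12.0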

Very helpful, thanks. Is there a way to see more about how it makes that
decision on a particular message?

   spamassassin -D learn

Unsurprisingly, the -D debug option shows information on that decision.
In this case limiting debug output to the 'learn' area comes in handy,
eliminating the noise.

The output includes the important details, like the auto-learn decision
with a human-readable explanation, the score computed for auto-learning,
as well as the head and body points.
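
A typical invocation, assuming the message is saved to a file (the file
name here is just an example), would be

   spamassassin -D learn < message.eml > /dev/null

The debug output goes to stderr, so the rewritten message on stdout can
simply be discarded.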

It's been a long time since I've gone through the debug output for
bayes info, but I have done that. Now, though, I'll have a little
better understanding of what it means, and can start to improve my
overall understanding of the bayes component of spamassassin.

Hopefully others also benefited from this crazy thread as much as I did.

Thanks,
Alex
