On 08/01/13 16:31, Kevin A. McGrail wrote:
On 1/8/2013 11:27 AM, Kris Deugau wrote:
Ned Slider wrote:
Hi,

I'd just like to note some FPs on AXB_XMAILER_MIMEOLE_OL_B054A hitting
some ham.
Rules in this cluster seem to target "obsolete" versions of MSOE and its
descendants. See
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6844 for some
discussion around a similar rule.

I can see the reasoning, but all too often ISP end users do not update
their systems, ever, causing these to be seen in live legitimate traffic.

My $0.02. Rules will often hit on both spam and ham, so an FP should really be
something that causes a spam or ham message as a whole to be categorized
incorrectly.

For example, I may write a rule scoring 0.25 that hits on spam but also on
some ham. But I also have rules with negative scores to offset the impact on
ham.

So if a single rule has a particularly high score, or it contributes to
misclassifying an email, it's a good thing to discuss. If it only adds a small
amount to a score, that's really not unexpected.
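Kevin's point is that rule scores are additive. A single low-scoring rule firing on ham rarely matters on its own, especially when negative-scoring rules offset it. Below is a minimal Python sketch of that idea. The rule names and scores are hypothetical. It also assumes a message is flagged once its total reaches SpamAssassin's default required_score of 5.0.

```python
# Minimal sketch of additive rule scoring.
# Rule names and scores are hypothetical, for illustration only.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

def total_score(hits: dict[str, float]) -> float:
    """Sum the scores of every rule that fired on a message."""
    return sum(hits.values())

def is_spam(hits: dict[str, float], threshold: float = REQUIRED_SCORE) -> bool:
    """Assumption: the message is flagged once its total reaches the threshold."""
    return total_score(hits) >= threshold

# A ham message hit by a small-score rule that also fires on some ham,
# partly offset by a hypothetical negative-scoring rule.
ham_hits = {"HYPOTHETICAL_BROAD_RULE": 0.25, "HYPOTHETICAL_HAM_OFFSET": -0.5}
print(round(total_score(ham_hits), 2), is_spam(ham_hits))  # -0.25 False
```

Under this model, a rule only becomes a practical FP when its score, together with the other hits, pushes a ham message over the threshold.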

So when the rule misfires on ham, is the message still not marked as spam
overall? Do you see a good number of hits from the rule on spam?

Regards,
KAM


Hi Kevin,

I absolutely take your point about scoring ham vs spam, and in this case the ham was indeed not misclassified as spam. Bayes was correctly scoring these, either neutrally or as ham. About the only rule hitting with any significant score was AXB_XMAILER_MIMEOLE_OL_B054A.

However, in order to improve overall efficiency, I do take note and try to investigate when any rule hits on ham, especially when that rule is scored much higher than an informational score. This rule came to my attention because its score very recently increased from an informational 0.001 to a not insignificant 2.121 (and even higher for those not running network tests and/or Bayes). If, as in your example, it had a score of 0.25, it almost certainly wouldn't have caught my attention.

A score worth more than 40% of what is needed for a spam classification (2.121 against the default threshold of 5.0) doesn't appear justified from examination of my corpus. I see absolutely no hits in my spam corpus, which dates back two years and covers over 10,000 messages (small by some standards, I grant). I see a small number of hits against ham from a handful of senders, dating back to June 2012 (perhaps around the time the rule was first introduced?).
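The two measures weighed here can be made concrete: how much of the spam threshold one rule uses up, and how its spam hits compare with its ham hits. The Python sketch below computes both. It assumes the default required_score of 5.0. It also uses an S/O-style ratio, which to my understanding is how ruleqa summarises a rule: the spam hit rate divided by the sum of the spam and ham hit rates. The ham corpus size and hit count are hypothetical placeholders, not figures from this thread.

```python
# Sketch of two rule-efficiency measures.
# Ham corpus size and ham hit count below are hypothetical placeholders.
REQUIRED_SCORE = 5.0  # SpamAssassin's default required_score

def threshold_share(rule_score: float, threshold: float = REQUIRED_SCORE) -> float:
    """Fraction of the spam threshold contributed by a single rule."""
    return rule_score / threshold

def s_over_o(spam_hits: int, spam_total: int, ham_hits: int, ham_total: int) -> float:
    """Spam hit rate divided by the sum of spam and ham hit rates.

    1.0 means the rule only fires on spam; 0.0 means it only fires on ham.
    """
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    if spam_rate + ham_rate == 0:
        raise ValueError("rule never fired; S/O is undefined")
    return spam_rate / (spam_rate + ham_rate)

print(f"{threshold_share(2.121):.1%}")  # 42.4%
# With no spam hits at all, S/O is 0.0 however few ham hits there are.
print(s_over_o(spam_hits=0, spam_total=10_000, ham_hits=5, ham_total=20_000))  # 0.0
```

Normalising by corpus size keeps the ratio comparable when the spam and ham corpora differ greatly in size.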

Ultimately it has to come down to rule efficiency and the efficiency of this rule _for me_ is pretty awful even if it's not a huge issue. I see it performs a little better in the official corpus:

http://ruleqa.spamassassin.org/20130107-r1429709-n/AXB_XMAILER_MIMEOLE_OL_B054A/detail

It's probably fair to say that neither my corpus nor the SA corpus is ideal for judging the true performance of such rules, but in each case it's what we have to work with.

Having read the bugzilla Kris referenced I do now at least understand a little of the reasoning behind the rule :-)

Thanks for the responses.

