Re: FPs on AXB_XMAILER_MIMEOLE_OL_B054A

Ned Slider Tue, 08 Jan 2013 12:08:42 -0800

On 08/01/13 16:31, Kevin A. McGrail wrote:

On 1/8/2013 11:27 AM, Kris Deugau wrote:

Ned Slider wrote:

Hi,


I'd just like to note some FPs on AXB_XMAILER_MIMEOLE_OL_B054A hitting
some ham.

Rules in this cluster seem to target "obsolete" versions of MSOE and its
descendants. See
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6844 for some
discussion around a similar rule.

I can see the reasoning, but all too often ISP end users do not update
their systems, ever, causing these to be seen in live legitimate traffic.


My $0.02. Rules often will hit on Spam and Ham so a FP should really be
something that causes a Spam or Ham to be categorized incorrectly as a
whole.

For example, I may write a rule that scores 0.25 that hits on Spam but
also some Ham. But I also have rules that are negative to negate the Ham
impact.

So if a score is particularly high on a single rule or it contributes to
mismarking an email, it's a good thing to discuss. If it adds a small
amount to a score, that's really not unexpected.
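
To make the point above concrete, here is a minimal sketch of how individual rule scores add up to a single verdict. The rule names and the negative score are hypothetical, and the threshold of 5.0 assumes SpamAssassin's default required_score is in use; a local configuration may well differ.

```python
# Assumption: the default required_score of 5.0 is in effect.
REQUIRED_SCORE = 5.0


def classify(hits, required_score=REQUIRED_SCORE):
    """Sum the scores of all rules that hit and compare to the threshold."""
    total = sum(hits.values())
    return total, total >= required_score


# Hypothetical ham message: a weak positive rule misfires, but a
# negative (ham-indicating) rule more than offsets it.
ham_hits = {
    "HYPOTHETICAL_WEAK_RULE": 0.25,
    "HYPOTHETICAL_HAM_RULE": -0.5,
}

total, is_spam = classify(ham_hits)
print(f"total={total:+.2f} spam={is_spam}")
```

In this sketch the misfire alone does not make the message an FP; only the summed score crossing the threshold would.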

So when the rule misfires on the Ham, is the ham still being overall not
marked as Spam? Do you see a good amount of hits from the rule on Spam?

Regards,
KAM


Hi Kevin,

I absolutely take your point about scoring ham vs spam, and in this case the ham was indeed not misclassified as spam. Bayes was correctly scoring these, either neutrally or as ham. About the only rule hitting with any significant score was AXB_XMAILER_MIMEOLE_OL_B054A.

However, in order to improve overall efficiency I do take note and try to investigate when any rule hits on ham, especially when that rule is scored at anything much higher than an informational score. This rule came to my attention as the score has very recently increased from an informational score of 0.001 to a not insignificant 2.121 (and even higher for those not running network tests and/or bayes). If as you suggest it had a score of 0.25 then it almost certainly wouldn't have caught my attention.

The fact it is scoring greater than 40% of a spam classification doesn't appear justified from examination of my corpus. I see absolutely no hits in my spam corpus dating back two years and covering over 10,000 messages (I grant small by some standards). I see a small number of hits against ham dating back to June 2012 (perhaps around the time the rule was first introduced?) from a handful of senders.
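
For reference, the "greater than 40%" figure can be reproduced with a quick calculation. This is only a sketch and assumes the default required_score of 5.0; the three scores are the ones mentioned in this thread.

```python
# Assumption: the default required_score of 5.0 is in effect.
REQUIRED_SCORE = 5.0


def share_of_threshold(rule_score, required_score=REQUIRED_SCORE):
    """Fraction of the spam threshold contributed by one rule."""
    return rule_score / required_score


# Scores mentioned in this thread: the old informational score,
# Kevin's example score, and the rule's new score.
for score in (0.001, 0.25, 2.121):
    print(f"{score:>6}: {share_of_threshold(score):.1%}")
```

With a threshold of 5.0 the new score of 2.121 works out to roughly 42% of a spam classification, against about 5% for a 0.25 rule.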

Ultimately it has to come down to rule efficiency and the efficiency of this rule _for me_ is pretty awful even if it's not a huge issue. I see it performs a little better in the official corpus:


http://ruleqa.spamassassin.org/20130107-r1429709-n/AXB_XMAILER_MIMEOLE_OL_B054A/detail
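
One rough way to put a number on "efficiency" is a spam-over-overall ratio: the rule's spam hit rate divided by the sum of its spam and ham hit rates. The sketch below is my own approximation of that idea, not the ruleqa implementation. The spam corpus size of 10,000 and the zero spam hits come from the figures above; the ham hit count and ham corpus size are hypothetical placeholders.

```python
def hit_rate(hits, corpus_size):
    """Fraction of a corpus that a rule hits (0.0 for an empty corpus)."""
    return hits / corpus_size if corpus_size else 0.0


def spam_over_overall(spam_hits, spam_total, ham_hits, ham_total):
    """Spam hit rate over combined hit rates; None if the rule never hits."""
    s = hit_rate(spam_hits, spam_total)
    h = hit_rate(ham_hits, ham_total)
    if s + h == 0:
        return None
    return s / (s + h)


# 0 spam hits in roughly 10,000 spam messages (from this thread);
# 5 ham hits in 10,000 ham messages is a made-up placeholder.
ratio = spam_over_overall(spam_hits=0, spam_total=10_000,
                          ham_hits=5, ham_total=10_000)
print(ratio)
```

A ratio near 1.0 would mean the rule hits almost only spam; a rule that hits only ham, as in my corpus, sits at the other extreme.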

It's probably fair to say that neither my corpus nor the SA corpus is ideal for judging the true performance of such rules, but in each case it's what we have to work with.

Having read the bugzilla Kris referenced I do now at least understand a little of the reasoning behind the rule :-)


Thanks for the responses.
