Re: "non-fuzzy body parts in subject": missed

Matt Kettler Mon, 17 Apr 2006 18:07:57 -0700

Linda Walsh wrote:
> I have been receiving a spate of short messages that don't seem
> to trigger enough default rules to be knocked out.  I was
> investigating and noticed a discrepancy [bug?] in the rules.
> 
> One particular email refers to the uniquely Male-Body-Part starting
> w/"P", let's call it MBP for purposes of discussion.
>
> It gets hit by a '20' rule for body parts in the message body,
> but I noticed it doesn't get anything for the subject:

Yes, it does. The text of the subject line is matched against every body rule.
SA prepends the subject to the body text, so we don't need a massive duplication
of rules to cover both body and subject.
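
To illustrate the idea (this is a rough Python sketch, not SA's actual code; the
`rendered_body` helper and the pattern are made up for the example), a single
body pattern sees the subject because the subject is placed ahead of the body:

```python
import re

def rendered_body(subject: str, body: str) -> str:
    """Hypothetical helper: the text a body rule gets matched against."""
    # The subject goes first, so body rules see it without a separate rule.
    return subject + "\n" + body

# Illustrative pattern standing in for one body rule.
BODY_RULE = re.compile(r"\bbigger\b", re.IGNORECASE)

def body_rule_hits(subject: str, body: str) -> bool:
    """True if the body rule matches anywhere in subject + body."""
    return BODY_RULE.search(rendered_body(subject, body)) is not None

print(body_rule_hits("Want a Bigger MBP?", "click here"))
```

So a subject-only hit like the one in your example should already be scored by
the body rule.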

> "Want a Bigger MBP?"  A '25_replace' rule is present for "fuzzy"
> MBP's, but doesn't seem to catch unfuzzy ones.
> So I guess questions might be:
>    1) should 'fuzzy' rules match non-fuzzy targets as well
>       as fuzzy ones?

IMHO, no. I think there should be two rules with separate scores. In the above
example the scores would be pretty much the same.

However, consider the word "viagra": an obfuscated spelling is a clear sign of
spam. The un-obfuscated word is a weaker sign, because it could be a joke or
part of a medical discussion of some form.
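
A minimal sketch of the two-rule idea in Python (the rule names, the pattern,
and the scores are all invented for illustration; they are not real SA rules or
tuned values):

```python
import re

# Matches the plain word and a few character-substituted spellings.
FUZZY = re.compile(r"(?<!\w)v[i1!|][a@]gr[a@]", re.IGNORECASE)

# Hypothetical scores: the obfuscated form is weighted much higher.
SCORES = {"PLAIN_DRUG": 0.5, "FUZZY_DRUG": 2.5}

def drug_score(text: str) -> float:
    """Score a message with separate rules for plain and obfuscated spellings."""
    hits = FUZZY.findall(text)
    if any(h.lower() != "viagra" for h in hits):
        return SCORES["FUZZY_DRUG"]   # at least one obfuscated spelling
    if hits:
        return SCORES["PLAIN_DRUG"]   # only the plain spelling appears
    return 0.0

print(drug_score("cheap v1@gra now"), drug_score("my doctor mentioned Viagra"))
```

Keeping the two cases as separate rules lets each one get its own score.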

>    2) Should there be some "normalization" adjustment for
> short messages?
>   I'm thinking a "scale factor" rather than an absolute score
> to add, -- reflecting the general idea that short messages
> are not bad, but if you are scoring on the "bad" side, a
> multiplier (ex. 1.1 or 1.2) would increase the score of a message
> that is already being sized up as "bad".
> 
>   Does SA support any multiplier type rules? 

No.

> Should it, or
> rather, do people feel this is a good idea?

I don't feel that would be a good idea. Bear in mind this would also make a
"good" message (i.e., one at -1.0) "more good". It just doesn't make sense to
me to have something that merely acts as a "score amplifier" instead of a score
adjustment.
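
The arithmetic, as a small Python sketch (the 1.2 factor is from your example;
the 0.8 delta and the 5.0 threshold are just values picked for illustration):

```python
REQUIRED_SCORE = 5.0  # example spam threshold, chosen for illustration

def amplify(score: float, factor: float = 1.2) -> float:
    """Hypothetical multiplier rule: scales whatever score is already there."""
    return score * factor

def adjust(score: float, delta: float = 0.8) -> float:
    """Ordinary additive rule: shifts every message in the same direction."""
    return score + delta

for start in (-1.0, 4.5):
    print(start, "amplified:", round(amplify(start), 2),
          "adjusted:", round(adjust(start), 2))
```

The multiplier pushes the 4.5 message over the threshold, but it also pushes the
-1.0 message further negative, whereas the additive rule moves both toward
"spam".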

Performing any kind of GA to establish a reasonable multiplier value for these
would be a logistical nightmare.

You also get into an issue of order of operations. Does this multiplier apply to
the current score as of the moment the rule hits? Or, after the total message
score is calculated, do you make a second pass and factor in all the multipliers,
taking a slight performance hit for the extra calculation run?
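
To see that the two orderings really give different answers, here is a small
Python sketch (the hit list and the `("add", x)` / `("mul", f)` encoding are
hypothetical; SA has no multiplier rules):

```python
from typing import List, Tuple

Hit = Tuple[str, float]  # ("add", score) or ("mul", factor), in hit order

def score_at_hit_time(hits: List[Hit]) -> float:
    """Apply each multiplier to the running total at the moment it hits."""
    total = 0.0
    for kind, value in hits:
        total = total * value if kind == "mul" else total + value
    return total

def score_second_pass(hits: List[Hit]) -> float:
    """Sum all additive scores first, then apply every multiplier."""
    total = sum(v for k, v in hits if k == "add")
    for kind, value in hits:
        if kind == "mul":
            total *= value
    return total

hits = [("add", 2.0), ("mul", 1.5), ("add", 2.0)]
print(score_at_hit_time(hits), score_second_pass(hits))  # 5.0 6.0
```

With hit-time application the result depends on the order in which rules happen
to run; the second pass avoids that but needs the extra calculation run.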
