Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-28 Thread Rob McMillin
Michael Moncur wrote:
>> body CORRECT_FOR_EXCHANGE /This message is in MIME format/
>> describe CORRECT_FOR_EXCHANGE Correct for MIME 'null block'
>
> FYI, I seem to recall SA already having a test like this. You might want to
> double-check.
Yes, it's called MIME_NULL_BLOCK. (I'm lookin

RE: [SAtalk] Troubling new scores in 2.1 release

2002-02-28 Thread Michael Moncur
> To me, -ve scores on tests can also be used to "offset" spammy messages in
> clean email. I have several of these of my own creation:
Well, yes, that's true - SpamAssassin already includes a bunch of these, such as COPYRIGHT_CLAIMED and PHP_SIGNATURE. What I was talking about was the fact that

RE: [SAtalk] Troubling new scores in 2.1 release

2002-02-28 Thread Andrew Kohlsmith
> I know there are theoretical reasons why this might make sense, but I don't
> see any benefit in the real world for scores like these. The high scores
> increase the chance of a random false positive - regardless of the size of
> the existing corpus - and if the negative ones indicate that the r

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Andrew Kohlsmith
> SPAM: Hit! (4.9 points) BODY: URL of page called "remove"
> SPAM: Hit! (6.5 points) BODY: Link to a URL containing "remove"
No, not impressive. Those two scores would put a whole lot of honest opt-in web "flyers" and likely many mailing lists in the spam bucket. I'm strongly opposed to any
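To make the arithmetic behind this objection concrete, here is a minimal sketch, assuming a message's total is simply the sum of the scores of the rules that hit and that mail is tagged once the total reaches a threshold of 5.0 (the usual SpamAssassin default; the thread itself does not state the threshold in use). The names `total_score`, `is_spam`, and `REQUIRED_HITS` are made up for illustration and are not SpamAssassin's API. Under those assumptions the two "remove" rules alone add up to 11.4 points.

```python
# Hypothetical sketch, not SpamAssassin code: a message's total is assumed to
# be the plain sum of the scores of every rule that hit.
REQUIRED_HITS = 5.0  # assumed tagging threshold; not stated in the thread


def total_score(hits):
    """Sum the scores of (description, score) pairs for the rules that hit."""
    return sum(score for _description, score in hits)


def is_spam(hits, required_hits=REQUIRED_HITS):
    """Tag the message when the summed score reaches the threshold."""
    return total_score(hits) >= required_hits


# The two hits quoted in the message above.
remove_hits = [
    ('URL of page called "remove"', 4.9),
    ('Link to a URL containing "remove"', 6.5),
]

print(round(total_score(remove_hits), 1))
print(is_spam(remove_hits))
```

With these numbers the second rule is enough on its own to cross the assumed threshold, which is why an opt-in newsletter with an honest "remove" link would be at risk.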

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Daniel Rogers
On Wed, Feb 27, 2002 at 05:15:20PM -0800, Craig R Hughes wrote:
> I meant single score, but yet, that message is pretty impressive. I assume it
> was not a false-positive :)
Uh, yeah, it was real spam. :) I just found a 47.1 hits one, even though it had two -ve scores (HTTP_USERNAME_USED and

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Craig R Hughes
> On Wed, Feb 27, 2002 at 05:00:29PM -0800, Craig R Hughes wrote:
> > Yes, the large rule scores probably do make the system more sensitive to minor
> > variations in input. How

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Daniel Rogers
On Wed, Feb 27, 2002 at 05:00:29PM -0800, Craig R Hughes wrote:
> Yes, the large rule scores probably do make the system more sensitive to minor
> variations in input. However, they also apparently lead to more accurate
> scores. It is interesting that even running unconstrained over 50,000
>

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Craig R Hughes
> On Wed, 27 Feb 2002, Craig R Hughes wrote:
> > This isn't really a problem. It can actually be helpful too to allow
> >

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Bart Schaefer
On Wed, 27 Feb 2002, Craig R Hughes wrote:
> This isn't really a problem. It can actually be helpful too to allow
> the GA to do its own thing [...]
On Wed, 27 Feb 2002, Tom Lipkis wrote:
> With large scores like this (positive or negative), very small
> perturbations in input can cause wildly

Re: [SAtalk] Troubling new scores in 2.1 release

2002-02-27 Thread Craig R Hughes
I was aware of the stuff you're pointing out below. This is basically caused by using the new evolver to do the scoring. Previously, scores were limited to the range 0.01-5; now they are unlimited and allowed to go -ve. A side effect of this is that rules which are really non-discriminators
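As a rough illustration of the difference between the old bounded range and the new unconstrained one, here is a minimal sketch, assuming the old behaviour amounted to clamping each candidate score into 0.01-5. The rule names and proposed values are made up, and `clamp_score` is a hypothetical helper, not part of the actual evolver.

```python
# Hypothetical sketch, not the real evolver: compare unconstrained scores with
# the same scores clamped into the old 0.01-5 range quoted in the post.
OLD_MIN, OLD_MAX = 0.01, 5.0


def clamp_score(raw, lo=OLD_MIN, hi=OLD_MAX):
    """Force a candidate score into the closed range [lo, hi]."""
    return max(lo, min(hi, raw))


# Made-up candidate scores for illustration only.
proposed = {"RULE_A": 6.5, "RULE_B": -3.2, "RULE_C": 1.7}

constrained = {name: clamp_score(score) for name, score in proposed.items()}

for name, score in proposed.items():
    print(f"{name}: unconstrained={score} constrained={constrained[name]}")
```

Under the bounded scheme a rule can never subtract from the total and never contribute more than 5 points by itself; the unconstrained scheme allows both.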