Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-16 Thread Craig R Hughes
It was probably me when I brought all the 2_3_0 line changes forward onto the trunk. C Daniel Quinlan wrote: DQ> Bart Schaefer <[EMAIL PROTECTED]> writes: DQ> DQ> > Hmm, I just did a "cvs up" (on the head, not the branch) and: DQ> > DQ> > score: FORGED_RCVD_TRAIL 1.000 -> absent DQ> > score: MSG

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Matthew Cline
On Saturday 15 June 2002 12:31 pm, Craig R Hughes wrote: > Michael Moncur wrote: > MM> score ASCII_FORM_ENTRY -1.660 > > Looks like lots of false positives on the appended lines at the bottom of > Sourceforge mailing list messages. This score should probably be pumped up > a little

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Duncan Findlay
On Sat, Jun 15, 2002 at 12:31:31PM -0700, Craig R Hughes wrote: > I think the rule needs to be adjusted to not trigger on 3 words' presence in the > message, since "asian" and "hardcore" can occur in legitimate messages. > Instead, it should trigger based on %age of words which are in the list, so
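
The quoted suggestion above (trigger on the percentage of words that are in the list, rather than on the presence of 3 listed words) could be sketched roughly as follows. This is only an illustration in Python, not the SpamAssassin implementation: the word list contains just the two words named in the message, and the 5% threshold, the tokenizer, and the function names are all assumptions made up for the example.

```python
import re

# Hypothetical word list: only the two words mentioned in the thread.
LISTED_WORDS = {"asian", "hardcore"}

# Hypothetical threshold: fire when at least 5% of the words are listed.
THRESHOLD = 0.05


def listed_fraction(text, listed=LISTED_WORDS):
    """Return the fraction of words in text that appear in the listed set."""
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in listed)
    return hits / len(words)


def rule_fires(text, threshold=THRESHOLD):
    """Fire on the share of listed words, not on a fixed count of them."""
    return listed_fraction(text) >= threshold
```

With this shape, a long legitimate message that happens to contain one or two listed words stays under the threshold, while a short message made up mostly of listed words still triggers.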

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Larry Rosenman
On Sat, 2002-06-15 at 14:31, Craig R Hughes wrote: > MM> - Not as weird as all that, apparently > MM> score MSGID_CHARS_WEIRD -2.178 > > Looks like mail servers (Exchange and Netscape mail server) sometimes > create > message ids which look like: > > Message-Id: > > I don't know w

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Craig R Hughes
Michael Moncur wrote: MM> When a new release comes out I like to be anal-retentive and go through the MM> GA second-guessing its scores. This is my report for 2.30. A valuable service we've come to count on. MM> - RATWARE must be fixed, it was negative last time MM> score RATWARE

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Bart Schaefer <[EMAIL PROTECTED]> writes: > Hmm, I just did a "cvs up" (on the head, not the branch) and: > > score: FORGED_RCVD_TRAIL 1.000 -> absent > score: MSGID_CHARS_WEIRD -2.178 -> absent > score: FROM_ADDRESS_EQ_REAL 1.000 -> absent > score: X_NOT_PRESENT -1.920 -> absent > score: FROM_A

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Craig R Hughes writes: > So it's just because the GA could get away with setting it to 0.921 > -- in practice it's a clear sign of nonspam, and we should just fix > it at -2.0, which I've done on both branches now. Okay. In HEAD, I made the rule less apt to be abused which is just as well since

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Craig R Hughes
Daniel Quinlan wrote: DQ> >>> score X_NOT_PRESENT -1.920 DQ> >> DQ> >> This one is on my hitlist as well. Didn't work out very well. DQ> DQ> Craig R Hughes <[EMAIL PROTECTED]> writes: DQ> DQ> > But it actually turns out to be great at clawing back false DQ> > positives. I think

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Craig R Hughes
Daniel Quinlan wrote: DQ> Craig R Hughes <[EMAIL PROTECTED]> writes: DQ> DQ> >> score: BUGZILLA_BUG -2.000 -> 0.921 DQ> DQ> > Moved to the right section of the scores file, and score reverted to -2.0 DQ> DQ> But why is it positive? Doesn't it mean there are good messages in DQ> the spam corpus o

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
>>> score X_NOT_PRESENT -1.920 >> >> This one is on my hitlist as well. Didn't work out very well. Craig R Hughes <[EMAIL PROTECTED]> writes: > But it actually turns out to be great at clawing back false > positives. I think we should leave it in with the low score. Hrmm, it

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Bart Schaefer
On Sat, 15 Jun 2002, Craig R Hughes wrote: > Bart Schaefer wrote: > > BS> These look suspicious: > > Changed back to ... Hmm, I just did a "cvs up" (on the head, not the branch) and: score: FORGED_RCVD_TRAIL 1.000 -> absent score: MSGID_CHARS_WEIRD -2.178 -> absent score: FROM_ADDRESS_EQ_REAL

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Bart Schaefer wrote: >> These look suspicious: >> >> score: ASCII_FORM_ENTRY 0.036 -> -1.660 Craig R Hughes <[EMAIL PROTECTED]> writes: > Changed back to 0.5 -- as mentioned in previous message, this is > triggering on the sourceforge-appended footers on mailing list > mails. Maybe it would b

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Craig R Hughes
Daniel Quinlan wrote: DQ> Michael Moncur <[EMAIL PROTECTED]> writes: DQ> DQ> > And a few slightly questionable scores: DQ> > DQ> > - This was 0.87 before. Less and less useful? DQ> > score FROM_AND_TO_SAME -2.071 DQ> DQ> I think this one should go. It's a common way to send email t

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Craig R Hughes
Bart Schaefer wrote: BS> These look suspicious: BS> BS> score: ASCII_FORM_ENTRY 0.036 -> -1.660 Changed back to 0.5 -- as mentioned in previous message, this is triggering on the sourceforge-appended footers on mailing list mails. BS> score: BUGZILLA_BUG -2.000 -> 0.921 Moved to the right sect

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Bart Schaefer <[EMAIL PROTECTED]> writes: > TO_ADDRESS_EQ_REAL happens a _lot_ with Outlook and Outlook Express. Any > time an OE user receives a message with no real name, the address part > gets added to their address book as the name. If they later send a reply > or other message to that add

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Bart Schaefer
On 15 Jun 2002, Daniel Quinlan wrote: > Bart Schaefer <[EMAIL PROTECTED]> writes: > > > score: FORGED_RCVD_TRAIL absent -> 1.000 > > score: FROM_ADDRESS_EQ_REAL absent -> 1.000 > > score: TO_ADDRESS_EQ_REAL absent -> 1.000 > > You're looking at HEAD. These are new rules I added last night. Ah

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Bart Schaefer <[EMAIL PROTECTED]> writes: > score: ASCII_FORM_ENTRY 0.036 -> -1.660 > score: BUGZILLA_BUG -2.000 -> 0.921 BUGZILLA_BUG obviously needs to be fixed. Maybe an eval would be best. > score: DATE_MISSING 0.248 -> -2.140 > score: EXCUSE_16 1.345 -> -0.721 > score: FORGED_HOTMAIL_RCVD

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Daniel Quinlan
Michael Moncur <[EMAIL PROTECTED]> writes: > And a few slightly questionable scores: > > - This was 0.87 before. Less and less useful? > score FROM_AND_TO_SAME -2.071 I think this one should go. It's a common way to send email to a large list of people without subjecting them all

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Bart Schaefer
On Sat, 15 Jun 2002, Michael Moncur wrote: > When a new release comes out I like to be anal-retentive and go through > the GA second-guessing its scores. This is my report for 2.30. In a similar vein, here are the significant score changes since the last CVS version before the GA was re-run. (I

Re: [SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Tony L. Svanstrom
On Sat, 15 Jun 2002 the voices made Michael Moncur write: > When a new release comes out I like to be anal-retentive and go through the > GA second-guessing its scores. This is my report for 2.30. > - This works well for me but users in some countries may want to change it > score SUBJ_FULL_OF_8

[SAtalk] Evaluation of 2.30 GA scores

2002-06-15 Thread Michael Moncur
When a new release comes out I like to be anal-retentive and go through the GA second-guessing its scores. This is my report for 2.30. Overall, the GA did a NICE job this time. I have very little to complain about and haven't found a single score I'll be bothering to override. Here are a few scor