Bart Schaefer wrote:

BS> These look suspicious:
BS>
BS> score: ASCII_FORM_ENTRY 0.036 -> -1.660

Changed back to 0.5 -- as mentioned in my previous message, this rule is
triggering on the SourceForge-appended footers on mailing list mail.

BS> score: BUGZILLA_BUG -2.000 -> 0.921

Moved to the right section of the scores file, and score reverted to -2.0.

BS> score: DATE_MISSING 0.248 -> -2.140

I've set the score for this one to 2.0 because I think the GA was working with
bad input data on this rule (see prior message).

BS> score: EXCUSE_16 1.345 -> -0.721

I think this should be left as is.  The big score change is probably because
I've had a huge volume of correspondence with attorneys and accountants in the
last few months.  This is probably more representative of the email stream of
the general population than was the previous corpus.

BS> score: FORGED_HOTMAIL_RCVD 0.530 -> -0.356

This was arbitrary -- there are no FORGED_HOTMAIL hits in the current corpus at
all.  I'll reset it to 0.5.

BS> score: FROM_AND_TO_SAME 0.877 -> -2.071

I think this needs to be fixed to something like 2.0, but I'll leave it as is
pending the resolution of the bugzilla bug related to being in one's own AWL.

BS> score: FROM_NAME_NO_SPACES 0.500 -> -0.114

Not a huge swing -- I think I'll believe the GA

BS> score: GREEN_EXCUSE_1 3.116 -> -2.019

This one is a little odd.  I've reset it to 3.116

BS> score: INTL_EXEC_GUILD 0.781 -> -0.039

Also odd.  Reset to 0.781

BS> score: MONEY_BACK 1.489 -> -0.239
BS> score: MONEY_MAKING 2.490 -> -0.687

Reset to 1.500 and 2.500 respectively.

BS> score: MSGID_CHARS_WEIRD 1.500 -> -2.178

This is a bad rule.  Needs to realize that [] exist in valid MSGIDs.  Score left
as is until rule is fixed.

BS> score: NO_REAL_NAME 0.632 -> -1.068

I think this should be somewhere around +0.5; the GA's value is probably low
because of sysadmin-email bias in the corpus.  Resetting to 0.5.

BS> score: X_NOT_PRESENT 0.500 -> -1.920

I think this is probably right actually.  There may be some element of sysadmin
bias in the corpus, but will need someone to argue in favor of a +ve score for
this rule before I reset it.

BS> (How does BUGZILLA_BUG keep creeping back into the GA?)

Don't know -- probably cos I failed to move it into the "do not evolve these"
section.  I've made that move now so it should be OK in the future.
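The effect of a "do not evolve these" section can be pictured with a minimal sketch. It is an illustration only: apply_ga_scores and the locked set are hypothetical names, and the real mechanism is a section of the scores file, not this code. The score values are the ones quoted in this message.

```python
def apply_ga_scores(current, proposed, locked):
    """Merge GA-proposed scores into current, leaving locked rules untouched.

    Hypothetical helper for illustration; not part of SpamAssassin.
    """
    merged = dict(current)
    for rule, score in proposed.items():
        if rule in locked:
            continue  # hand-set score wins over whatever the GA proposed
        merged[rule] = score
    return merged


# Scores taken from the list quoted earlier in this message.
current = {"BUGZILLA_BUG": -2.0, "FROM_NAME_NO_SPACES": 0.5}
proposed = {"BUGZILLA_BUG": 0.921, "FROM_NAME_NO_SPACES": -0.114}
locked = {"BUGZILLA_BUG"}  # assumed contents of the "do not evolve" section

merged = apply_ga_scores(current, proposed, locked)
print(merged)
```

With BUGZILLA_BUG in the locked set it keeps its hand-set -2.0, while FROM_NAME_NO_SPACES takes the GA's value.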

BS> score: ASKS_BILLING_ADDRESS 2.627 -> -0.152
BS> score: BE_AMAZED -0.260 -> 4.202
BS> score: CTYPE_JUST_HTML 3.154 -> 1.665
BS> score: LINES_OF_YELLING 0.453 -> -0.036
BS> score: LINES_OF_YELLING_3 -1.518 -> 0.478
BS> score: MAILTO_TO_REMOVE 1.341 -> -1.669
BS> score: MAILTO_WITH_SUBJ -0.310 -> 1.900
BS> score: MIME_NULL_BLOCK 0.157 -> -0.975
BS> score: SLIGHTLY_UNSAFE_JAVASCRIPT -0.794 -> 0.693
BS> score: SUBJ_ALL_CAPS 1.933 -> -0.054
BS> score: SUPERLONG_LINE -0.374 -> 0.384
BS> score: TO_BE_REMOVED_REPLY -2.150 -> 3.985
BS> score: TO_UNSUB_REPLY -1.996 -> 3.366
BS> score: TRACKER_ID -4.215 -> 4.332
BS> score: X_ESMTP 1.000 -> -1.662

I think many of these rules changed between 2.2 and 2.3, so the scores have
changed to reflect the more accurately-drafted rules.
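For anyone repeating this kind of review, here is a minimal sketch of how score changes in the "score: NAME old -> new" format quoted above could be screened for sign flips and large swings. The function names and the 1.0 swing threshold are assumptions made for illustration; nothing here is part of the SpamAssassin tools.

```python
import re

# Matches lines such as "score: X_ESMTP 1.000 -> -1.662".
# Entries with "absent" as the old score do not match and are skipped.
LINE_RE = re.compile(
    r"score:\s+(\S+)\s+(-?\d+(?:\.\d+)?)\s+->\s+(-?\d+(?:\.\d+)?)"
)


def parse_changes(text):
    """Return a list of (rule, old, new) tuples found in text."""
    changes = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            changes.append((m.group(1), float(m.group(2)), float(m.group(3))))
    return changes


def flag_suspicious(changes, min_swing=1.0):
    """Return rules whose score changed sign or moved by at least min_swing."""
    flagged = []
    for rule, old, new in changes:
        sign_flip = old * new < 0
        swing = abs(new - old)
        if sign_flip or swing >= min_swing:
            flagged.append((rule, old, new, sign_flip, round(swing, 3)))
    return flagged


sample = """\
score: CTYPE_JUST_HTML 3.154 -> 1.665
score: SUPERLONG_LINE -0.374 -> 0.384
score: TRACKER_ID -4.215 -> 4.332
score: FORGED_RCVD_TRAIL absent -> 1.000
"""

flagged = flag_suspicious(parse_changes(sample))
for rule, old, new, flipped, swing in flagged:
    print(f"{rule}: {old:+.3f} -> {new:+.3f} swing={swing} sign_flip={flipped}")
```

A flagged rule is only a candidate for a second look; as with the list above, a large change can simply reflect a rule that was rewritten between releases.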

BS> How did these get exactly 1.0?  Not represented in the corpus at all?
BS>
BS> score: FORGED_RCVD_TRAIL absent -> 1.000
BS> score: FROM_ADDRESS_EQ_REAL absent -> 1.000
BS> score: TO_ADDRESS_EQ_REAL absent -> 1.000

Yes, there are none of these in the corpus.  FORGED_RCVD_TRAIL was added late,
after everyone had already run mass-check.  The others possibly too.

BS> Amusing anecdote in case you get this far:  I recently had to whitelist
BS> several friends because they were discussing the minutes of the local
BS> school board meeting.  The budget numbers triggered the Nigerian scam
BS> rules and the sex-ed discussion set off the PORN rules.  There's a case
BS> where rule intersection analysis might have been helpful -- there's
BS> probably not much Nigerian porn priced at millions of dollars.

You'd be surprised.  It depends on whether you're buying in bulk ;)

C


_______________________________________________
Spamassassin-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
