hmm ... well, i can't reproduce it. those results came right after i upgraded from 2.53 to 2.6, before i killed the database and rebuilt it.
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=0
debug: bayes corpus size: nspam = 2611, nham = 2190
debug: uri tests: Done uriRE
debug: tokenize: header tokens for X-Keywords = ""
debug: tokenize: header tokens for *p = "U*ben D*666.com D*com"
debug: tokenize: header tokens for *M = " OEA012c OEB3256 OEC0201a8c0 neeeeeee "
debug: tokenize: header tokens for *F = "U*ben D*666.com D*com"
debug: tokenize: header tokens for To = "U*ben D*666.com D*com"
debug: tokenize: header tokens for MIME-Version = ""
debug: tokenize: header tokens for *c = "/plain; charset="iso-8859-1""
debug: tokenize: header tokens for Content-Transfer-Encoding = "7bit"
debug: tokenize: header tokens for X-Priority = "3"
debug: tokenize: header tokens for X-MSMail-Priority = "Normal"
debug: tokenize: header tokens for *x = "Microsoft Outlook Express 6.00.2800.1158"
debug: tokenize: header tokens for X-MimeOLE = "Produced By Microsoft MimeOLE V6.00.2800.1165"
debug: tokenize: header tokens for *r = " ddslppp71.tcsn.uswest.net (HELO neeeeeee) (216.161.150) by mpls-pop-02.inet.qwest.net ; "
debug: tokenize: header tokens for *r = " ddslppp71.tcsn.uswest.net (HELO neeeeeee) (216.161.150) by mpls-pop-02.inet.qwest.net ; mpls-pop-02.inet.qwest.net (63.231.195) by mpls-qmqp-02.inet.qwest.net QMQP; "
debug: bayes token 'H*M:OEC0201a8c0' => 0.00484684684684685
debug: bayes token 'H*r:neeeeeee' => 0.00532673267326733
debug: bayes token 'H*M:neeeeeee' => 0.00532673267326733
debug: bayes token 'H*r:sk:mpls-qm' => 0.0131219512195122
debug: bayes token 'H*r:63.231.195' => 0.0131219512195122
debug: bayes token 'H*r:sk:mpls-po' => 0.0131219512195122
debug: bayes token 'H*r:216.161.150' => 0.0173548387096774
debug: bayes token 'H*r:sk:ddslppp' => 0.0256190476190476
debug: bayes token 'H*M:OEA012c' => 0.0256190476190476
debug: bayes token 'N:H*M:OECNNNNaNcN' => 0.0551219544138525
debug: bayes token 'H*x:6.00.2800.1158' => 0.942816548024795
debug: bayes token 'H*F:U*ben' => 0.0615910632357968
debug: bayes token 'H*F:D*666.com' => 0.0646164901779201
debug: bayes token 'HX-MimeOLE:V6.00.2800.1165' => 0.927470236684858
debug: bayes token 'test' => 0.111838323728771
debug: bayes token 'N:HX-MimeOLE:VN.NN.NNNN.NNNN' => 0.115400429126286
debug: bayes token 'HContent-Transfer-Encoding:7bit' => 0.115581482135493
debug: bayes token 'H*c:plain' => 0.117569325152892
debug: bayes token 'HX-MimeOLE:MimeOLE' => 0.123452938769744
debug: bayes token 'HX-MimeOLE:Microsoft' => 0.132269298472378
debug: bayes token 'HX-MimeOLE:Produced' => 0.133100007079507
debug: bayes: score = 3.84778018769349e-06
debug: using "/usr/home/wing/.spamassassin" for user state dir
debug: bayes: 24040 untie-ing
debug: bayes: 24040 untie-ing db_toks
debug: bayes: 24040 untie-ing db_seen
debug: running raw-body-text per-line regexp tests; score so far=0

----- Original Message -----
From: "Justin Mason" <[EMAIL PROTECTED]>
To: "Ben Wing" <[EMAIL PROTECTED]>
Cc: "Martin Radford" <[EMAIL PROTECTED]>; "Justin Mason" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Sunday, October 19, 2003 8:59 PM
Subject: Re: [SAtalk] strange behavior of Bayesian analyzer in SA 2.6

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> "Ben Wing" writes:
> >well, i get false
> >positives with an empty body ...
>
> Yep, that's a pretty serious sign -- the header data in that message (sent
> from yourself, to yourself, via your own relays, right?) is being
> recognised as spam.
>
> Try using "spamassassin -D -Lt < msg > out" and watch the bayes tokens
> and their values on stderr. e.g. here's an example from sample-nonspam.txt
> for me:
>
> debug: bayes token 'N:NNNN-NN-NN' => 1.60066644848413e-05
> debug: bayes token 'organizations' => 0.000215113954418233
> debug: bayes token 'rarely' => 0.00032196289646918
> debug: bayes token 'ICANN' => 0.000451721242653233
> debug: bayes token 'deeper' => 0.000471516213847502
> debug: bayes token 'commentary' => 0.000647412755716005
> debug: bayes token 'depth' => 0.000680151706700379
> debug: bayes token '1994' => 0.000726045883940621
> debug: bayes token 'voices' => 0.000756680731364276
> debug: bayes token 'Dawson' => 0.000880523731587561
> debug: bayes token 'Host' => 0.000880523731587561
> debug: bayes token 'roots' => 0.000942206654991244
> debug: bayes token 'deceptive' => 0.00114225053078556
> debug: bayes token 'Topic' => 0.00124825986078886
> debug: bayes token 'columnists' => 0.00124825986078886
> debug: bayes token 'Sitescooper' => 0.00127790973871734
> debug: bayes token 'ash' => 0.00162537764350453
> debug: bayes token 'PDA' => 0.00167601246105919
> debug: bayes token 'UD:slashdot.org' => 0.00167601246105919
> debug: bayes token 'obsession' => 0.00198523985239852
> debug: bayes token 'intersection' => 0.00206130268199234
> debug: bayes token 'Layer' => 0.00232900432900433
> debug: bayes token 'distinctive' => 0.00267661691542289
> debug: bayes token 'separates' => 0.00281675392670157
> debug: bayes token 'UD:quicktopic.com' => 0.00281675392670157
> debug: bayes token 'U*dawson' => 0.0033416149068323
> debug: bayes token 'H*F:D*world.std.com' => 0.00664197530864198
> debug: bayes token 'H*F:D*std.com' => 0.00664197530864198
> debug: bayes token 'www.pgp.com' => 0.00881967213114754
> debug: bayes token 'UD:pgp.com' => 0.00881967213114754
> debug: bayes token 'H*m:192' => 0.0104581626770632
> debug: bayes token 'examples' => 0.0123431642679307
> debug: bayes token 'Log' => 0.0130133209114604
> debug: bayes token 'behaviors' => 0.0131219512195122
> debug: bayes token '2,000' => 0.0131219512195122
> debug: bayes token 'Hail' => 0.0131219512195122
> debug: bayes token 'SIGNED' => 0.0134839529349941
> debug: bayes token 'immoral' => 0.0173548387096774
> debug: bayes token 'aggregator' => 0.0173548387096774
> debug: bayes token 'subscribe' => 0.0214775262438607
> debug: bayes token 'UD:shtml' => 0.02559193319822
> debug: bayes token 'HTo:D*std.com' => 0.0256190476190476
> debug: bayes token 'HTo:D*world.std.com' => 0.0256190476190476
> debug: bayes token 'H*F:U*dawson' => 0.0256190476190476
> debug: bayes token 'UnBlinking' => 0.0256190476190476
> debug: bayes token 'unmatched' => 0.0256190476190476
> debug: bayes token 'H*m:193' => 0.0256190476190476
> debug: bayes token 'sk:www.sit' => 0.0256190476190476
> debug: bayes token 'Scout' => 0.0256190476190476
> debug: bayes token 'SIGNATURE' => 0.0257894126485889
> debug: bayes token 'culture' => 0.0272021597517014
> debug: bayes token 'N:N.N.N' => 0.0272793722027467
> debug: bayes token 'Gary' => 0.0320014392974647
> debug: bayes token 'PGP' => 0.0358753189283018
> debug: bayes token 'HPrecedence:list' => 0.037141126102354
> debug: bayes token 'separate' => 0.039440168771582
> debug: bayes token 'topical' => 0.958
> debug: bayes token 'ping' => 0.0451277464637061
> debug: bayes token 'ISSN' => 0.0489090909090909
> debug: bayes token 'UD:rdf' => 0.0489090909090909
> debug: bayes token 'pursues' => 0.0489090909090909
> debug: bayes token 'stock's' => 0.0489090909090909
> debug: bayes token 'resuming' => 0.0489090909090909
> debug: bayes token 'excise' => 0.0489090909090909
> debug: bayes token 'D*tbtf.com' => 0.0489090909090909
> debug: bayes token 'H*r:world.std.com' => 0.0489090909090909
> debug: bayes token 'comment' => 0.0539053222173553
> debug: bayes token 'BEGIN' => 0.0556296837236107
> debug: bayes token 'runs' => 0.0561664508720611
> debug: bayes token 'morning' => 0.0640287802717383
> debug: bayes token 'forum' => 0.0645257315925537
> debug: bayes token 'blog' => 0.0670958180925054
> debug: bayes token 'sk:_______' => 0.0675631545686896
> debug: bayes token 'prohibited' => 0.0712432072884898
> debug: bayes token 'Copy' => 0.925232790783064
> debug: bayes token 'Sun' => 0.0760314122684923
> debug: bayes token 'affect' => 0.0785519839660173
> debug: bayes token 'archive' => 0.0795060650813824
> debug: bayes token 'compelling' => 0.0863543258179187
> debug: bayes token 'subscription' => 0.0998350082839608
> debug: bayes token 'H*m:102' => 0.105326764576386
> debug: bayes token 'dead' => 0.10852287665769
> debug: bayes token 'H*c:plain' => 0.109216762405638
> debug: bayes token 'issue' => 0.118565944969698
> debug: bayes token 'utterly' => 0.121118249899843
> debug: bayes token 'H*c:us-ascii' => 0.124295594576641
> debug: bayes token 'END' => 0.125704773987869
> debug: bayes token 'file' => 0.131541664289693
> debug: bayes token 'writing' => 0.133370455069644
> debug: bayes token 'sources' => 0.141737437136924
> debug: bayes token 'Version' => 0.142590596342089
> debug: bayes token 'promises' => 0.144716859485728
> debug: bayes token 'UD:org' => 0.146916282340951
> debug: bayes token 'consider' => 0.150153742517993
>
> All of those are quite low, so combined they result in a score of:
>
> debug: bayes: score = 0
>
> - --j.
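A rough way to dig through that stderr output, in case it helps: the sketch below pulls the "bayes token" lines out of saved debug text and lists the ones above 0.5, i.e. the tokens pushing a message toward spam. The regex is based only on the line format shown in this thread, and `parse_bayes_tokens` and `spammy_tokens` are made-up helper names, not part of SpamAssassin.

```python
import re

# Matches entries such as: debug: bayes token 'H*F:U*ben' => 0.0615910632357968
# The non-greedy group lets tokens contain apostrophes (e.g. 'stock's').
TOKEN_RE = re.compile(r"bayes token '(.*?)' => ([0-9.eE+-]+)")


def parse_bayes_tokens(debug_text):
    """Return (token, probability) pairs found in the debug output text."""
    return [(m.group(1), float(m.group(2))) for m in TOKEN_RE.finditer(debug_text)]


def spammy_tokens(pairs, threshold=0.5):
    """Return the pairs whose probability exceeds threshold, strongest first."""
    return sorted((p for p in pairs if p[1] > threshold), key=lambda p: -p[1])


if __name__ == "__main__":
    # Hypothetical file name; it is assumed to hold the captured stderr.
    with open("debug.txt") as fh:
        for token, prob in spammy_tokens(parse_bayes_tokens(fh.read())):
            print(f"{prob:.4f}  {token}")
```

Assuming a Bourne-style shell, the debug stream can be saved for this with something like `spamassassin -D -Lt < msg > out 2> debug.txt`.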
>
> >Return-Path: <[EMAIL PROTECTED]>
> >Delivered-To: [EMAIL PROTECTED]
> >Received: (qmail 38912 invoked by uid 19047); 17 Oct 2003 07:23:43 -0000
> >Received: from unknown (HELO mpls-qmqp-02.inet.qwest.net) ([63.231.195.113])
> >(envelope-sender <[EMAIL PROTECTED]>)
> > by 192.220.74.103 (qmail-ldap-1.03) with SMTP
> > for <[EMAIL PROTECTED]>; 17 Oct 2003 07:23:43 -0000
> >Received: (qmail 73098 invoked by uid 0); 17 Oct 2003 06:40:41 -0000
> >Received: from mpls-pop-02.inet.qwest.net (63.231.195.2)
> > by mpls-qmqp-02.inet.qwest.net with QMQP; 17 Oct 2003 06:40:41 -0000
> >Received: from ddslppp71.tcsn.uswest.net (HELO neeeeeee) (216.161.150.71)
> > by mpls-pop-02.inet.qwest.net with SMTP; 17 Oct 2003 07:23:42 -0000
> >Date: Fri, 17 Oct 2003 00:28:01 -0700
> >Message-ID: <[EMAIL PROTECTED]>
> >From: "Ben Wing" <[EMAIL PROTECTED]>
> >To: "Ben Wing" <[EMAIL PROTECTED]>
> >Subject: test test
> >MIME-Version: 1.0
> >Content-Type: text/plain;
> > charset="iso-8859-1"
> >Content-Transfer-Encoding: 7bit
> >X-Priority: 3
> >X-MSMail-Priority: Normal
> >X-Mailer: Microsoft Outlook Express 6.00.2800.1158
> >X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
> >X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on 666.com
> >X-Spam-Report:
> > * 2.1 BAYES_90 BODY: Bayesian spam probability is 90 to 99%
> > * [score: 0.9573]
> >X-Spam-Status: No, hits=2.1 required=5.0 tests=BAYES_90 autolearn=ham
> > version=2.60
> >X-Spam-Level: **
> >Status:
> >
> >
> >----- Original Message -----
> >From: "Justin Mason" <[EMAIL PROTECTED]>
> >To: "Martin Radford" <[EMAIL PROTECTED]>
> >Cc: "Ben Wing" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
> >Sent: Sunday, October 19, 2003 3:41 PM
> >Subject: Re: [SAtalk] strange behavior of Bayesian analyzer in SA 2.6
> >
> >
> >> -----BEGIN PGP SIGNED MESSAGE-----
> >> Hash: SHA1
> >>
> >>
> >> Martin Radford writes:
> >> >At Fri Oct 17 21:17:54 2003, Ben Wing wrote:
> >> >>
> >> >> hi. i just upgraded from 2.53 to 2.6 and i'm seeing something
> >> >> rather odd about the Bayesian results: nearly every one is almost
> >> >> exactly 0%, 50%, or 100%! it's almost as if it's applying an
> >> >> extreme rounding function to the actual result. now, these are
> >> >> turning out so far to be accurate, but i'm still highly distrustful
> >> >> of such "perfect" results. this clustering happened the instant i
> >> >> upgraded spam assassin -- in fact, one of the first messages i sent
> >> >> after this
> >> >
> >> >I found this when I first upgraded to one of the pre-releases of 2.60.
> >> >The developers said that this was due to changing the method of
> >> >calculating the Bayes score. The newer code is much more likely to
> >> >cluster around 0, 0.5, and 1. I have seen a few messages outside
> >> >those cluster areas, but not too many. I've not seen any FPs, though.
> >>
> >> If you're seeing FPs, it's strongly indicative of mistakes in the
> >> training data -- spam trained as ham or vice-versa, I'm afraid ;)
> >>
> >> - --j.
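To see why scores might bunch up at 0, 0.5 and 1, here is a minimal sketch of the chi-squared combining scheme described by Gary Robinson and used in SpamBayes. This is an assumption on my part: the thread only says the method of calculating the score changed, not what it changed to, and this is not SpamAssassin's actual code. `chi2q` and `combine` are illustrative names.

```python
import math


def chi2q(x2, v):
    """Upper-tail probability of a chi-squared variable with v degrees of
    freedom (v must be even), evaluated at x2."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    # Guard against rounding pushing the sum slightly above 1.
    return min(total, 1.0)


def combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)
    into a single score between 0 and 1."""
    n = len(probs)
    # s measures how strongly the tokens point to spam, h how strongly to ham.
    s = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    h = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    return (s - h + 1.0) / 2.0
```

With ten tokens all at 0.01 this gives essentially 0, with ten at 0.99 essentially 1, and with five of each it lands at 0.5: strong evidence on one side saturates the score, and strong evidence on both sides cancels out to the midpoint. Under this assumption, very few messages would end up anywhere in between.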
> >> -----BEGIN PGP SIGNATURE-----
> >> Version: GnuPG v1.2.2 (GNU/Linux)
> >> Comment: Exmh CVS
> >>
> >> iD8DBQE/kxMjQTcbUG5Y7woRAgnyAJ9GaPCdey9oNgAT/y2ZiJkahjPuIgCgoxAC
> >> vPt8S4fWAKrhfkvq++O4BmI=
> >> =JWtb
> >> -----END PGP SIGNATURE-----
> >>
> >>
> >>
> >> -------------------------------------------------------
> >> This SF.net email sponsored by: Enterprise Linux Forum Conference & Expo
> >> The Event For Linux Datacenter Solutions & Strategies in The Enterprise
> >> Linux in the Boardroom; in the Front Office; & in the Server Room
> >> http://www.enterpriselinuxforum.com
> >> _______________________________________________
> >> Spamassassin-talk mailing list
> >> [EMAIL PROTECTED]
> >> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
> >
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.2.2 (GNU/Linux)
> Comment: Exmh CVS
>
> iD8DBQE/k12fQTcbUG5Y7woRAiTIAJ4kUN/aAIP81n1NvVqmVmURTdwVkgCfTaq+
> ibaeU0UkxgYBEgokyZlvU1Y=
> =dHCc
> -----END PGP SIGNATURE-----