Re: [SAtalk] strange behavior of Bayesian analyzer in SA 2.6

Justin Mason Sun, 19 Oct 2003 22:20:40 -0700

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


"Ben Wing" writes:
>well, i get false positives with an empty body ...

Yep, that's a pretty serious sign -- the header data in that message (sent
from yourself, to yourself, via your own relays, right?) is being
recognised as spam.

Try using "spamassassin -D -Lt < msg > out" and watch the bayes tokens
and their values on stderr.  e.g. here's an example from sample-nonspam.txt
for me:

debug: bayes token 'N:NNNN-NN-NN' => 1.60066644848413e-05
debug: bayes token 'organizations' => 0.000215113954418233
debug: bayes token 'rarely' => 0.00032196289646918
debug: bayes token 'ICANN' => 0.000451721242653233
debug: bayes token 'deeper' => 0.000471516213847502
debug: bayes token 'commentary' => 0.000647412755716005
debug: bayes token 'depth' => 0.000680151706700379
debug: bayes token '1994' => 0.000726045883940621
debug: bayes token 'voices' => 0.000756680731364276
debug: bayes token 'Dawson' => 0.000880523731587561
debug: bayes token 'Host' => 0.000880523731587561
debug: bayes token 'roots' => 0.000942206654991244
debug: bayes token 'deceptive' => 0.00114225053078556
debug: bayes token 'Topic' => 0.00124825986078886
debug: bayes token 'columnists' => 0.00124825986078886
debug: bayes token 'Sitescooper' => 0.00127790973871734
debug: bayes token 'ash' => 0.00162537764350453
debug: bayes token 'PDA' => 0.00167601246105919
debug: bayes token 'UD:slashdot.org' => 0.00167601246105919
debug: bayes token 'obsession' => 0.00198523985239852
debug: bayes token 'intersection' => 0.00206130268199234
debug: bayes token 'Layer' => 0.00232900432900433
debug: bayes token 'distinctive' => 0.00267661691542289
debug: bayes token 'separates' => 0.00281675392670157
debug: bayes token 'UD:quicktopic.com' => 0.00281675392670157
debug: bayes token 'U*dawson' => 0.0033416149068323
debug: bayes token 'H*F:D*world.std.com' => 0.00664197530864198
debug: bayes token 'H*F:D*std.com' => 0.00664197530864198
debug: bayes token 'www.pgp.com' => 0.00881967213114754
debug: bayes token 'UD:pgp.com' => 0.00881967213114754
debug: bayes token 'H*m:192' => 0.0104581626770632
debug: bayes token 'examples' => 0.0123431642679307
debug: bayes token 'Log' => 0.0130133209114604
debug: bayes token 'behaviors' => 0.0131219512195122
debug: bayes token '2,000' => 0.0131219512195122
debug: bayes token 'Hail' => 0.0131219512195122
debug: bayes token 'SIGNED' => 0.0134839529349941
debug: bayes token 'immoral' => 0.0173548387096774
debug: bayes token 'aggregator' => 0.0173548387096774
debug: bayes token 'subscribe' => 0.0214775262438607
debug: bayes token 'UD:shtml' => 0.02559193319822
debug: bayes token 'HTo:D*std.com' => 0.0256190476190476
debug: bayes token 'HTo:D*world.std.com' => 0.0256190476190476
debug: bayes token 'H*F:U*dawson' => 0.0256190476190476
debug: bayes token 'UnBlinking' => 0.0256190476190476
debug: bayes token 'unmatched' => 0.0256190476190476
debug: bayes token 'H*m:193' => 0.0256190476190476
debug: bayes token 'sk:www.sit' => 0.0256190476190476
debug: bayes token 'Scout' => 0.0256190476190476
debug: bayes token 'SIGNATURE' => 0.0257894126485889
debug: bayes token 'culture' => 0.0272021597517014
debug: bayes token 'N:N.N.N' => 0.0272793722027467
debug: bayes token 'Gary' => 0.0320014392974647
debug: bayes token 'PGP' => 0.0358753189283018
debug: bayes token 'HPrecedence:list' => 0.037141126102354
debug: bayes token 'separate' => 0.039440168771582
debug: bayes token 'topical' => 0.958
debug: bayes token 'ping' => 0.0451277464637061
debug: bayes token 'ISSN' => 0.0489090909090909
debug: bayes token 'UD:rdf' => 0.0489090909090909
debug: bayes token 'pursues' => 0.0489090909090909
debug: bayes token 'stock's' => 0.0489090909090909
debug: bayes token 'resuming' => 0.0489090909090909
debug: bayes token 'excise' => 0.0489090909090909
debug: bayes token 'D*tbtf.com' => 0.0489090909090909
debug: bayes token 'H*r:world.std.com' => 0.0489090909090909
debug: bayes token 'comment' => 0.0539053222173553
debug: bayes token 'BEGIN' => 0.0556296837236107
debug: bayes token 'runs' => 0.0561664508720611
debug: bayes token 'morning' => 0.0640287802717383
debug: bayes token 'forum' => 0.0645257315925537
debug: bayes token 'blog' => 0.0670958180925054
debug: bayes token 'sk:_______' => 0.0675631545686896
debug: bayes token 'prohibited' => 0.0712432072884898
debug: bayes token 'Copy' => 0.925232790783064
debug: bayes token 'Sun' => 0.0760314122684923
debug: bayes token 'affect' => 0.0785519839660173
debug: bayes token 'archive' => 0.0795060650813824
debug: bayes token 'compelling' => 0.0863543258179187
debug: bayes token 'subscription' => 0.0998350082839608
debug: bayes token 'H*m:102' => 0.105326764576386
debug: bayes token 'dead' => 0.10852287665769
debug: bayes token 'H*c:plain' => 0.109216762405638
debug: bayes token 'issue' => 0.118565944969698
debug: bayes token 'utterly' => 0.121118249899843
debug: bayes token 'H*c:us-ascii' => 0.124295594576641
debug: bayes token 'END' => 0.125704773987869
debug: bayes token 'file' => 0.131541664289693
debug: bayes token 'writing' => 0.133370455069644
debug: bayes token 'sources' => 0.141737437136924
debug: bayes token 'Version' => 0.142590596342089
debug: bayes token 'promises' => 0.144716859485728
debug: bayes token 'UD:org' => 0.146916282340951
debug: bayes token 'consider' => 0.150153742517993

Nearly all of those are quite low ('topical' and 'Copy' are the only ones
leaning spammy), so combined they result in a score of:

debug: bayes: score = 0
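
A rough sketch of how a pile of token values like the ones above can
collapse to a score of 0. It assumes Robinson-style chi-squared (Fisher)
combining, which is an assumption here and not something this thread
confirms about 2.60. The function names chi2q and combine are
hypothetical and this is not the actual SpamAssassin code. The "spammy"
list is a made-up mirror image of five real values from the listing.

```python
import math

def chi2q(x2, dof):
    """Chi-squared survival function for an even number of degrees of freedom."""
    m = x2 / 2.0
    term = math.exp(-m)  # may underflow to 0.0 for very large m; acceptable in a sketch
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    if n == 0:
        return 0.5  # no evidence either way
    log_prod_p = sum(math.log(p) for p in probs)        # very negative when tokens look hammy
    log_prod_q = sum(math.log(1.0 - p) for p in probs)  # very negative when tokens look spammy
    h = 1.0 - chi2q(-2.0 * log_prod_p, 2 * n)  # ham indicator, near 1 for hammy mail
    s = 1.0 - chi2q(-2.0 * log_prod_q, 2 * n)  # spam indicator, near 1 for spammy mail
    return (s - h + 1.0) / 2.0

# Five values taken from the listing above.
hammy = [0.000215113954418233, 0.00032196289646918, 0.000451721242653233,
         0.000471516213847502, 0.000647412755716005]
# Made-up mirror image, for illustration only.
spammy = [1.0 - p for p in hammy]

print(combine(hammy))           # very close to 0
print(combine(spammy))          # very close to 1
print(combine(hammy + spammy))  # very close to 0.5
```

Under that assumption, strongly one-sided evidence saturates at 0 or 1,
and strong evidence on both sides cancels out to 0.5, which would match
the clustering described further down in the thread.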

- --j.
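
When the token list is long, a small script can pick the spam-leaning
tokens out of the captured debug output. This is a minimal sketch that
assumes the "debug: bayes token '...' => value" line format shown above.
The helper names and the 0.5 cut-off are hypothetical choices for
illustration, not part of SpamAssassin.

```python
import re

# Matches lines such as: debug: bayes token 'ICANN' => 0.000451721242653233
# The greedy group keeps tokens that contain a quote, e.g. stock's.
TOKEN_LINE = re.compile(r"^debug: bayes token '(.*)' => ([0-9.eE+-]+)\s*$")

def parse_bayes_tokens(debug_text):
    """Return (token, probability) pairs found in the debug output."""
    pairs = []
    for line in debug_text.splitlines():
        match = TOKEN_LINE.match(line)
        if match:
            pairs.append((match.group(1), float(match.group(2))))
    return pairs

def spam_leaning(pairs, threshold=0.5):
    """Tokens above the assumed neutral point of 0.5, most spammy first."""
    return sorted((p for p in pairs if p[1] > threshold),
                  key=lambda p: p[1], reverse=True)

# A few lines copied from the listing above.
sample = """\
debug: bayes token 'N:NNNN-NN-NN' => 1.60066644848413e-05
debug: bayes token 'stock's' => 0.0489090909090909
debug: bayes token 'topical' => 0.958
debug: bayes token 'Copy' => 0.925232790783064
debug: bayes: score = 0
"""

tokens = parse_bayes_tokens(sample)
for token, prob in spam_leaning(tokens):
    print(f"{prob:.4f}  {token}")
```

To feed it real data, the stderr stream can be saved with ordinary shell
redirection, e.g. "spamassassin -D -Lt < msg > out 2> debug.txt" (the
file name debug.txt is just an example), and the file contents passed to
parse_bayes_tokens. For a false positive, the header tokens (H*..., HTo:...)
that show up in the spam-leaning list are the ones to look at first.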

>Return-Path: <[EMAIL PROTECTED]>
>Delivered-To: [EMAIL PROTECTED]
>Received: (qmail 38912 invoked by uid 19047); 17 Oct 2003 07:23:43 -0000
>Received: from unknown (HELO mpls-qmqp-02.inet.qwest.net) ([63.231.195.113])
>(envelope-sender <[EMAIL PROTECTED]>)
>          by 192.220.74.103 (qmail-ldap-1.03) with SMTP
>          for <[EMAIL PROTECTED]>; 17 Oct 2003 07:23:43 -0000
>Received: (qmail 73098 invoked by uid 0); 17 Oct 2003 06:40:41 -0000
>Received: from mpls-pop-02.inet.qwest.net (63.231.195.2)
>  by mpls-qmqp-02.inet.qwest.net with QMQP; 17 Oct 2003 06:40:41 -0000
>Received: from ddslppp71.tcsn.uswest.net (HELO neeeeeee) (216.161.150.71)
>  by mpls-pop-02.inet.qwest.net with SMTP; 17 Oct 2003 07:23:42 -0000
>Date: Fri, 17 Oct 2003 00:28:01 -0700
>Message-ID: <[EMAIL PROTECTED]>
>From: "Ben Wing" <[EMAIL PROTECTED]>
>To: "Ben Wing" <[EMAIL PROTECTED]>
>Subject: test test
>MIME-Version: 1.0
>Content-Type: text/plain;
> charset="iso-8859-1"
>Content-Transfer-Encoding: 7bit
>X-Priority: 3
>X-MSMail-Priority: Normal
>X-Mailer: Microsoft Outlook Express 6.00.2800.1158
>X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165
>X-Spam-Checker-Version: SpamAssassin 2.60 (1.212-2003-09-23-exp) on 666.com
>X-Spam-Report:
> *  2.1 BAYES_90 BODY: Bayesian spam probability is 90 to 99%
> *      [score: 0.9573]
>X-Spam-Status: No, hits=2.1 required=5.0 tests=BAYES_90 autolearn=ham
> version=2.60
>X-Spam-Level: **
>Status:
>
>
>----- Original Message ----- 
>From: "Justin Mason" <[EMAIL PROTECTED]>
>To: "Martin Radford" <[EMAIL PROTECTED]>
>Cc: "Ben Wing" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
>Sent: Sunday, October 19, 2003 3:41 PM
>Subject: Re: [SAtalk] strange behavior of Bayesian analyzer in SA 2.6
>
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>>
>> Martin Radford writes:
>> >At Fri Oct 17 21:17:54 2003, Ben Wing wrote:
>> >>
>> >> hi.  i just upgraded from 2.53 to 2.6 and i'm seeing something
>> >> rather odd about the Bayesian results: nearly every one is almost
>> >> exactly 0%, 50%, or 100%!  it's almost as if it's applying an
>> >> extreme rounding function to the actual result.  now, these are
>> >> turning out so far to be accurate, but i'm still highly distrustful
>> >> of such "perfect" results.  this clustering happened the instant i
>> >> upgraded spam assassin -- in fact, one of the first messages i sent
>> >> after this
>> >
>> >I found this when I first upgraded to one of the pre-releases of 2.60.
>> >The developers said that this was due to changing the method of
>> >calculating the Bayes score.  The newer code is much more likely to
>> >cluster around 0, 0.5, and 1.  I have seen a few messages outside
>> >those cluster areas, but not too many.  I've not seen any FPs, though.
>>
>> If you're seeing FPs, it's strongly indicative of mistakes in the
>> training data -- spam trained as ham or vice-versa, I'm afraid ;)
>>
>> - --j.
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.2.2 (GNU/Linux)
>> Comment: Exmh CVS
>>
>> iD8DBQE/kxMjQTcbUG5Y7woRAgnyAJ9GaPCdey9oNgAT/y2ZiJkahjPuIgCgoxAC
>> vPt8S4fWAKrhfkvq++O4BmI=
>> =JWtb
>> -----END PGP SIGNATURE-----
>>
>>
>>
>> -------------------------------------------------------
>> This SF.net email sponsored by: Enterprise Linux Forum Conference & Expo
>> The Event For Linux Datacenter Solutions & Strategies in The Enterprise
>> Linux in the Boardroom; in the Front Office; & in the Server Room
>> http://www.enterpriselinuxforum.com
>> _______________________________________________
>> Spamassassin-talk mailing list
>> [EMAIL PROTECTED]
>> https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
>
>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Exmh CVS

iD8DBQE/k12fQTcbUG5Y7woRAiTIAJ4kUN/aAIP81n1NvVqmVmURTdwVkgCfTaq+
ibaeU0UkxgYBEgokyZlvU1Y=
=dHCc
-----END PGP SIGNATURE-----



