TopPost: I, for one, believe this would be a VERY good idea, as we should then be able to customize the rules to a set that applies more to each of us. Would it not?
I've noticed several people have mentioned that the bigevil rules make up a lot of the "hits", whereas that is not the case for my domains; there are very few of them.

At 05:41 PM 1.18.2004 +0100, PieterB wrote:
>Hi,
>
>I have an idea, similar to Scott A Crosby's datamining application.
>I didn't use a datamining/analysis program, but used the Bayes
>database. For example if you use:
>
>  sa-learn --dump all | grep -E '^0\.999 +[0-9]+ +0 +[0-9]+ '
>
>sa-learn will show all Bayes entries which are clearly a sign of spam
>(score=0.999, zero occurrences in ham). After manually cleaning up the
>list for non-URL entries, I have lines like:
>
>0.999 36 0 1073851236 www.10cial.biz
>0.999 49 0 1074054013 www.tupit.info
>0.999 58 0 1074283556 U*www.treasurecity.biz.in
>0.999 38 0 1073851236 D*naturalgrowth.us
>0.999 48 0 1074371753 www.mytoyz.biz
>0.999 34 0 1073976168 N:www.hwyNNz.com
>0.999 35 0 1073769982 www.560000x.com
>0.999 36 0 1074416509 www.gowebrx.com
>0.999 36 0 1073841838 UD:2005hosting.com
>0.999 54 0 1074302451 UD:3001hosting.com
>0.999 34 0 1074301410 UD:getwebrx.com
>0.999 47 0 1074279713 UD:mytoyz.biz
>0.999 63 0 1074270837 UD:cashcome.net
>0.999 58 0 1074283556 UD:ktbxurnjlpe.ph
>0.999 38 0 1074111779 UD:whokz.info
>0.999 36 0 1074036850 UD:freeadultranch.com
>0.999 35 0 1073769982 UD:560000x.com
>0.999 85 0 1074304161 UD:herbalsforcheap.com
>0.999 45 0 1073719261 UD:mdpillsource.com
>0.999 39 0 1074148074 UD:net.tw
>0.999 31 0 1073802737 UD:2006hosting.com
>0.999 36 0 1074025244 UD:bestofthestarz.com
>0.999 38 0 1074133361 UD:ez-123hosting.com
>0.999 71 0 1074372616 UD:amyz.info
>0.999 34 0 1073976168 UD:hwy55z.com
>0.999 39 0 1074302451 UD:3002hosting.com
>0.999 49 0 1073888477 UD:e-hostzz.com
>0.999 73 0 1073871887 UD:kimo.com.tw
>0.999 36 0 1073851236 UD:10cial.biz
>0.999 89 0 1074193423 UD:tupit.info
>0.999 31 0 1074318551 UD:nepzzz.com
>
>I'm thinking of writing a script that can use this information and
>can filter the spam mbox to find the full URL patterns.
>These URL patterns can then be used to write custom rules, or to extend the
>bigevil ruleset.
>
>Some questions:
>- does this sound like a good idea?
>- is the source list of domains listed in bigevil available?
>- making it easier to contribute URLs for bigevil might increase
>  the number of false positives. How can this be prevented?
>  (e.g. using only 0.999 Bayes scores, the contributor should check that
>  the URL parts don't exist in ham, the domain name must exist in DNS,
>  logging the contributor, requiring an example spam mail when contributing,
>  etc.)
>
>Suggestions are welcome,
>Regards,
>Pieter
>
>--
>http://zwiki.org/PieterB
>
>
>-------------------------------------------------------
>The SF.Net email is sponsored by EclipseCon 2004
>Premiere Conference on Open Tools Development and Integration
>See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
>http://www.eclipsecon.org/osdn
>_______________________________________________
>Spamassassin-talk mailing list
>[EMAIL PROTECTED]
>https://lists.sourceforge.net/lists/listinfo/spamassassin-talk

Best regards,
Jack L. Stone,
Administrator

Sage American
http://www.sage-american.com
[EMAIL PROTECTED]
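A minimal Python sketch of the kind of script PieterB describes, assuming the dump columns are ordered as in his sample (probability, spam count, ham count, access time, token). It keeps tokens with probability >= 0.999 and a ham count of zero, strips the UD:, D* and U* prefixes seen in the sample (treated as opaque tags; their exact meaning in SpamAssassin's tokenizer is not assumed), skips N: tokens because their digits have been replaced by N (compare N:www.hwyNNz.com with UD:hwy55z.com), and then counts full http/https URLs in a spam mbox whose host is one of those domains or a subdomain of one. The function names and the script name find_spam_urls.py are hypothetical.

```python
import mailbox
import re
import sys
from collections import Counter
from typing import Iterable, Iterator, Set
from urllib.parse import urlsplit

# Token prefixes seen in the sample dump; treated as opaque tags and stripped.
PREFIXES = ("UD:", "D*", "U*")

# Assumed column order: prob, spam count, ham count, access time, token.
DUMP_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")
URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def spam_only_domains(dump_text: str, min_prob: float = 0.999) -> Set[str]:
    """Hostnames from Bayes tokens with prob >= min_prob and zero ham hits."""
    domains: Set[str] = set()
    for line in dump_text.splitlines():
        m = DUMP_LINE.match(line)
        if not m:
            continue  # e.g. "non-token data" summary lines
        prob, _nspam, nham, _atime, token = m.groups()
        if float(prob) < min_prob or int(nham) != 0:
            continue
        if token.startswith("N:"):
            continue  # digits replaced by "N", so not a real hostname
        for prefix in PREFIXES:
            if token.startswith(prefix):
                token = token[len(prefix):]
                break
        if "." in token:  # crude stand-in for removing non-URL tokens by hand
            domains.add(token.lower())
    return domains


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or a subdomain of it."""
    return host == domain or host.endswith("." + domain)


def url_patterns(texts: Iterable[str], domains: Set[str]) -> Counter:
    """Count full URLs whose host is (a subdomain of) a candidate domain."""
    counts: Counter = Counter()
    for text in texts:
        for url in URL_RE.findall(text):
            url = url.rstrip(".,;)")  # drop trailing punctuation from running text
            try:
                host = urlsplit(url).hostname or ""
            except ValueError:
                continue  # malformed URL, e.g. an unbalanced "[" in the host
            if any(host_matches(host, d) for d in domains):
                counts[url] += 1
    return counts


def message_texts(mbox_path: str) -> Iterator[str]:
    """Yield the decoded text/plain and text/html parts of every message."""
    for msg in mailbox.mbox(mbox_path):
        for part in msg.walk():
            if part.get_content_type() not in ("text/plain", "text/html"):
                continue
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "latin-1"
            try:
                text = payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the header
                text = payload.decode("latin-1", errors="replace")
            yield text


if __name__ == "__main__":
    # Usage: python find_spam_urls.py bayes-dump.txt spam.mbox
    dump_path, spam_mbox = sys.argv[1], sys.argv[2]
    with open(dump_path, encoding="utf-8", errors="replace") as f:
        candidates = spam_only_domains(f.read())
    for url, n in url_patterns(message_texts(spam_mbox), candidates).most_common():
        print(n, url)
```

Run it as, for example, `sa-learn --dump all > bayes-dump.txt` followed by `python find_spam_urls.py bayes-dump.txt spam.mbox`. Matching on the host suffix rather than a plain substring keeps "mytoyz.biz" from matching "notmytoyz.biz". An entry such as UD:net.tw from the sample would still match every host under net.tw, which is why the candidate list needs a manual review before anything goes into a rule.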
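Two of the false-positive checks in PieterB's list (the URL parts must not occur in ham, and the domain name must exist in DNS) could be automated roughly as follows. This is an illustrative sketch, not a SpamAssassin feature, and the function names are hypothetical. It assumes the system resolver is an acceptable proxy for "exists in DNS": socket.gethostbyname only asks for an IPv4 address, so a domain that exists with only NS or MX records would be rejected. The ham check is a plain case-insensitive substring match over the whole ham corpus, which errs toward rejecting too much rather than too little.

```python
import socket
from typing import Callable, Dict, Iterable, Set, Tuple


def resolves(domain: str) -> bool:
    """Return True if the system resolver returns an IPv4 address for domain."""
    try:
        socket.gethostbyname(domain)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror (lookup failure) is a subclass of OSError;
        # UnicodeError covers names that cannot be IDNA-encoded.
        return False


def vet_candidates(
    candidates: Iterable[str],
    ham_texts: Iterable[str],
    resolver: Callable[[str], bool] = resolves,
) -> Tuple[Set[str], Dict[str, str]]:
    """Split candidate domains into accepted ones and rejected ones with a reason."""
    # Lower-case the whole ham corpus once so each check is a cheap substring test.
    ham_blob = "\n".join(text.lower() for text in ham_texts)
    accepted: Set[str] = set()
    rejected: Dict[str, str] = {}
    for domain in sorted(set(candidates)):
        if domain.lower() in ham_blob:
            rejected[domain] = "seen in ham"
        elif not resolver(domain):
            # Only names that passed the ham check are looked up, to limit queries.
            rejected[domain] = "no DNS answer"
        else:
            accepted.add(domain)
    return accepted, rejected
```

The resolver is a parameter so it can be replaced by a stub in tests or by a fuller DNS lookup. The rejection reasons could be logged alongside the contributor's name, which covers another item on the list; requiring a sample spam message is left out of this sketch.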