TopPost:

I for one believe this would be a VERY good idea, as it should then be possible
to customize the rules into a set that applies more closely to each of us. Would it not?

I've noticed several people have mentioned that the bigevil rules make up a lot
of their "hits", whereas that is not the case for my domains, where they account
for very few of them.

At 05:41 PM 1.18.2004 +0100, PieterB wrote:
>Hi,
>
>I have an idea, similar to Scott A Crosby's datamining application.
>I didn't use a datamining/analysis program, but used the Bayes
>database. For example if you use:
>
>       sa-learn --dump all | grep "^0\.999 *[0-9]*  *0 [0-9]*"
>
>sa-learn will show all Bayes entries which are clearly a sign of spam
>(score=0.999, zero occurrences in ham). After manually cleaning up the
>list to remove non-URLs, I have lines like:
>
>0.999         36          0 1073851236  www.10cial.biz
>0.999         49          0 1074054013  www.tupit.info
>0.999         58          0 1074283556  U*www.treasurecity.biz.in
>0.999         38          0 1073851236  D*naturalgrowth.us
>0.999         48          0 1074371753  www.mytoyz.biz
>0.999         34          0 1073976168  N:www.hwyNNz.com
>0.999         35          0 1073769982  www.560000x.com
>0.999         36          0 1074416509  www.gowebrx.com
>0.999         36          0 1073841838  UD:2005hosting.com
>0.999         54          0 1074302451  UD:3001hosting.com
>0.999         34          0 1074301410  UD:getwebrx.com
>0.999         47          0 1074279713  UD:mytoyz.biz
>0.999         63          0 1074270837  UD:cashcome.net
>0.999         58          0 1074283556  UD:ktbxurnjlpe.ph
>0.999         38          0 1074111779  UD:whokz.info
>0.999         36          0 1074036850  UD:freeadultranch.com
>0.999         35          0 1073769982  UD:560000x.com
>0.999         85          0 1074304161  UD:herbalsforcheap.com
>0.999         45          0 1073719261  UD:mdpillsource.com
>0.999         39          0 1074148074  UD:net.tw
>0.999         31          0 1073802737  UD:2006hosting.com
>0.999         36          0 1074025244  UD:bestofthestarz.com
>0.999         38          0 1074133361  UD:ez-123hosting.com
>0.999         71          0 1074372616  UD:amyz.info
>0.999         34          0 1073976168  UD:hwy55z.com
>0.999         39          0 1074302451  UD:3002hosting.com
>0.999         49          0 1073888477  UD:e-hostzz.com
>0.999         73          0 1073871887  UD:kimo.com.tw
>0.999         36          0 1073851236  UD:10cial.biz
>0.999         89          0 1074193423  UD:tupit.info
>0.999         31          0 1074318551  UD:nepzzz.com
>
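
A minimal sketch of the same filter in Python, for anyone who would rather
post-process the dump than grep it. It assumes each data line has the five
whitespace-separated columns visible in the sample above (probability, spam
count, ham count, a timestamp, and the token); the names `BayesToken`,
`parse_dump_line` and `spam_only` are made up for illustration and are not
part of SpamAssassin.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class BayesToken(NamedTuple):
    prob: float
    spam_count: int
    ham_count: int
    atime: int
    token: str


# Column layout is assumed from the sample lines quoted above.
LINE_RE = re.compile(r"^(\d\.\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S.*)$")


def parse_dump_line(line: str) -> Optional[BayesToken]:
    """Parse one dump line; return None if it does not match the layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return BayesToken(
        float(m.group(1)), int(m.group(2)), int(m.group(3)),
        int(m.group(4)), m.group(5),
    )


def spam_only(lines: Iterable[str], min_prob: float = 0.999) -> Iterator[BayesToken]:
    """Yield tokens at or above min_prob that were never seen in ham."""
    for line in lines:
        tok = parse_dump_line(line)
        if tok is not None and tok.prob >= min_prob and tok.ham_count == 0:
            yield tok
```

Feeding it the output of `sa-learn --dump all` line by line should give the
same entries as the grep, with the counts available as integers for sorting
or thresholding.
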
>I'm thinking of writing a script that can use this information and
>can filter the spam mbox to find the full URL patterns. These URL
>patterns can then be used to write custom rules, or to extend the
>bigevil ruleset.
>
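
One way such a script could look, as a rough sketch only. It reduces Bayes
tokens to bare domains and then collects every full URL in the spam bodies
whose host falls under one of those domains. The prefix handling is an
assumption based purely on the sample list above: `UD:`, `D*` and `U*` are
stripped, and `N:` tokens are skipped because they appear to contain
placeholder characters rather than a real host name. `token_to_domain` and
`find_urls` are hypothetical helper names.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set
from urllib.parse import urlsplit

# Assumed from the sample tokens; not taken from SpamAssassin documentation.
STRIP_PREFIXES = ("UD:", "D*", "U*")
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def token_to_domain(token: str) -> Optional[str]:
    """Reduce a Bayes token to a bare domain, or None if it is unusable."""
    if token.startswith("N:"):
        return None  # looks like a normalized pattern, not a real host
    for prefix in STRIP_PREFIXES:
        if token.startswith(prefix):
            token = token[len(prefix):]
            break
    domain = token.lower()
    if domain.startswith("www."):
        domain = domain[4:]
    return domain if "." in domain else None


def find_urls(bodies: Iterable[str], domains: Set[str]) -> Dict[str, Set[str]]:
    """Map each domain to the full URLs found under it in the given bodies."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for body in bodies:
        for match in URL_RE.finditer(body):
            url = match.group(0).rstrip(".,;:)")
            try:
                host = (urlsplit(url).hostname or "").lower()
            except ValueError:
                continue  # malformed URL, ignore it
            for domain in domains:
                if host == domain or host.endswith("." + domain):
                    hits[domain].add(url)
    return dict(hits)
```

The message bodies could come from the standard library's `mailbox.mbox`
reader. Note that very short entries such as `net.tw` in the list above would
match every host under that suffix, so they would need to be weeded out by
hand before generating rules.
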
>Some questions:
>- does this sound like a good idea?
>- is the source list of domains listed in bigevil available?
>- making it easier to contribute URLs for bigevil might increase
>  the number of false positives. How can this be prevented?
>  (e.g. accepting only a 0.999 Bayes score, having the contributor check
>  that the URL parts don't exist in ham, requiring that the domain name
>  exist in DNS, logging the contributor, requiring an example spam mail
>  when contributing, etc.).
>
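
Two of the checks suggested above (not present in ham, domain exists in DNS)
are easy to automate. The following is a sketch under stated assumptions, not
a vetted submission process: `vet_candidate` and `resolves_in_dns` are
hypothetical names, the ham check is a plain substring search, and "exists in
DNS" is approximated by whether the name resolves to an address, which will
miss domains that have only NS or MX records.

```python
import socket
from typing import Callable, Iterable, List


def resolves_in_dns(domain: str) -> bool:
    """Return True if the name currently resolves to an address."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:
        return False


def vet_candidate(
    domain: str,
    ham_texts: Iterable[str],
    resolver: Callable[[str], bool] = resolves_in_dns,
) -> List[str]:
    """Return the reasons to reject a domain; an empty list means it passed."""
    reasons: List[str] = []
    needle = domain.lower()
    if any(needle in text.lower() for text in ham_texts):
        reasons.append("appears in ham")
    if not resolver(domain):
        reasons.append("does not resolve in DNS")
    return reasons
```

The resolver is passed in as a parameter so the check can be exercised
without network access, and so a stricter lookup can be swapped in later.
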
>Suggestions are welcome,
>Regards,
>Pieter
>
>-- 
>http://zwiki.org/PieterB
>
>
>-------------------------------------------------------
>The SF.Net email is sponsored by EclipseCon 2004
>Premiere Conference on Open Tools Development and Integration
>See the breadth of Eclipse activity. February 3-5 in Anaheim, CA.
>http://www.eclipsecon.org/osdn
>_______________________________________________
>Spamassassin-talk mailing list
>[EMAIL PROTECTED]
>https://lists.sourceforge.net/lists/listinfo/spamassassin-talk
>
>

Best regards,
Jack L. Stone,
Administrator

Sage American
http://www.sage-american.com
[EMAIL PROTECTED]

