Ok, I grabbed the script I wrote. Quick and dirty, but it seems to save a LOT of time. The idea is simple.
Basically, grep your spamtrap for all lines that contain 'http://'. You lose a small percentage because of line breaks, but the URLs repeat so much it doesn't matter. Next you just strip away all the garbage until you are left with either an IP or an FQDN. This little script does about 80% of the work. You still need to clean up by hand, but that takes only minutes after running this script.

Evilrules script: (Yes, I named the file evilrules :P. The first 'rm -f evil' line cleans up the file from the previous run.)

usage: evilrules spamtrapfile

--copy here--
#!/bin/sh
rm -f evil
cat $1 | grep 'http://' > evil
sed -e 's/^.*http\:\/\///i' < evil | cat >> evil2
rm -f evil
sed -e 's/[\/].*$//i' < evil2 | cat >> evil
rm -f evil2
sed -e 's/^.*@//i' < evil | cat >> evil2
rm -f evil
sed -e 's/=.*$//i' < evil2 | cat >> evil
rm -f evil2
sed -e 's/".*$//i' < evil | cat >> evil2
rm -f evil
sed -e 's/^.*\#$//i' < evil2 | cat >> evil
rm -f evil2
sort evil > evil2
rm -f evil
uniq evil2 > evil
rm -f evil2
echo Please edit by hand the 'evil' file.
echo then run 'reg2rule.pl -b evil > somefile.cf'
--paste here--

That will clean a lot. However, for some reason I couldn't get it to strip the '&', '%', and '#' correctly; the regex for those seems a little different. These are usually obfuscated URLs, so it could be nice to leave them in, BUT I take them out because Dave's code in reg2rule.pl will not properly escape them. The file has to be gone over by hand anyway. There might be FPs in there, and you will also see a few incomplete repeats:

walmart.co
walmart.c
etc.

That's just a limitation of uniq, because the FQDNs range in length.

Then that's it. Like the echo says, just run 'reg2rule.pl -b evil > somefile.cf' after cleaning the evil file, and BLAMO! It took you 5 minutes to generate tons of evil domain rules. You can use other options as well. I use 'reg2rule.pl -b -dEvil_date -s1.5 evil > EvilDATE.cf', where date is the date I ran it on, so I know the last time I did it.
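For what it's worth, the same extraction can be collapsed into a single pipeline with no temp files. This is just a sketch, assuming GNU sed is available; the `extract_domains` function name and `spamtrap.txt` filename are placeholders, not anything from the script above. The `[&%#]` bracket expression is one way to handle the '&', '%', and '#' cases the original had trouble with, since none of those characters is special inside a bracket expression:

```shell
#!/bin/sh
# Same idea as the evilrules script, as one pipeline with no temp files.
# extract_domains FILE  -- prints the sorted, deduped list of hosts.
extract_domains() {
    grep 'http://' "$1" \
      | sed -e 's;^.*http://;;' \
            -e 's;/.*$;;'       \
            -e 's/^.*@//'       \
            -e 's/=.*$//'       \
            -e 's/".*$//'       \
            -e 's/[&%#].*$//'   \
      | sort -u
}

# Placeholder usage; spamtrap.txt is whatever your spamtrap file is:
# extract_domains spamtrap.txt > evil
```

Semicolons as the sed delimiter avoid having to backslash every slash in the URL patterns, and `sort -u` replaces the separate sort/uniq passes.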
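The incomplete repeats (walmart.c, walmart.co, ...) can also be thinned out mechanically. A sketch of one heuristic, not part of the original script: after sorting, drop any line that is a prefix of the line right after it. The `drop_prefixes` name is made up for illustration.

```shell
#!/bin/sh
# drop_prefixes FILE -- sort FILE and drop lines that are prefixes of
# the following line (truncated repeats like walmart.c / walmart.co).
# CAUTION: this also drops real domains that happen to be prefixes of
# longer entries (e.g. foo.com vs foo.com.evil.net), so the hand
# review of the evil file is still needed.
drop_prefixes() {
    sort "$1" | awk '
        NR > 1 && index($0, prev) != 1 { print prev }  # prev is not a prefix: keep it
        { prev = $0 }                                   # remember current line
        END { if (NR) print prev }                      # last line always survives
    '
}
```

`index($0, prev) == 1` is true exactly when the previous line is a leading substring of the current one, which is what a truncated repeat looks like in sorted order.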
Again, the script is at http://www.wot.no-ip.com/Projects/Blocklist/reg2rule.pl and it uses STDIN and STDOUT. Run reg2rule.pl -h for usage. (Note: the version I have says the default score is 1.0, but it actually defaults to 0.5; I may have a beta version. It's simple to change that in the code.)

I can't thank Yorkshie Dave enough for writing this script. It saves a TON of time and hits like a rabid pitbull. No FPs unless you were asleep when you went over the evil file. This is a real winner in my book!

Chris Santerre
System Admin and SA Custom Rules Emporium keeper
http://www.merchantsoverseas.com/wwwroot/gorilla/sa_rules.htm
"A little nonsense now and then, is relished by the wisest men." - Willy Wonka

-----Original Message-----
From: myname [mailto:[EMAIL PROTECTED]
Sent: Friday, August 22, 2003 6:18 AM
To: [EMAIL PROTECTED]
Subject: [SAtalk] How To generate a spammers domain list

Hello all,

Is there a way I can generate a list of domains of senders which are marked as spam by spamassassin?

Thanks
Ram