Sent to fitug-debate (actually a nontechnical discussion list) and to spamassassin-talk. Reply-To set to me personally.
Please adjust accordingly.

A corpus of spam, freshly collected:

$ ls -l ~/Mail/OLD
total 96988
-rw-------  1 kris kiel  1676771 2003-09-24 23:59 spammed-probable.01.gz
-rw-------  1 kris kiel  2510905 2003-09-23 23:48 spammed-probable.02.gz
-rw-------  1 kris kiel  1863673 2003-09-22 23:57 spammed-probable.03.gz
-rw-------  1 kris kiel  1014158 2003-09-21 23:54 spammed-probable.04.gz
-rw-------  1 kris kiel   617841 2003-09-20 23:16 spammed-probable.05.gz
-rw-------  1 kris kiel  2861005 2003-09-20 06:13 spammed-probable.06.gz
-rw-------  1 kris kiel   108846 2003-09-17 21:07 spammed-probable.07.gz
-rw-------  1 kris kiel    12130 2003-09-16 19:48 spammed-probable.08.gz
-rw-------  1 kris kiel    14029 2003-09-15 21:09 spammed-probable.09.gz
-rw-------  1 kris kiel    35414 2003-09-15 01:51 spammed-probable.10.gz
-rw-------  1 kris kiel 10032896 2003-09-24 23:58 spammed-sure.01.gz
-rw-------  1 kris kiel 18746508 2003-09-23 23:58 spammed-sure.02.gz
-rw-------  1 kris kiel 17935355 2003-09-22 23:57 spammed-sure.03.gz
-rw-------  1 kris kiel 13535730 2003-09-21 23:48 spammed-sure.04.gz
-rw-------  1 kris kiel 11984834 2003-09-20 23:57 spammed-sure.05.gz
-rw-------  1 kris kiel 13597743 2003-09-20 08:40 spammed-sure.06.gz
-rw-------  1 kris kiel   474242 2003-09-17 23:56 spammed-sure.07.gz
-rw-------  1 kris kiel   665272 2003-09-16 23:59 spammed-sure.08.gz
-rw-------  1 kris kiel   719339 2003-09-15 23:48 spammed-sure.09.gz
-rw-------  1 kris kiel   584819 2003-09-15 06:42 spammed-sure.10.gz

Who sent me spam? Find out in perl:

$ cat ~/Mail/p.pl
#! /usr/bin/perl --

# Name of my own mail host as it appears in "by ..." in Received: lines.
$hostname = "p15104972";

while (<>) {
  chomp;

  # Folded header lines start with whitespace: glue them onto the
  # previous line. Everything below runs on every physical line.
  if (/^\s+/) {
    $line .= $_;
  } else {
    $line = $_;
  }

  # mbox separator: a new message starts, we are in its headers.
  if ($line =~ /^From /) {
    $state = "newmail";
  }
  # SpamAssassin attaches the original message; skip that part's headers.
  if ($line =~ /Content-Description: original message before SpamAssassin/) {
    $state = "spammail";
  }
  # An empty line ends the current header block.
  if ($line =~ /^$/ and $state eq "newmail") {
    $state = "body";
  }
  if ($line =~ /^$/ and $state eq "spammail") {
    $state = "newmail";
  }

  # In the outer headers, print the bracketed IP of whoever handed
  # the mail to my host.
  if ($state eq "newmail" and $line =~ /^Received:/) {
    $line =~ /\[(.*?)\].*by\s+$hostname/;
    print "$1\n" if ($1 ne "" and $1 ne "127.0.0.1");
  }
}

Applied to my corpus above:

$ cd Mail/OLD
$ gzip -dc *gz | ~/Mail/p.pl > log
$ wc -l ~/Mail/OLD/log
6614 /home/kris/Mail/OLD/log
$ sort ~/Mail/OLD/log | uniq -c | sort -rn > ~/Mail/OLD/log2
$ wc -l ~/Mail/OLD/log2
1238 /home/kris/Mail/OLD/log2
$ head -10 ~/Mail/OLD/log2
    980 195.244.243.1
    532 193.98.110.1
    498 193.158.124.58
    196 193.110.157.89
     56 24.201.245.36
     40 209.225.8.34
     40 204.127.202.56
     34 216.148.227.85
     34 209.225.8.29
     32 204.127.202.55

The top entries are my secondaries, an old mail address [EMAIL PROTECTED], which I have not been using for years, and the freeswan mailing list, which I can really live without.

$ awk '$1 > 8 { print $2 }' ~/Mail/OLD/log2| xargs -i dig -x {} | grep PTR > ~/Mail/OLD/log3

This finds 64 machines that have each sent me more than 8 spams, 58 of which have a reverse DNS entry.

$ perl -ne 'split; print join(".", reverse split(/\./, $_[4])), "\n";' ~/Mail/OLD/log3 | sort > ~/Mail/OLD/log4
$ cat ~/Mail/OLD/log4
au.net.iprimus.syd.smtp01
be.skynet.ferengi
be.skynet.gallantin
be.skynet.kira
be.skynet.sarek
be.skynet.sojef
ca.videotron.relais
com.btconnect.dswu26
com.btinternet.protactinium
com.cbeyond.atl.smtp
com.latinmail.smtp
com.ntlworld.mta02-svc
com.ntlworld.mta06-svc
com.rr.nyroc.ms-smtp-02
de.netuse.ns1
de.netuse.nuki
de.netzservice.hh.proxy
de.sczn.secondary
de.toppoint.archer
it.tin.vsmtp1
it.tuttopmi.fep01
lt.takas.mail-src
net.bellsouth.mail.imf16aec
net.bellsouth.mail.imf18aec
net.bellsouth.mail.imf19aec
net.bellsouth.mail.imf20aec
net.bellsouth.mail.imf22aec
net.bellsouth.mail.imf24aec
net.bellsouth.mail.imf25aec
net.charter.cluster1.remt19
net.charter.cluster1.remt20
net.charter.cluster1.remt21
net.charter.cluster1.remt22
net.charter.cluster1.remt23
net.charter.cluster1.remt24
net.charter.cluster1.remt25
net.charter.cluster1.remt26
net.charter.cluster1.remt27
net.charter.cluster1.remt28
net.charter.cluster1.remt29
net.comcast.rwcrmhc11
net.comcast.rwcrmhc12
net.comcast.rwcrmhc13
net.comcast.sccrmhc11
net.comcast.sccrmhc12
net.comcast.sccrmhc13
net.entelchile.ismtp5
net.entelchile.mail.real1.test_web_temp
net.libertysurf.mail
net.qwest.inet.mpls-qmqp-02
net.surewest.smtp2
net.telus.defout
net.telus.outbound02
net.telus.outbound04
org.freeswan.mj2
pt.telepac.mail.fep01-svc
pt.telepac.mail.fep02-svc
ro.rdsnet.mail3

The .de addresses are just my own secondaries and the toppoint.de address. The rest is a surprisingly short list when you look at just the domains.

Perhaps SpamAssassin should really maintain a list of IP numbers which have sent detected spam within the last n hours, and I should build a sendmail access table from that every night.

If you repeat this analysis on your own corpus, can you reproduce my results?

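A minimal sketch of the nightly access-table step, assuming the "count ip" format of log2 from above. The function name make_access, the threshold argument and the whitelist file (one IP per line, for example your own secondaries) are made up for illustration. The "Connect:" prefix and the REJECT action follow the usual sendmail access_db conventions, but check them against your sendmail version.

```sh
# make_access THRESHOLD WHITELIST < log2
# Hypothetical helper: prints one access-table line for every IP that
# was seen more than THRESHOLD times and is not listed in WHITELIST.
make_access() {
    awk -v min="$1" -v wl="$2" '
        BEGIN {
            # Load whitelisted IPs; a missing file simply yields no entries.
            if (wl != "")
                while ((getline ip < wl) > 0)
                    skip[ip] = 1
        }
        # Input lines look like "    980 195.244.243.1" (uniq -c output).
        $1 + 0 > min + 0 && !($2 in skip) {
            printf "Connect:%s\tREJECT\n", $2
        }
    '
}
```

Something like "make_access 8 ~/Mail/whitelist < ~/Mail/OLD/log2" appended to /etc/mail/access, followed by "makemap hash /etc/mail/access < /etc/mail/access", could run from cron; the paths are examples only. This sketch never expires entries: the "last n hours" part would need timestamps, which log2 does not carry.
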
Thought for improvement: What happens if you take only the domain names of the above hosts, resolve their MXes and list their mail servers - will that result in a better blocking closure?

Kristian