In the last few months many personal website owners (such as myself) have found that spammers have been using their domain names to masquerade as valid users to send spam, normally in the form of:
[EMAIL PROTECTED]

This new tactic has an annoying side effect: the bounced emails end up back with the postmaster at the innocent person's domain. This is normally the first time that the domain owner realises that there is a problem. I am one of those people and currently have nearly 3 thousand bounces in my catch-all POP3 box.

I can see two possible responses to this:

1) Delete the email as it arrives and ignore it. Accept that the domain name might end up being blacklisted as a spammer's domain and be done with it, or

2) Fight back!

All of the bounced emails contain at least one URI to a spammer website, in an effort to sell "Cheap Meds" or "Faked Rolexes" or similar. The format is usually something like this:

http://www.sickmate.info/?a2fb9e415e74beS9cdee919d78Sa6a7d

I believe the query part of the URI ties the visit back to the email address the spam was sent to. Hence if you visit the website with this link, your email address is saved in a database as one that is a) valid, and b) dumb enough to visit the website.

The spammers rely on the fact that some people will visit this website and buy from them. In fact, some people must buy from these websites via spam, otherwise the spammers would have given up a long time ago*.

So, as a web programmer and someone who specialises in getting good results on Google, I realised that I could simply post every spammer website on a Google-optimized page, which if searched for on Google would return something like:

"WARNING: DO NOT BUY FROM THIS WEBSITE. THE SPAMMER IS A RUSSIAN MAFIA CROOK WHO WILL STEAL YOUR MONEY."

...Or something equally obvious along those lines. In this way we attack the websites that are the link between the spam and the money.

The real necessity therefore is to:

a) Process the received bounced messages quickly and list them on the website without delay.
b) Prevent the spammer using the domain.

The answer to (b) I cannot find.
I thought SPF might help, but it is not a panacea. The answer to (a) I need help with!

So, I'm on Windows XP. I use Outlook 2002 and I already have the excellent (and FREE) SpamBayes Outlook add-in** that blocks spam and loves ham. SpamBayes is open source and as such I can modify the source code and install it afresh. However, the problem is that I'm not a Python programmer, and I'm not sure where to start.

This is what I want to do, so if anyone would like to direct me, I'd be grateful:

1) Add a menu option to the SpamBayes add-in - "Post Spam Site to Web Service". I'm guessing I can add a new line to addin.py such as the one below, but how do I sink the event?

    self._AddControl(popup,
                     constants.msoControlButton,
                     ButtonEvent, (PostSpamSite, self.manager,),
                     Caption="Post Spam Site to Web Service",
                     Enabled=True,
                     Visible=True,
                     Tag="SpamBayesCommand.PostSpam")

2) Add a configuration setting, so that the web service location can be set. I'm guessing this is in config.py. Pointers welcome.

3) Add a function to extract all links in a block of text. I have written a good one of these for .NET, but I'm not sure if, or how, it would work in Python:

    string hrefPattern = @"(?<all>(?:(?<protocol>http(?:s?)|ftp)(?:\:\/\/))" +
        @"(?<domain>[^/\r\n\:]+)?" +
        @"(?<port>\:\d+)?" +
        @"(?<path>[^\?#]*)?" +
        @"(?<qrystr>\?\w*)?" +
        @"(?<bookmark>\#\w*)?)";

    // Regular Expression
    Regex hrefRegex = new Regex(hrefPattern,
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace | RegexOptions.IgnoreCase);

Any help with this welcome. Do I need a specific Python regex library or can I use the .NET regex library in Python?

4) Connect to a web service using SOAP and consume that service. The service will provide:

a) Authorise (username, password) - returns access
b) Submit (domain) - returns success or failure

Can I use SOAPpy for this? Can anyone give me any examples or point me in the right direction?
5) Provide another option in the add-in to "Scan folder and Post Spam Sites to Web Service", in the same manner as "Filter messages" works now. Can I use filter.py as a model to work from?

Summary
=================================

I am not a Python programmer per se but have no problem with getting my hands dirty. I have already got the basics of this working as a Windows.Forms application, but running both that and Outlook together is daft. The SpamBayes project already does the hard bit in classifying the spam, so it makes sense to hang off the back of it.

Has anyone else had similar problems with these "phantom" email addresses being used by spammers, and would like to work with me on this?

Would anyone in the SpamBayes team like to have a go at this, or point me in the right direction?

Has anyone had a go at hacking around with the SpamBayes source code and knows what I should do?

Basically any help is extremely welcome!

Regards

Ben

* There must obviously be enough people out there who can't get an erection, or who are dumb enough to munch pills to get slim rather than endure a bit of exercise. That being said, they will also trust their credit card to a bunch of crooks who, even if they send you the pills, will probably send you rat poison!

** Get the FREE SpamBayes Outlook add-in from http://sourceforge.net/projects/spambayes
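
For the link-extraction step in the list above, Python's standard re module should be enough, so a separate regex library (or the .NET one) should not be needed. What follows is only a minimal sketch, not SpamBayes code: the names URL_RE, extract_links and extract_domains are made up for illustration, and the pattern is a loose adaptation of the .NET one rather than an exact port. The main differences are that Python writes named groups as (?P<name>...) and that whitespace is excluded from every part, so a match stops at the end of the URL.

```python
import re

# Loose adaptation of the .NET pattern from the post (an assumption, not an
# exact port). re.VERBOSE plays the role of IgnorePatternWhitespace, so a
# literal '#' must be escaped or placed inside a character class.
URL_RE = re.compile(
    r"""
    (?P<protocol>https?|ftp)://     # scheme
    (?P<domain>[^/\s:?#]+)          # host name
    (?::(?P<port>\d+))?             # optional port, digits only
    (?P<path>/[^\s?#]*)?            # optional path
    (?P<query>\?[^\s#]*)?           # optional query string
    (?P<bookmark>\#\S*)?            # optional fragment
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_links(text):
    """Return one dict of named URL parts per URL found in text."""
    return [match.groupdict() for match in URL_RE.finditer(text)]


def extract_domains(text):
    """Return unique lower-cased domains in order of first appearance."""
    seen = []
    for link in extract_links(text):
        domain = link["domain"].lower()
        if domain not in seen:
            seen.append(domain)
    return seen
```

Calling extract_domains() on the body of a bounced message would give the list of sites to submit, with repeats collapsed. The pattern does not strip trailing punctuation such as a full stop directly after a URL, so real messages may need a little extra clean-up.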