>-----Original Message-----
>From: Chr. von Stuckrad [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, September 15, 2004 5:41 AM
>To: users@spamassassin.apache.org
>Subject: Re: Phishing obfuscated url detection
>
>
>On Wed, Sep 15, 2004 at 02:17:15AM -0700, Jeff Chan wrote:
>> On Wednesday, September 15, 2004, 1:38:30 AM, Julian Field wrote:
>> > ... Is it possible to detect where
>> > <A HREF="foo">bar</A>
>> > and foo and bar are unrelated domains?
>>
>> That could be a good idea for a rule. It would be nice if it
>> could be determined canonically, without actually resolving
>> either location.
>
>IMHO this is near impossible.
>
>The trivial String Back-reference check can never
>determine whether 'foo' and 'bar' are un*related*.
>Just whether the text *in* the HREF is unequal to
>the text shown to the user highlighted as a link.
>
>In all cases, where the HREF is only 'semantically'
>*related* to the following link text, a string check
>will assume 'spam', while 'spam/scam' will sooner or
>later just obfuscate the text portion by javascript
>or encoding tricks.
>
>e.g.: <a HREF="www.eplus.de">imail.de</a>
>      is 'related' (even if 'mis'constructed)
>      because you find access to the 'imail.de'
>      Mails via the 'www.eplus.de' webserver.
>
>      Also many Mail-Texts of the kind
>      ... to reach FOO click <a HREF="somedomain">here</a>
>      would be very difficult to 'analyze correctly'.
>
>So I believe it to be an interesting idea for AI specialists,
>but alas not for inclusion in spamassassin as it works now.
>
>Stucki (postmaster at mi.fu-berlin.de using spamassassin 2.63)
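
To make the limitation concrete, here is a rough sketch of the naive string check Stucki describes. It is written in Python purely for illustration and is not how SpamAssassin rules are written; the names `host_of`, `LinkCollector` and `mismatched_links` are made up, and comparing hosts by strict equality is my own assumption about what a "trivial" check would do.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical pattern: anything in the link text that looks like a domain name.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.IGNORECASE)


def host_of(href):
    """Return the lower-cased host of an HREF, tolerating a missing scheme."""
    if "://" not in href:
        href = "http://" + href
    return (urlparse(href).hostname or "").lower()


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def mismatched_links(html):
    """Return links whose visible text names a host different from the HREF host."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        match = DOMAIN_RE.search(text)
        if match is None:
            continue  # text such as "here" names no domain, so nothing to compare
        shown = match.group(1).lower()
        target = host_of(href)
        # Pure string comparison: it cannot tell "unrelated" from "merely different".
        if target and shown != target:
            flagged.append((href, text))
    return flagged
```

Run against Stucki's own examples, this flags `<a HREF="www.eplus.de">imail.de</a>` even though the two are related, and it has nothing to say about `<a HREF="somedomain">here</a>` because the link text contains no domain at all. Both outcomes are exactly the weaknesses he points out.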

I have to agree with Stucki. What about all those image caching services?
They would all get tagged, and that covers a large number of legitimate
newsletters. It was a good idea, so don't feel bad.

--Chris