>-----Original Message-----
>From: Chr. von Stuckrad [mailto:[EMAIL PROTECTED]
>Sent: Wednesday, September 15, 2004 5:41 AM
>To: users@spamassassin.apache.org
>Subject: Re: Phishing obfuscated url detection
>
>
>On Wed, Sep 15, 2004 at 02:17:15AM -0700, Jeff Chan wrote:
>> On Wednesday, September 15, 2004, 1:38:30 AM, Julian Field wrote:
>> > ... Is it possible to detect where
>> > <A HREF="foo">bar</A>
>> > and foo and bar are unrelated domains?
>> 
>> That could be a good idea for a rule.  It would be nice if it
>> could be determined canonically, without actually resolving
>> either location.
>
>IMHO this is near impossible.
>
>The trivial String Back-reference check can never
>determine whether 'foo' and 'bar' are un*related*.
>Just whether the text *in* the HREF is unequal to
>the text shown to the user highlighted as a link.
>
>In all cases where the HREF is only 'semantically'
>*related* to the following link text, a string check
>will assume 'spam', while spammers/scammers will sooner
>or later just obfuscate the text portion with JavaScript
>or encoding tricks.
>
>e.g.:   <a HREF="www.eplus.de">imail.de</a>
>        is 'related' (even if 'mis'constructed)
>        because you reach the 'imail.de'
>        mail via the 'www.eplus.de' webserver.
>
>        Also, many mail texts of the kind
>         ... to reach FOO click <a HREF="somedomain">here</a>
>        would be very difficult to 'analyze correctly'.
>
>So I believe it to be an interesting idea for AI specialists,
>but alas not for inclusion in spamassassin as it works now.
>
>Stucki  (postmaster at mi.fu-berlin.de using spamassassin 2.63)


I have to agree with Stucki. What about all those image-caching services?
Their links would all get tagged, and they carry a large share of legitimate
newsletters. It was a good idea, though, so don't feel bad.
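To make the trade-off concrete, here is a minimal Python sketch of the naive
string comparison Stucki describes: pull out each <a> tag, compare the domain
in the HREF with any domain-like token in the visible text, and flag a
mismatch. All names here (mismatched_links, crude_domain, AnchorCollector) are
hypothetical, and the "last two labels" domain rule is a deliberate
simplification (it gets suffixes like co.uk wrong). This is an illustration
only, not how SpamAssassin implements any rule.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical helper: crude "domain" = last two dot-separated labels.
# Correct registrable-domain logic (e.g. co.uk) needs a public suffix list.
def crude_domain(host):
    labels = host.lower().strip(".").split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else None

# Very loose pattern for something that looks like a domain in link text.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+[a-z]{2,})\b", re.I)

class AnchorCollector(HTMLParser):
    """Collects (href, visible_text) pairs from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser lowercases tag and attribute names
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None

def href_domain(href):
    # "www.eplus.de" without a scheme would parse as a path, so add one.
    if "://" not in href:
        href = "http://" + href
    host = urlsplit(href).hostname
    return crude_domain(host) if host else None

def mismatched_links(html):
    """Return links whose visible text names a different crude domain than the href."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    flagged = []
    for href, text in parser.links:
        m = DOMAIN_RE.search(text)
        if not m:
            continue  # e.g. "here": nothing to compare, so the check stays silent
        shown = crude_domain(m.group(1))
        actual = href_domain(href)
        if actual and shown != actual:
            flagged.append((href, text))
    return flagged
```

Fed Stucki's examples, mismatched_links('<a HREF="www.eplus.de">imail.de</a>')
flags the link even though the two domains are legitimately related (a false
positive), while '<a href="somedomain">here</a>' is never flagged because the
text carries no domain at all. Image-caching and click-tracking hosts would
trip the same mismatch test.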

--Chris
