You should be able to check against the img src content, right?

2011/10/14 Christian Grunfeld <christian.grunf...@gmail.com>:
> And what about when there is no anchor text in the link? E.g. a PayPal
> image button.
>
>
> 2011/10/14  <dar...@chaosreigns.com>:
>> Existing rule:
>>
>> rawbody  __SPOOFED_URL  m/<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:[^>"'\# ]{8,29}[^>"'\# :\/?&=])[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>>
>>
>> How about this, to only check for a changed domain part instead?
>>
>> rawbody SPOOFED_URL_DOMAIN /<a\s[^>]{0,2048}\bhref=(?:3D)?.?(https?:\/\/?[^\/>"'\# ]{8,29})[^>]{0,2048}>(?:[^<]{0,1024}<(?!\/a)[^>]{1,1024}>){0,99}\s{0,10}(?!\1)https?[^\w<]{1,3}[^<]{5}/i
>>
>> It matches this:
>>
>>  <a href="http://www.chaosreigns.com/">http://www.example.com</a>
>>
>> But does not match this (example from actual non-spam):
>>
>>  <a href="http://www.jr.com/tracking?ord_q_num=105725494&ord_q_zip=03076">http://www.jr.com/tracking</a>
>>
>>
>> A very simplified form of this new one:
>>
>> rawbody SPOOFED_URL_DOMAIN /<a href="(https?:\/\/[^\/">]+)[^>]*>(?!\1)http/i
>>
>> That "(?!\1)" bit is nice and fancy.  It means "not followed by what was
>> captured in the first set of parentheses".  The perlre man page calls
>> it "a zero-width negative look-ahead assertion."
>>
>> --
>> "Every normal man must be tempted at times to spit upon his hands,
>> hoist the black flag, and begin slitting throats."
>>  - Henry Louis Mencken (1880-1956)
>> http://www.ChaosReigns.com
>>
>
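
For anyone who wants to try the simplified SPOOFED_URL_DOMAIN pattern quoted above outside SpamAssassin, here is a rough sketch in Python. It is only a port of that one-line regex to Python's re module, not the real rule; the helper name looks_spoofed and the variable names are made up for illustration. The two sample links are the ones from the quoted message.

```python
import re

# Python port of the simplified pattern from the quoted message.
# Group 1 captures the scheme and host of the href; (?!\1) then requires
# that the visible link text does NOT start with that same string.
SPOOFED_URL_DOMAIN = re.compile(
    r'<a href="(https?://[^/">]+)[^>]*>(?!\1)http',
    re.IGNORECASE,
)


def looks_spoofed(html: str) -> bool:
    """Hypothetical helper: True if the pattern finds a mismatched link."""
    return SPOOFED_URL_DOMAIN.search(html) is not None


# href and visible text point at different hosts.
mismatch = '<a href="http://www.chaosreigns.com/">http://www.example.com</a>'

# Same host, the href just carries extra path and query parameters.
same_host = (
    '<a href="http://www.jr.com/tracking?ord_q_num=105725494'
    '&ord_q_zip=03076">http://www.jr.com/tracking</a>'
)

print(looks_spoofed(mismatch))   # True
print(looks_spoofed(same_host))  # False
```

Because the look-ahead only tests whether the visible text starts with the captured scheme and host, a link whose text begins with the right host and then adds more label parts would not be flagged by this simplified form.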
