On 11/10/2012 11:13 AM, John Hardin wrote:
On Sat, 10 Nov 2012, Marc Perkel wrote:

Just a thought, I changed this:

uri  URI_PROTO_MC  /^(?!(?-i:https?:))https?:/i

into this:

uri  URI_PROTO_MC  /^(?!(?-i:ttps?:))ttps?:/i

Some people capitalize the H, but if the rest of it is mixed case, that should be a 100% accurate spam sign.

That breaks it. Note the RE is anchored at the beginning of the URI, and an http URI starts with "h", not "t", so your version can never hit.

This is what you want:

  uri  URI_PROTO_MC  /^(?!(?-i:[Hh]ttps?:))https?:/i

The string inside the negative lookahead (?!...) is what you want to _not_ hit, and the (?-i:...) around it makes that part _not_ case-insensitive, even though the rest of the expression _is_ case-insensitive.
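
To compare the three variants side by side, here is a minimal sketch in Python rather than in SpamAssassin itself. It assumes Python 3.6 or later, whose re module accepts the same (?-i:...) scoped-flag syntax; the names ORIGINAL, BROKEN, FIXED, SAMPLES and hits() and the sample URIs are made up for illustration.

```python
import re

# The three variants from this thread, compiled case-insensitively
# to mirror the trailing /i on the SpamAssassin rules.
ORIGINAL = re.compile(r"^(?!(?-i:https?:))https?:", re.I)
BROKEN = re.compile(r"^(?!(?-i:ttps?:))ttps?:", re.I)
FIXED = re.compile(r"^(?!(?-i:[Hh]ttps?:))https?:", re.I)

# Hypothetical sample URIs, not taken from real mail.
SAMPLES = [
    "http://example.com/",
    "Http://example.com/",
    "hTtP://example.com/",
    "HTTPS://example.com/",
]


def hits(rule, uri):
    """Return True if the compiled rule matches the URI."""
    return rule.search(uri) is not None


for uri in SAMPLES:
    # Columns: URI, original rule, "ttps?" variant, "[Hh]ttps?" variant.
    print(uri, hits(ORIGINAL, uri), hits(BROKEN, uri), hits(FIXED, uri))
```

In this sketch the "ttps?" variant hits none of the samples, and the "[Hh]ttps?" variant lets both "http:" and "Http:" through. An all-caps "HTTPS:" still hits, because only the first letter is allowed to vary.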

Also, for the TLD rule: after a bit of thought I realized it would be very unlikely a spammer would be doing this to a .gov URI, so I substituted .biz:

uri __URI_TLD_MC /\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b/i
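
The TLD rule uses the same trick, and can be checked the same way. This is only a sketch in Python (3.6 or later assumed, for the (?-i:...) syntax), not a SpamAssassin test; TLD_MC, tld_not_lowercase() and the example.* URIs are hypothetical names chosen for illustration.

```python
import re

# Same expression as __URI_TLD_MC, with re.I standing in for the /i.
TLD_MC = re.compile(
    r"\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b",
    re.I,
)


def tld_not_lowercase(uri):
    """True if one of the listed TLDs appears in anything but all lowercase."""
    return TLD_MC.search(uri) is not None


# Hypothetical sample URIs.
for uri in ("http://example.com/", "http://example.CoM/", "http://example.Info"):
    print(uri, tld_not_lowercase(uri))
```

The \b at the end keeps a longer label such as ".community" from being treated as ".com".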


--
 John Hardin KA7OHZ http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

So far it's working well. These mixed-case rules have caught 4620 spams since Sunday morning. I also added this as a separate rule:

/^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www/i

I found some cases where the "http" was lowercase but the "www" was mixed case.
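
A quick way to see what this extra rule does and does not hit, sketched in Python (3.6 or later assumed) rather than in SpamAssassin. WWW_MC, www_mixed_case() and the sample URIs are hypothetical, chosen only for illustration.

```python
import re

# Same expression as the rule above, with re.I standing in for the /i.
WWW_MC = re.compile(r"^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www", re.I)


def www_mixed_case(uri):
    """True if the scheme-plus-www prefix is not lowercase (a capital H is allowed)."""
    return WWW_MC.search(uri) is not None


# Hypothetical sample URIs.
for uri in (
    "http://www.example.com/",
    "http://WwW.example.com/",
    "hTTp://www.example.com/",
    "http://example.com/",
):
    print(uri, www_mixed_case(uri))
```

Because the lookahead covers the scheme as well as the "www", this also hits a mixed-case scheme followed by a lowercase "www", so it overlaps with URI_PROTO_MC on those URIs.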



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400
