On 11/10/2012 11:13 AM, John Hardin wrote:
On Sat, 10 Nov 2012, Marc Perkel wrote:
Just a thought, I changed this:
uri URI_PROTO_MC /^(?!(?-i:https?:))https?:/i
into this:
uri URI_PROTO_MC /^(?!(?-i:ttps?:))ttps?:/i
Some people capitalize the H, but mixed case in the rest of the scheme
should be a 100% accurate indicator.
That breaks it. Note the RE is anchored at the beginning of the URI, so
a pattern that starts with "ttps?" can never match a URI that starts with "h".
This is what you want:
uri URI_PROTO_MC /^(?!(?-i:[Hh]ttps?:))https?:/i
The string inside the parentheses is what you want to _not_ hit, and
that part is _not_ case-insensitive, even though the rest of the
expression _is_ case-insensitive.
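
For anyone who wants to check the three variants before putting one into
production, here is a minimal sketch in Python rather than in SpamAssassin
itself. It assumes Python 3.6 or later, whose re module accepts the same
(?-i:...) scoped-flag syntax for these patterns; the sample URIs and the
hits() helper are made up for illustration and are not part of any rule set.

```python
import re

# The three variants discussed above, compiled case-insensitively (the
# trailing /i in the rule). Only the (?-i:...) group is case-sensitive.
ORIGINAL = re.compile(r"^(?!(?-i:https?:))https?:", re.I)
NO_H = re.compile(r"^(?!(?-i:ttps?:))ttps?:", re.I)
H_CLASS = re.compile(r"^(?!(?-i:[Hh]ttps?:))https?:", re.I)

# Hypothetical sample URIs, not taken from real mail.
SAMPLES = [
    "http://example.com/",   # all lowercase
    "Http://example.com/",   # only the H capitalized
    "hTtP://example.com/",   # mixed case
    "HTTPS://example.com/",  # all uppercase
]


def hits(pattern, uri):
    """Return True if the pattern would hit this URI."""
    return pattern.search(uri) is not None


for uri in SAMPLES:
    print(uri, hits(ORIGINAL, uri), hits(NO_H, uri), hits(H_CLASS, uri))
```

In this sketch the "ttps?" variant never hits, because nothing in the samples
starts with "t" at the anchor. The [Hh] variant skips both "http:" and "Http:"
but still hits the mixed-case and all-uppercase forms.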
Also, for the TLD rule: after a bit of thought I realized it would be
very unlikely a spammer would be doing this to a .gov URI, so I
substituted .biz:
uri __URI_TLD_MC /\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b/i
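
The TLD rule above uses the same trick: a case-sensitive negative lookahead
in front of a case-insensitive match. A rough way to try it outside
SpamAssassin, again assuming Python 3.6+ and using invented sample URIs and
a hypothetical helper name:

```python
import re

# The TLD rule joined onto one line. The lookahead rejects the all-lowercase
# spellings; the case-insensitive group then accepts any other casing.
TLD_MC = re.compile(
    r"\.(?!(?-i:com|net|org|biz|info))(?:com|net|org|biz|info)\b", re.I
)


def tld_mixed_case(uri):
    """Return True if a listed TLD appears in anything but all lowercase."""
    return TLD_MC.search(uri) is not None


# Hypothetical sample URIs, not taken from real mail.
for uri in (
    "http://example.com/",
    "http://example.CoM/",
    "http://example.INFO",
    "http://example.GOV/",
):
    print(uri, tld_mixed_case(uri))
```

Note that .GOV is not flagged here simply because .gov is not in the list,
which matches the substitution described above.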
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
So far it's working well. These mixed-case rules have caught 4620 spams
since Sunday morning. I also added this as a separate rule:
/^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www/i
I found some cases where the http was lowercase but the www was mixed case.
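
A quick sanity check of the www pattern, sketched in Python 3.6+ under the
same assumptions as any offline test: the sample URIs are invented and
www_mixed_case() is just a hypothetical helper name.

```python
import re

# Same pattern as the rule above. The lookahead now covers the scheme plus
# "//www", so odd casing in either part lets the match go through.
WWW_MC = re.compile(r"^(?!(?-i:[Hh]ttps?:\/\/www))https?:\/\/www", re.I)


def www_mixed_case(uri):
    """Return True if the scheme-plus-www prefix is oddly cased."""
    return WWW_MC.search(uri) is not None


# Hypothetical sample URIs, not taken from real mail.
for uri in (
    "http://www.example.com/",
    "http://WwW.example.com/",
    "Http://www.example.com/",
    "hTTp://www.example.com/",
    "http://example.com/",
):
    print(uri, www_mixed_case(uri))
```

Because the lookahead spans the whole prefix, this pattern also hits URIs
whose scheme is mixed case and whose www is lowercase, so it overlaps with
the protocol rule for those.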
--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400