On Sat, 10 Nov 2012, Marc Perkel wrote:
Need a rule to catch this:
HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm
Mixed case links
Mixed-case protocol:
uri URI_PROTO_MC /^(?!(?-i:https?:))https?:/i
Note: this _will_trigger on HTTP and HTTPS but I expect they are rare in
legitimate URIs
Mixed case TLD:
uri URI_TLD_MC
/\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b/i
Add TLDs as needed. Again, this _will_ trigger on totally UC TLDs. If
that's a problem just add the fully-uppercase TLD to the first TLD list
(the case-insensitive zero-width lookahead assertion).
Common domain name parts or subparts:
uri URI_GOOG_MC /(?!(?-i:google))google/i
HTH.
How much are you seeing these in real traffic?
--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
Perfect Security and Absolute Safety are unattainable; beware
those who would try to sell them to you, regardless of the cost,
for they are trying to sell you your own slavery.
-----------------------------------------------------------------------
Tomorrow: Veterans Day