On Sat, 10 Nov 2012, Marc Perkel wrote:

> Need a rule to catch this:
>
> HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm
>
> Mixed case links

Mixed-case protocol:

   uri  URI_PROTO_MC  /^(?!(?-i:https?:))https?:/i

Note: this _will_ trigger on fully-uppercase HTTP: and HTTPS: as well, but I expect those are rare in legitimate URIs.
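
If you want to sanity-check the pattern outside SpamAssassin, here is a minimal sketch in Python. It assumes Python 3.6 or later, whose re module accepts the same scoped (?-i:...) group. The helper name is_mixed_case_proto and every sample URI except the reported one are made up for illustration. It only exercises the regex, not SpamAssassin's own URI extraction.

```python
import re

# Same pattern as URI_PROTO_MC. The scoped (?-i:...) group switches
# case-insensitivity off inside the negative lookahead only.
PROTO_MC = re.compile(r'^(?!(?-i:https?:))https?:', re.IGNORECASE)

def is_mixed_case_proto(uri):
    """Hypothetical helper: True if the scheme is http/https but not all-lowercase."""
    return PROTO_MC.search(uri) is not None

if __name__ == "__main__":
    # First sample is the URI from the report; the rest are illustrative.
    for uri in ("HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm",
                "http://example.com",
                "HTTP://EXAMPLE.COM"):
        print(uri, is_mixed_case_proto(uri))
```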

Mixed case TLD:

   uri  URI_TLD_MC    /\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b/i

Add TLDs as needed. Again, this _will_ trigger on fully-uppercase TLDs. If that's a problem, just add the fully-uppercase TLD to the first TLD list (the case-sensitive one, inside the zero-width negative lookahead assertion), e.g. (?-i:com|COM|net|NET).
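
A rough way to compare the two variants, again as a Python sketch (3.6 or later assumed). The names TLD_MC_ALLOW_UC and hits are hypothetical, and the uppercase-exempting pattern is just my reading of "add the fully-uppercase TLD to the first list", not a tested rule.

```python
import re

# Same pattern as URI_TLD_MC: lowercase TLDs are exempted by the
# case-sensitive negative lookahead, everything else in the list fires.
TLD_MC = re.compile(
    r'\.(?!(?-i:com|net|org|gov|info))(?:com|net|org|gov|info)\b',
    re.IGNORECASE)

# Assumed variant: fully-uppercase TLDs added to the exempt list as well.
TLD_MC_ALLOW_UC = re.compile(
    r'\.(?!(?-i:com|COM|net|NET|org|ORG|gov|GOV|info|INFO))'
    r'(?:com|net|org|gov|info)\b',
    re.IGNORECASE)

def hits(pattern, uri):
    """Hypothetical helper: True if the pattern matches anywhere in the URI."""
    return pattern.search(uri) is not None

if __name__ == "__main__":
    # First sample is the URI from the report; the rest are illustrative.
    for uri in ("HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm",
                "http://example.com",
                "http://EXAMPLE.COM"):
        print(uri, hits(TLD_MC, uri), hits(TLD_MC_ALLOW_UC, uri))
```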

Common domain name parts or subparts:

   uri  URI_GOOG_MC   /(?!(?-i:google))google/i
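
The same kind of check for the domain-part rule, sketched in Python (3.6 or later assumed; is_mixed_case_google and the sample URIs other than the reported one are made up). Note that only all-lowercase "google" is exempted, so a plain capitalised "Google" in a URI fires too.

```python
import re

# Same pattern as URI_GOOG_MC: unanchored, so it matches "google" in any
# case other than all-lowercase, anywhere in the URI.
GOOG_MC = re.compile(r'(?!(?-i:google))google', re.IGNORECASE)

def is_mixed_case_google(uri):
    """Hypothetical helper: True if the URI contains a non-lowercase 'google'."""
    return GOOG_MC.search(uri) is not None

if __name__ == "__main__":
    # First sample is the URI from the report; the rest are illustrative.
    for uri in ("HtTp://goOGleplAcESSEOopTimiZaTIonx.cOm",
                "http://www.google.com",
                "http://www.Google.com"):
        print(uri, is_mixed_case_google(uri))
```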

HTH.

How much are you seeing these in real traffic?

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  Perfect Security and Absolute Safety are unattainable; beware
  those who would try to sell them to you, regardless of the cost,
  for they are trying to sell you your own slavery.
-----------------------------------------------------------------------
 Tomorrow: Veterans Day
