On Sun, 26 Jul 2009, Karsten Bräckelmann wrote:

On Sun, 2009-07-26 at 17:19 +0200, Karsten Bräckelmann wrote:
On Sat, 2009-07-25 at 16:07 -0500, McDonald, Dan wrote:
... (?:c\s?o\s?m|n\s?e\s?t|o\s?r\s?g)[[:punct:]]?\b/i
                                      ^^^^^^^^^^^^
That part is superfluous. If it matches a punctuation char, its optional variant (matching no char) will make the \b word boundary match as well.

Crap, that's actually wrong. :/ There is exactly *one* char in the POSIX punct char class, that also is a word char -- the underscore...

So that translates to "with an optional underscore after the TLD". Sorry, my bad.
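
The correction above can be checked with a small sketch. It is in Python rather than the Perl regex dialect the rule is written in, so two things are assumptions: `string.punctuation` stands in for POSIX `[[:punct:]]` (Python's `re` has no POSIX classes), and the pattern is a simplified stand-in for the tail of the quoted rule, not the rule itself. The helper names are made up for illustration.

```python
import re
import string

# Assumption: string.punctuation stands in for POSIX [[:punct:]], which
# Python's re module does not support directly.
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"


def punct_word_chars():
    # Punctuation characters that are also word characters (matched by \w).
    return [ch for ch in string.punctuation if re.fullmatch(r"\w", ch)]


# Simplified stand-in for the tail of the quoted rule: TLD, optional punct, \b.
TLD_TAIL = re.compile(r"com" + PUNCT_CLASS + r"?\b", re.IGNORECASE)


def tail_match(text):
    # Return the text matched by TLD_TAIL, or None when nothing matches.
    m = TLD_TAIL.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(punct_word_chars())
    for sample in ("example.com", "example.com.", "example.com_", "example.com_x"):
        print(repr(sample), "->", tail_match(sample))
```

In this sketch the optional punctuation character only ever ends up inside a match when it is an underscore; for any other punctuation the engine backtracks to the empty alternative and \b matches right after the TLD.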

That's an inefficient and confusing way to deal with the fact that underscore breaks \b, and doesn't cover all cases. I prefer:

Before text: \b_*
After text:  _*\b

As in:    /\b_*www_*\b/
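
A minimal sketch of the difference, again in Python rather than Perl (an assumption; \b and \w treat underscore the same way in both for ASCII text). The names `PLAIN`, `PADDED` and `found` are hypothetical and exist only for this illustration.

```python
import re

# Plain word-boundary version: underscore counts as a word character,
# so an adjacent underscore prevents \b from matching.
PLAIN = re.compile(r"\bwww\b")

# Version from this message: absorb any underscores inside the boundaries.
PADDED = re.compile(r"\b_*www_*\b")


def found(pattern, text):
    # Return the matched text, or None when the pattern does not match.
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for sample in ("www", "_www_", "go to www_ now", "wwwx"):
        print(repr(sample), found(PLAIN, sample), found(PADDED, sample))
```

The padded form still refuses "wwwx", since \b cannot match between two letters; it only relaxes the underscore case.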

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
-----------------------------------------------------------------------
  When designing software, any time you think to yourself "a user
  would never be stupid enough to do *that*", you're wrong.
-----------------------------------------------------------------------
 10 days until the 274th anniversary of John Peter Zenger's acquittal
