On Sun, 2009-07-26 at 17:19 +0200, Karsten Bräckelmann wrote:
> On Sat, 2009-07-25 at 16:07 -0500, McDonald, Dan wrote:
> > ... (?:c\s?o\s?m|n\s?e\s?t|o\s?r\s?g)[[:punct:]]?\b/i
>                                        ^^^^^^^^^^^^
> That part is superfluous. If it matches a punctuation char, its optional
> variant (matching no char) will make the \b word boundary match as well.

Crap, that's actually wrong. :/

There is exactly *one* char in the POSIX punct char class, that also is
a word char -- the underscore... So that translates to "with an optional
underscore after the TLD".

Sorry, my bad.

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
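
The underscore point can be checked with a small sketch. This is not the original rule, which is a Perl regex. Python's re module has no [[:punct:]], so the sketch assumes string.punctuation (the 32 ASCII punctuation chars) is an adequate stand-in for the POSIX class. The names PUNCT, word_punct, with_punct, without_punct and differs are made up for illustration.

```python
import re
import string

# Assumption: ASCII punctuation stands in for POSIX [[:punct:]],
# which Python's re module does not support directly.
PUNCT = string.punctuation

# Punctuation chars that \w also matches, i.e. punct chars that are word chars.
word_punct = [ch for ch in PUNCT if re.fullmatch(r"\w", ch)]

# The TLD part of the quoted rule, with and without the optional punct class.
tld = r"(?:c\s?o\s?m|n\s?e\s?t|o\s?r\s?g)"
punct_class = "[" + re.escape(PUNCT) + "]"
with_punct = re.compile(tld + punct_class + r"?\b", re.I)
without_punct = re.compile(tld + r"\b", re.I)


def differs(text):
    """Return True if the two variants match different text (or only one matches)."""
    a = with_punct.search(text)
    b = without_punct.search(text)
    return (a.group(0) if a else None) != (b.group(0) if b else None)


if __name__ == "__main__":
    print(word_punct)
    for sample in ("example.com. foo", "example.com! foo", "example.com_ foo"):
        print(repr(sample), differs(sample))
```

For the first two samples both variants end up matching just the TLD, because the engine backtracks to the zero-char variant to satisfy \b. Only the underscore sample differs: there is no word boundary between "m" and "_", so the variant without the punct class does not match at all.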