On 02/09/2013 16:09, Damian Weber wrote:
On Mon, 2 Sep 2013, Andriy Gapon wrote:
re_format(7) says:
There are two special cases of bracket expressions: the bracket expressions
'[[:<:]]' and '[[:>:]]' match the null string at the beginning and
end of a word respectively. A word is defined as a sequence of word
characters which is neither preceded nor followed by word characters. A
word character is an alnum character (as defined by ctype(3)) or an
underscore. This is an extension, compatible with but not specified by
IEEE Std 1003.2 ("POSIX.2"), and should be used with caution in software
intended to be portable to other systems.
However I observe the following:
$ echo "cd0 cd1 xx" | sed 's/cd[0-9][^ ]* *//g'
xx
$ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9][^ ]* *//g'
cd1 xx
In my opinion '[[:<:]]' should not affect how the pattern is matched in this
case.
Any thoughts, suggestions?
There are two simpler expressions whose difference in behaviour I don't
understand either (tested on 8.4-PRERELEASE):
$ echo "cd0 cd1 xx" | sed 's/cd[0-9] //g'
xx
$ echo "cd0 cd1 xx" | sed 's/[[:<:]]cd[0-9] //g'
cd1 xx
Well, I agree with your analysis, and I think it's certainly a bug.
Do you think that the BUGS line in regex(3) should perhaps be extended
to "never works properly"?:
"""
Word-boundary matching does not work properly in multibyte locales.
"""
[[:<:]] can be replaced by \b in a PCRE, which works perfectly fine (of
course):
$ echo "this word word should be deleted" | perl -pe 's,\bword ,,g'
this should be deleted
Chris