Denis Barbier wrote:
Miroslav Kure wrote:
Unfortunately, the nonbreakable space is not included in
the character class \s or [:space:] (aka whitespace). As it
is usually indistinguishable from the ordinary space in
most fonts, I would say that the nonbreakable space
should be added to the whitespace class in regexp
libraries.
No, that would defeat its purpose; a non-breaking space is
used to glue two words together.
But only in the graphical sense. In the logical sense they
are still two separate words.
Isn't the real problem that Miroslav should have used '\b'
to identify the boundary between word and non-word text, or
'[:^word:]' to identify all non-word characters? Ideally
this should even catch the invisible word separators used in
some cases in some languages. There is only one problem:
the soft hyphen is for some reason not classified as a word
character (in da_DK and fo_FO), which it logically should be.
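
To make the point concrete, here is a rough sketch in Python rather
than Perl, so several things are assumptions on my part: Python's re
module has no POSIX bracket classes, so \W stands in for
'[:^word:]'; the re.ASCII flag stands in for a regexp library that
does not count the no-break space as whitespace; and the sample
string is made up. Python's Unicode tables are not the da_DK or
fo_FO locale tables either, but they show the same soft hyphen
behaviour.

```python
import re

NBSP = "\u00a0"  # no-break space
SHY = "\u00ad"   # soft hyphen

# Made-up sample: two words glued by NBSP, one word with a soft hyphen.
text = "10" + NBSP + "km and co" + SHY + "operate"

# ASCII-only \s does not treat the no-break space as whitespace,
# so "10" and "km" stay glued together.
ascii_split = re.split(r"\s+", text, flags=re.ASCII)

# Splitting on non-word characters separates the glued words,
# but it also cuts the word in two at the soft hyphen.
word_split = re.split(r"\W+", text)

# \b reports a boundary on each side of the no-break space.
boundaries = [m.start() for m in re.finditer(r"\b", "10" + NBSP + "km")]

print(ascii_split)
print(word_split)
print(boundaries)
```

Note that without re.ASCII, Python 3 does treat the no-break space
as whitespace for str patterns, so the first split only reproduces
Miroslav's complaint because of that flag.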
Jacob
--
"... there may be many others,
but they haven't been discovered" -- Tom Lehrer