On Fri, Jan 9, 2015, at 16:44, Nick wrote:
> Quoth FRIGN:
> >  - UTF-8: not allowed in POSIX, but in my opinion a must. This
> >           finally allows you to work with UTF-8 streams without
> >           problems or unexpected behaviour.
> 
> I fully agree (unsurprisingly). Anything that relies on the POSIX 
> behaviour to do weird things involving multibyte characters is 
> insane.

Er... http://pubs.opengroup.org/onlinepubs/009696899/utilities/tr.html
has very little mention of the issue one way or another, but does use
the term "characters" rather than "bytes" in all relevant places, and
talks about "multi-byte characters" in a tone that suggests they should
be supported properly when LC_CTYPE has them.

The only _questionable_ bits are some of the language surrounding the
use of octal sequences:

For single characters: "Multi-byte characters require multiple,
concatenated escape sequences of this type, including the leading '\'
for each byte."

I read this as meaning that multi-byte characters are supported, and in
fact that "tr '\303\266o' 'o\303\266'" means that \303\266 [two escape
sequences representing one multi-byte character] and o will be swapped -
and that it is not possible to specify multibyte characters with octal
values in a dash-separated range specification (but they can be included
as literals).
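
To make the two readings concrete, here is a rough Python sketch (not
tr itself, and not taken from any tr implementation). It assumes a
UTF-8 locale, in which \303\266 is the encoding of U+00F6 (o with
diaeresis). The helper names tr_chars and tr_bytes are made up for
illustration.

```python
# Assumption: UTF-8, where the octal escapes \303\266 encode U+00F6.
SET1 = b"\303\266o"   # one 2-byte character followed by 'o'
SET2 = b"o\303\266"   # 'o' followed by the same 2-byte character


def tr_chars(text, set1, set2):
    """Multi-byte-aware reading: decode the sets, then map char to char."""
    table = str.maketrans(set1.decode("utf-8"), set2.decode("utf-8"))
    return text.translate(table)


def tr_bytes(data, set1, set2):
    """Byte-oriented reading: every byte of the sets is its own 'character'."""
    return data.translate(bytes.maketrans(set1, set2))


if __name__ == "__main__":
    word = "fo\u00f6"
    # Character-wise: the two characters are swapped.
    print(tr_chars(word, SET1, SET2))
    # Byte-wise: the three bytes are permuted, leaving invalid UTF-8.
    print(tr_bytes(word.encode("utf-8"), SET1, SET2))
```

Under the character reading the input "fo\u00f6" comes out with the
last two characters swapped; under the byte reading the same sets
rotate three separate bytes and the output is no longer valid UTF-8.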

Or is it possible that FRIGN misinterpreted the prohibition on
"multi-character collating elements"?
