On Fri, Jan 9, 2015, at 16:44, Nick wrote:
> Quoth FRIGN:
> > - UTF-8: not allowed in POSIX, but in my opinion a must. This
> >   finally allows you to work with UTF-8 streams without
> >   problems or unexpected behaviour.
>
> I fully agree (unsurprisingly). Anything that relies on the POSIX
> behaviour to do weird things involving multibyte characters is
> insane.

Er... http://pubs.opengroup.org/onlinepubs/009696899/utilities/tr.html
says very little about the issue one way or the other, but it does use
the term "characters" rather than "bytes" in all relevant places, and it
talks about "multi-byte characters" in a tone that suggests they should
be supported properly when LC_CTYPE has them.

The only _questionable_ bits are some of the language surrounding the
use of octal sequences. For single characters:

  "Multi-byte characters require multiple, concatenated escape
  sequences of this type, including the leading '\' for each byte."

I read this as meaning that multi-byte characters are supported, and in
fact that "tr '\303\266o' 'o\303\266'" means that \303\266 [two escape
sequences representing one multi-byte character] and o will be swapped -
and that it is not possible to specify multi-byte characters with octal
values in a dash-separated range specification (but they can be included
as literals).

Or is it possible that FRIGN misinterpreted the prohibition on
"multi-character collating elements"?
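
To make the two readings concrete, here is a rough sketch in Python. It is only an illustration and assumes UTF-8, where \303\266 is the two-byte encoding of U+00F6 (ö). It is not how any actual tr implementation works, and the names swap_chars and swap_bytes are made up for this example. The first function treats \303\266 as one character, which is how I read the spec; the second treats each escape as an independent byte.

```python
# Assumption: UTF-8 encoding, where the octal escapes \303\266 are the
# two bytes of the single character U+00F6.
RAW = b"\303\266"
CHAR = RAW.decode("utf-8")   # one character, two bytes


def swap_chars(text: str) -> str:
    """Character-wise reading: swap U+00F6 and 'o' (hypothetical helper)."""
    table = str.maketrans(CHAR + "o", "o" + CHAR)
    return text.translate(table)


def swap_bytes(data: bytes) -> bytes:
    """Byte-wise reading: each escape is its own one-byte 'character'.

    Maps 0xC3 -> 'o', 0xB6 -> 0xC3, 'o' -> 0xB6 (hypothetical helper).
    """
    table = bytes.maketrans(b"\303\266o", b"o\303\266")
    return data.translate(table)


if __name__ == "__main__":
    sample = "foo " + CHAR
    print(swap_chars(sample))                  # the two characters are swapped
    print(swap_bytes(sample.encode("utf-8")))  # no longer valid UTF-8
```

With the byte-wise reading the output is not even decodable as UTF-8 any more, which is why I do not think that reading can be what the text intends.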