On Sun, Apr 16, 2006 at 06:24:58PM -0700, Audrey Tang via RT wrote:
> Nicholas Clark wrote:
> > IIRC having ASCII as the default was a deliberate design choice to avoid
> > the confusion of "is it iso-8859-1 or is it utf-8" when encountering a
> > string literal with bytes outside the range 0-127.
>
> Aye, it was auto-promoting to latin1 and was changed to ascii-by-default
> by me and Leo a while ago.

After reading some of the comments and thinking about it some more,
I think having double-quoted strings auto-promote to iso-8859-1 by
default is probably not a good idea.

> > If PGE is always outputting UTF-8 literals, what stops it from always
> > prefixing every literal "unicode:", even if it only uses Unicode characters
> > 0 to 127?
>
> Indeed, it would be much easier if unicode:"" on an ascii-only string
> can automatically go back to use ascii for representation, and choose to
> use utf8 (or better, latin1/ucs2) only iff there is high-bit parts in it.

Yes, this would be a good solution also.  (It doesn't resolve the
original problem that prompted this RFE, so I'll open a separate ticket
for that.)

Thanks,

Pm
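
For illustration, the fallback Audrey describes (use ascii when a unicode:"" literal contains only code points 0 to 127, and a wider encoding otherwise) could be sketched roughly as below. This is Python rather than PIR, and choose_charset and emit_literal are hypothetical names, not part of Parrot or PGE. The sketch only covers the ascii/utf8 decision and does not model the latin1/ucs2 alternative mentioned above.

```python
def choose_charset(text):
    """Return the narrowest charset name that can represent `text`.

    Hypothetical helper for illustration; not Parrot's or PGE's API.
    """
    # Every code point below 128 means plain ascii is sufficient.
    if all(ord(ch) < 128 for ch in text):
        return "ascii"
    # Otherwise some high-bit character is present, so keep utf8.
    return "utf8"


def emit_literal(text):
    """Return (charset name, encoded bytes) for a unicode:"" literal."""
    charset = choose_charset(text)
    encoding = "ascii" if charset == "ascii" else "utf-8"
    return charset, text.encode(encoding)
```

With a rule like this, a generator could prefix every literal with unicode: and still end up with ascii strings whenever no high-bit characters are present.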