On Sun, Apr 16, 2006 at 06:24:58PM -0700, Audrey Tang via RT wrote:
> Nicholas Clark wrote:
> > IIRC having ASCII as the default was a deliberate design choice to avoid 
> > the confusion of "is it iso-8859-1 or is it utf-8" when encountering a
> > string literal with bytes outside the range 0-127.
> 
> Aye, it was auto-promoting to latin1 and was changed to ascii-by-default
> by me and Leo a while ago.

After reading some of the comments and thinking about it some
more, I think having double-quoted strings auto-promote to 
iso-8859-1 by default is probably not a good idea.

> > If PGE is always outputting UTF-8 literals, what stops it from always
> > prefixing every literal "unicode:", even if it only uses Unicode characters
> > 0 to 127?
> 
> Indeed, it would be much easier if unicode:"" on an ascii-only string
> can automatically go back to use ascii for representation, and choose to
> use utf8 (or better, latin1/ucs2) only iff there is high-bit parts in it.

Yes, this would be a good solution also.

(It doesn't resolve the original problem that prompted this
RFE, so I'll open a separate ticket for that.)
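
For illustration, the fallback described above could be sketched roughly
like this. It is written in Python rather than PIR, and the function name
and the "ascii"/"utf8" labels are hypothetical, not Parrot's actual API;
the latin1/ucs2 options mentioned above are left out to keep it short:

```python
def choose_representation(literal: str) -> str:
    """Pick a storage representation for a unicode:"" literal.

    Hypothetical helper: returns "ascii" when every code point is in
    the range 0-127, and "utf8" as soon as any high-bit character
    is present.
    """
    if all(ord(ch) < 128 for ch in literal):
        return "ascii"
    return "utf8"


# An ASCII-only literal can drop back to ascii ...
print(choose_representation("hello"))
# ... while one with a character above 127 stays utf8.
print(choose_representation("caf\u00e9"))
```

With something along these lines, PGE could prefix every literal with
unicode: and ASCII-only strings would still end up stored as ascii.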

Thanks,

Pm

