Re: Best ways of managing text encodings in source/regexes?

tvn Sun, 09 Dec 2007 09:44:54 -0800

Please see Cliff's correction, pasted below this excerpt.
Tim

> the byte string is ASCII which is a subset of Unicode (ISO-8859-1
> isn't).)


The one comment I'd make is that ASCII and ISO-8859-1 are both subsets
of Unicode at the level of the abstract code points, but ASCII is also
a subset of UTF-8 at the bytestream level, while ISO-8859-1 is not a
subset of UTF-8 nor, as far as I can tell, of any other Unicode
*encoding*.
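To make the code-point versus bytestream distinction concrete, here is a minimal Python 3 sketch using only the standard codecs. The variable names and the choice of test characters are illustrative, not from the thread.

```python
# Code-point level: ISO-8859-1 maps each byte value b (0..255) to the
# Unicode code point with the same number, so Latin-1 is a subset of Unicode.
latin1_is_unicode_subset = all(
    bytes([b]).decode("iso-8859-1") == chr(b) for b in range(256)
)

# Bytestream level: each ASCII character (U+0000..U+007F) encodes to the
# same single byte in ASCII and in UTF-8.
ascii_same_in_utf8 = all(
    chr(cp).encode("ascii") == chr(cp).encode("utf-8") for cp in range(128)
)

# The remaining Latin-1 characters (U+0080..U+00FF) are one byte in
# ISO-8859-1 but two bytes in UTF-8, so the byte streams differ.
latin1_differs_in_utf8 = all(
    chr(cp).encode("iso-8859-1") != chr(cp).encode("utf-8")
    for cp in range(128, 256)
)

print(latin1_is_unicode_subset, ascii_same_in_utf8, latin1_differs_in_utf8)

# Compare one non-ASCII character across encodings; the "-le" variants
# are used so that no byte order mark is prepended.
e_acute = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
for enc in ("iso-8859-1", "utf-8", "utf-16-le", "utf-32-le"):
    print(enc, e_acute.encode(enc))
```

All three checks print `True`, and the loop shows U+00E9 as `b'\xe9'` in ISO-8859-1 but `b'\xc3\xa9'`, `b'\xe9\x00'` and `b'\xe9\x00\x00\x00'` in UTF-8, UTF-16-LE and UTF-32-LE respectively.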

Thus a file encoded in ASCII *is* in fact a UTF-8 file.  There is no
way to distinguish the two.  But an ISO-8859-1 file containing any
non-ASCII character is not the same (at the bytestream level) as a
file with identical content in UTF-8 or any other Unicode encoding.
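The practical consequence for file contents can be sketched in Python 3 as well. The sample strings and the helper `is_valid_utf8` are hypothetical; the byte values come straight from the standard codecs.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ascii_text = "plain ASCII text"
cafe = "caf\u00e9"  # "cafe" with an acute e: one non-ASCII character, U+00E9

# ASCII-only content: the ASCII and UTF-8 encodings are byte-for-byte
# identical, so nothing in the file can tell them apart.
ascii_bytes = ascii_text.encode("ascii")
print(ascii_bytes == ascii_text.encode("utf-8"), is_valid_utf8(ascii_bytes))

# Non-ASCII content: ISO-8859-1 and UTF-8 produce different bytes.
latin1_bytes = cafe.encode("iso-8859-1")  # b'caf\xe9'
utf8_bytes = cafe.encode("utf-8")         # b'caf\xc3\xa9'

# A lone 0xE9 is not a complete UTF-8 sequence, so the Latin-1 bytes are
# rejected when read as UTF-8...
print(is_valid_utf8(latin1_bytes))

# ...while the UTF-8 bytes read as ISO-8859-1 "succeed" but produce
# mojibake, because every byte value is a valid Latin-1 character.
print(utf8_bytes.decode("iso-8859-1"))
```

Note that a check like `is_valid_utf8` is only a heuristic: passing it shows the bytes *could* be UTF-8 (an ASCII file always passes), not that they were written as UTF-8. ISO-8859-1 text whose bytes happen to form valid UTF-8 sequences (for example the two characters U+00C3 U+00A9) would pass too.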
-- 
http://mail.python.org/mailman/listinfo/python-list
