John Bauman wrote:
> UTF-8 shouldn't need a BOM, as it is designed for character streams, and
> there is only one logical ordering of the bytes. Only UTF-16 and greater
> should output a BOM, AFAIK.
Yes and no. Yes, UTF-8 does not need a BOM to identify endianness. No,
usage of the BOM with UTF
John Bauman wrote:
> UTF-8 shouldn't need a BOM, as it is designed for character streams, and
> there is only one logical ordering of the bytes. Only UTF-16 and greater
> should output a BOM, AFAIK.
However there's a pending patch (http://bugs.python.org/1177307) for a
new encoding named utf-
UTF-8 shouldn't need a BOM, as it is designed for character streams, and
there is only one logical ordering of the bytes. Only UTF-16 and greater
should output a BOM, AFAIK.
--
http://mail.python.org/mailman/listinfo/python-list
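
The point about byte order can be checked directly. The following is a minimal sketch, not part of the original thread, that uses only the standard `codecs` constants; the sample string and variable names are made up for illustration.

```python
import codecs

text = u"abc"  # illustrative sample, not from the thread

# UTF-8 has a single byte order, so the plain "utf-8" codec writes no BOM.
utf8_bytes = text.encode("utf-8")

# The generic "utf-16" codec prepends a BOM so a reader can detect endianness.
utf16_bytes = text.encode("utf-16")

has_utf8_bom = utf8_bytes.startswith(codecs.BOM_UTF8)
has_utf16_bom = utf16_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))

# The optional UTF-8 signature is U+FEFF encoded as UTF-8.
signature_char = codecs.BOM_UTF8.decode("utf-8")
```

Some editors still write the three-byte UTF-8 signature at the start of a file, which is why a reader can meet U+FEFF even though UTF-8 does not need it for byte ordering.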
> 2005/12/23, David Xiao <[EMAIL PROTECTED]>:
> Hi Kuan:
>
> Thanks a lot! One more question here: How to write if I want to
> specify locale other than current locale?
>
> For example, running on Korea locale system, and try read a
>
FYI: I just received something from a friend, and he gave me the
following nice example!
I have one more question on this: how do I write this if I want to
specify a locale other than the current one? For example, a program
running on a Korean-locale system that tries to read a UTF-8 file
containing Chinese characters.
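
One possible way to approach this question, offered as a sketch rather than as the answer given in the thread: name the codec explicitly when reading, so the system locale plays no part in decoding, and name the target codec explicitly when producing output. The helper names `read_text` and `encode_for`, the temporary file, and the sample characters are all hypothetical; `io.open` is used here in place of `codecs.open`.

```python
import io
import os
import tempfile

def read_text(path, encoding="utf-8"):
    """Decode a file with an explicitly named codec, independent of the locale."""
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()

def encode_for(text, target_encoding):
    """Encode for a chosen target; characters it cannot represent become '?'."""
    return text.encode(target_encoding, "replace")

# Demo data (assumption): a small UTF-8 file holding two Chinese characters.
path = os.path.join(tempfile.mkdtemp(), "u.txt")
with io.open(path, "wb") as f:
    f.write(u"\u4e2d\u6587".encode("utf-8"))

content = read_text(path)                    # the same result on any locale
gbk_bytes = encode_for(content, "gbk")       # representable in GBK
ascii_bytes = encode_for(content, "ascii")   # not representable, replaced
```

Reading never depends on the locale once the codec is named. Output is where the locale's encoding matters, and passing an error handler such as "replace" is one way to avoid an exception when the target encoding lacks a character.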
--
Sorry, I'm a newbie in Python. I can't help you further; indeed, I don't know either. :)

2005/12/23, David Xiao <[EMAIL PROTECTED]>:
Hi Kuan:
Thanks a lot! One more question here: How to write if I want to specify locale other than current locale?
For example, running on Korea locale system, and try read a
import codecs

def read_utf8_txt_file(filename):
    fileObj = codecs.open(filename, "r", "utf-8")
    content = fileObj.read()
    fileObj.close()
    if content.startswith(u"\ufeff"):
        content = content[1:]  # exclude BOM, only if the file starts with one
    print(content)

read_utf8_txt_file("e:\\u.txt")

22 Dec 2005 18:12:28 -0800, [EMAIL PROTECTED] <
[EMAIL PROTECT
Hi Friends:
fileObj = codecs.open( filename, "r", "utf-8" )
u = fileObj.read()  # Returns a Unicode string from the UTF-8 bytes in the file
print u
It fails with this error:
UnicodeEncodeError: 'gbk' codec can't encode character u'\ufeff' in
position 0:
illegal multiby
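
The failure above can be reproduced without a console. This is a hedged sketch, not code from the thread: it assumes a file that starts with the UTF-8 signature, uses a temporary file and made-up sample text, and reads with `io.open` instead of `codecs.open`. It also uses the standard library's `utf-8-sig` codec, which drops a leading signature when decoding; the thread itself does not confirm which codec the truncated message refers to.

```python
import codecs
import io
import os
import tempfile

# Demo file (assumption): UTF-8 signature followed by two Chinese characters.
path = os.path.join(tempfile.mkdtemp(), "u.txt")
with io.open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"\u4e2d\u6587".encode("utf-8"))

# Plain "utf-8" keeps the signature as a leading U+FEFF character.
with io.open(path, "r", encoding="utf-8") as f:
    raw = f.read()

# "utf-8-sig" strips a leading signature if one is present.
with io.open(path, "r", encoding="utf-8-sig") as f:
    clean = f.read()

# GBK has no mapping for U+FEFF, which is what the traceback reports.
try:
    raw.encode("gbk")
    gbk_failed = False
except UnicodeEncodeError:
    gbk_failed = True

clean_gbk = clean.encode("gbk")  # succeeds once the signature is gone
```

Decoding with `utf-8-sig` avoids slicing off the first character by hand, and it leaves files without a signature unchanged.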