Re: Encoding Questions

Kent Johnson Tue, 19 Apr 2005 11:35:55 -0700

[EMAIL PROTECTED] wrote:

1. I download a page in Python using urllib and now want to convert it to
utf-8 and keep it that way. I already know the original encoding of the page.
What calls should I make to convert the encoding of the page to utf-8?
For example, let's say the page is encoded in gb2312 (Simplified Chinese)
and I want to keep it in utf-8.


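Something like:
import urllib
data = urllib.urlopen(...).read()        # raw bytes as served
unicodeData = data.decode('gb2312')      # bytes -> unicode, using the known source encoding
utf8Data = unicodeData.encode('utf-8')   # unicode -> utf-8 bytes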

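You may want to supply the errors parameter ('strict', 'ignore' or 'replace')
to decode() or encode() to control what happens when a byte sequence or
character cannot be converted; see the docs for details.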
http://docs.python.org/lib/string-methods.html
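
To make the errors parameter concrete, here is a minimal sketch. The helper name to_utf8 and the sample bytes are made up for illustration, and a byte string stands in for what read() would return. It is written so that it also runs on current Python 3, where read() returns bytes; the decode/encode calls are the same ones as above.

```python
# Stand-in for a downloaded page: two Chinese characters in gb2312,
# followed by one stray byte that is not valid gb2312.
raw = u"\u4e2d\u6587".encode("gb2312") + b"\xff"


def to_utf8(data, source_encoding, errors="strict"):
    """Hypothetical helper: decode bytes from source_encoding, re-encode as utf-8."""
    return data.decode(source_encoding, errors).encode("utf-8")


# With the default errors='strict', the stray byte raises an exception.
try:
    to_utf8(raw, "gb2312")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With errors='replace', the bad byte becomes U+FFFD and the rest survives.
converted = to_utf8(raw, "gb2312", "replace")
```

Whether 'replace' or 'ignore' is acceptable depends on how much damage you can tolerate in the stored page; 'strict' at least tells you when the declared encoding does not match the bytes.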

2. Is this a good approach? Can I keep any pages in any languages in
this way and return them when requested using utf-8 encoding?


Yes, as long as you know reliably what the encoding is for the source pages.
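
A small sketch of why the "reliably" part matters, using made-up sample text rather than a real page: decoding with the wrong encoding can succeed silently and leave you with garbage stored as perfectly valid utf-8.

```python
original = u"\u4e2d\u6587"          # sample text, assumed for illustration
page = original.encode("gb2312")    # what the server would send

# Correct source encoding: the utf-8 copy round-trips to the original text.
right = page.decode("gb2312").encode("utf-8")

# Wrong source encoding: latin-1 accepts any byte, so no error is raised,
# but the stored utf-8 no longer represents the original text.
wrong = page.decode("latin-1").encode("utf-8")

round_trip_ok = right.decode("utf-8") == original
wrong_is_garbage = wrong.decode("utf-8") != original
```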

3. Does python 2.4 support all encodings?


I doubt it :-) but it supports many encodings. The list is at
http://docs.python.org/lib/standard-encodings.html
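
If you want to check a particular encoding name at runtime instead of reading the list, one option is codecs.lookup(), which raises LookupError for names it does not know. The wrapper is_supported below is a hypothetical helper, not something from the standard library.

```python
import codecs


def is_supported(name):
    """Return True if this Python installation has a codec for `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


have_gb2312 = is_supported("gb2312")
have_bogus = is_supported("no-such-encoding")
```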

Kent
--
http://mail.python.org/mailman/listinfo/python-list
