[ python-Feature Requests-1599329 ] urllib(2) should allow automatic decoding by charset

SourceForge.net Sun, 19 Nov 2006 11:47:36 -0800

Feature Requests item #1599329, was opened at 2006-11-19 14:47
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1599329&group_id=5470


Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Erik Demaine (edemaine)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib(2) should allow automatic decoding by charset

Initial Comment:
Currently, urllib.urlopen(...).read() returns a string, not a unicode object.  
Ditto for urllib2.  No attempt is made to decode the data using the charset 
specified in the Content-Type header, i.e. urlopen(...).info()['Content-Type'].

Is it fair to assume that, in Python 3K, urllib's read() will return (Unicode) 
strings instead of bytes, automatically decoding according to the charset?

Do you think we could expose this futuristic functionality in Python 2?  I 
doubt we could change read() without breaking a lot of existing code that 
already does this decoding (e.g., http://zesty.ca/python/scrape.py), but 
perhaps a 'uread()' method could return a unicode object instead of a string.
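A minimal sketch of what the proposed uread() could do, assuming Python 3's urllib.request, where read() returns bytes and info() returns an email.message.Message subclass. The helper names charset_from_content_type, decode_body and uread are hypothetical, not part of any urllib API. The ISO-8859-1 fallback is also an assumption here (it is RFC 2616's default for text/* media types, applied below to every type).

```python
from email.message import Message

# Assumed fallback: RFC 2616's default charset for text/* responses.
DEFAULT_CHARSET = "iso-8859-1"


def charset_from_content_type(content_type, fallback=DEFAULT_CHARSET):
    """Return the lowercased charset parameter of a Content-Type value,
    or `fallback` if the header is empty or has no charset parameter."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() unquotes and lowercases the parameter.
    return msg.get_content_charset(fallback)


def decode_body(body, content_type, fallback=DEFAULT_CHARSET):
    """Decode raw response bytes using the charset named in Content-Type."""
    charset = charset_from_content_type(content_type, fallback)
    try:
        return body.decode(charset, errors="replace")
    except LookupError:
        # The header named a codec Python does not know; use the fallback.
        return body.decode(fallback, errors="replace")


def uread(response, fallback=DEFAULT_CHARSET):
    """Hypothetical uread(): like response.read(), but returns str."""
    content_type = response.info().get("Content-Type", "")
    return decode_body(response.read(), content_type, fallback)
```

For example, `with urllib.request.urlopen(url) as resp: text = uread(resp)`. Keeping charset parsing separate from the network call makes it testable, and read() itself stays unchanged for code that already does its own decoding.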

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 
