William Xu wrote:
> Hi, all,
>
> This piece of code used to work well. i guess the error occurs after
> some upgrade.
>
> >>> import urllib
> >>> from BeautifulSoup import BeautifulSoup
> >>> url = 'http://www.google.com'
> >>> port = urllib.urlopen(url).read()
> >>> soup = BeautifulSoup()
> >>> soup.feed(port)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/lib/python2.3/sgmllib.py", line 94, in feed
>     self.rawdata = self.rawdata + data
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xb8 in position 565:
> ordinal not in range(128)
> >>>
>
> Any ideas to solve this?

According to the section "Beautiful Soup Gives You Unicode, Dammit" of
the documentation
<http://www.crummy.com/software/BeautifulSoup/documentation.html>,
Beautiful Soup fully supports Unicode, so this is probably a bug.

> version info:
>
> Python 2.3.5 (#2, Mar 7 2006, 12:43:17)
> [GCC 4.0.3 20060212 (prerelease) (Debian 4.0.2-9)] on linux2
>
> python-beautifulsoup: 3.0.1-1

Upgrading python-beautifulsoup is a good idea, since there were two
bug fix releases after 3.0.1.
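
For what it's worth, the traceback shows an implicit ASCII decode in
"self.rawdata + data". In Python 2 that happens when one operand is a
unicode string and the other a byte string containing non-ASCII bytes
(here 0xb8). Until the upgrade is in place, one possible workaround is to
decode the downloaded bytes yourself before handing them to the parser.
The following is only a sketch: decode_page and its list of candidate
encodings are hypothetical names of mine, not part of Beautiful Soup, and
the page's real encoding is an assumption you would have to check.

```python
# Sketch only: decode_page and its candidate list are hypothetical helpers,
# not part of BeautifulSoup or urllib.

def decode_page(raw, candidates=("utf-8", "iso-8859-1")):
    """Return raw decoded with the first candidate encoding that succeeds."""
    for encoding in candidates:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# A byte string containing 0xb8, the byte named in the traceback.
raw = b"caf\xb8"

# Decoding as ASCII fails, which is what the implicit conversion attempts.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly yields a unicode string that is safe to concatenate.
text = decode_page(raw)
```

In the original session this would mean passing decode_page(port) to
soup.feed() instead of port, assuming the candidate encodings actually
match what the server sent.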