Python and decimal character entities over 128.

bsagert Wed, 09 Jul 2008 16:41:28 -0700

Some web feeds use decimal character entities that seem to confuse
Python (or me). For example, the string "doesn't" may be coded as
"doesn&#8217;t", which should produce a right-leaning apostrophe.
Python hates decimal entities above 128, so it chokes unless you do
something like string.encode('utf-8'). Even then, what should have
been a right-leaning apostrophe ends up as "â€™". The following script
does just that. Look for the string "The Canuck iPhone: Apple
doesnâ€™t care" after running it.
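
A minimal sketch of one way the "â€™" sequence can arise, written for
Python 3 (the script below is Python 2). It assumes, and this is only an
assumption, that the output file is being opened by a viewer that reads
it as Windows-1252 rather than UTF-8. The variable names are
illustrative and not part of the script.

```python
import html

# The entity as it appears in the feed.
raw = "doesn&#8217;t"

# Decoding the entity gives U+2019 (RIGHT SINGLE QUOTATION MARK).
text = html.unescape(raw)

# UTF-8 encodes U+2019 as the three bytes E2 80 99.
utf8_bytes = text.encode("utf-8")

# Assumption: the viewer interprets those bytes as Windows-1252,
# where E2, 80 and 99 are three separate characters.
misread = utf8_bytes.decode("cp1252")

print(repr(text))
print(utf8_bytes)
print(misread)
```

If that assumption holds, the bytes written to disk are correct UTF-8
and the damage happens only when the file is displayed.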


# coding: UTF-8
import feedparser

s = ''
d = feedparser.parse('http://feeds.feedburner.com/Mathewingramcom/work')
title = d.feed.title
link = d.feed.link
for i in range(0,4):
    title = d.entries[i].title
    link = d.entries[i].link
    s += title +'\n' + link + '\n'

f = open('c:/x/test.txt', 'w')
f.write(s.encode('utf-8'))
f.close()

This useless script is adapted from a "useful" script. Its only
purpose is to ask the Python community how I can deal with decimal
entities > 128. Thanks in advance, Bill
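
One possible approach, sketched here for Python 3 and not taken from the
script above: write the text with an explicit encoding, optionally
"utf-8-sig", which prepends a byte order mark that some Windows editors
use to detect UTF-8. The helper write_text and the temporary path are
hypothetical names used only for illustration, and whether the mark
helps depends on the program used to view the file.

```python
import codecs
import os
import tempfile


def write_text(path, text, encoding="utf-8-sig"):
    """Hypothetical helper: write text using an explicit encoding."""
    # newline="\n" keeps line endings exactly as given.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(text)


# Sample title from the feed, with the apostrophe as U+2019.
sample = "The Canuck iPhone: Apple doesn\u2019t care\n"

# A temporary location stands in for c:/x/test.txt.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_text(path, sample)

# Read the raw bytes back to inspect what was actually written.
with open(path, "rb") as f:
    data = f.read()

print(data.startswith(codecs.BOM_UTF8))
print(data[len(codecs.BOM_UTF8):].decode("utf-8") == sample)
```

Reading the file back as bytes shows whether the mark is present and
whether the rest decodes cleanly as UTF-8, independent of any editor.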


