Re: Character encoding

Frederic Rentsch Wed, 08 Nov 2006 07:31:03 -0800

mp wrote:
> I have html document titles with characters like &gt;, &nbsp;, and
> &#135. How do I decode a string with these values in Python?
>
> Thanks
>
>   
This is definitely the most frequently asked question on this list. It comes up about once a week.


The stream-editing way is like this:

>>> import SE
>>> HTM_Decoder = SE.SE ('htm2iso.se') # Include path
>>> test_string = '''I have html document titles with characters like &gt;, &nbsp;, and
... &#135;. How do I decode a string with these values in Python?'''
>>> print HTM_Decoder (test_string)
I have html document titles with characters like >,  , and
‡. How do I decode a string with these values in Python?

An SE object does files too.

>>> HTM_Decoder ('with_codes.txt', 'translated_codes.txt')  # Include path

You can download SE from http://cheeseshop.python.org/pypi/SE/2.3 . The 
translation definition file "htm2iso.se" is included. If you open it in your 
editor, you can see how to write your own definition files for other 
translation tasks you may run into later.
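
For comparison, here is a minimal sketch of the same decoding done with only the standard library, assuming a current Python 3 where html.unescape is available. It is not how SE works internally. The helper names decode_entities and decode_file are made up for this example, and the result is a Unicode string rather than ISO-encoded bytes.

```python
import html


def decode_entities(text):
    """Replace named and numeric HTML character references with characters."""
    # html.unescape follows the HTML5 rules, so a reference with a missing
    # trailing semicolon (like "&#135" in the question) is also accepted.
    return html.unescape(text)


def decode_file(src_path, dst_path, encoding="utf-8"):
    """Read src_path, decode its character references, write dst_path.

    Hypothetical helper; the encoding default is an assumption.
    """
    with open(src_path, encoding=encoding) as src:
        decoded = decode_entities(src.read())
    with open(dst_path, "w", encoding=encoding) as dst:
        dst.write(decoded)


if __name__ == "__main__":
    sample = "titles with characters like &gt;, &nbsp;, and &#135;."
    print(decode_entities(sample))
```

Note that &nbsp; becomes a real non-breaking space (U+00A0), which looks like an ordinary space when printed but does not compare equal to one.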

Regards

Frederic



-- 
http://mail.python.org/mailman/listinfo/python-list
