Re: codec for html/xml entities!?

Martin Bless Sun, 20 Apr 2008 06:51:26 -0700

[Stefan Behnel] wrote:

>Martin Bless wrote:
>> What's a good way to encode and decode those entities like &euro; or
>> &#8364; ?
>
>Hmm, since you provide code, I'm not quite sure what your actual question is.


- What's a GOOD way?
- Am I reinventing the wheel?
- Are there well tested, fast, state of the art, builtin ways?
- Is something like line.decode('htmlentities') out there?
- Am I in conformity with relevant RFCs? (I'm hoping so ...)

>So I'll just comment on the code here.
>
>
>> def entity2uc(entity):
>>     """Convert entity like &#123; to unichr.
>> 
>>     Return (result,True) on success or (input string, False)
>>     otherwise. Example:
>>         entity2uc('&euro;')   -> (u'\u20ac',True)
>>         entity2uc('&#x20ac;') -> (u'\u20ac',True)
>>         entity2uc('&#8364;')  -> (u'\u20ac',True)
>>         entity2uc('&foobar;') -> ('&foobar;',False)
>>     """
>
>Is there a reason why you return a tuple instead of just returning the
>converted result and raising an exception if the conversion fails?

Mainly a matter of style. Whenever I use the function in the future,
the tuple makes it unambiguously clear that some entities may have been
left unconverted, yet I don't have to deal with the details of how
that was discovered. And maybe I'd like to change the algorithm
later? This way it's nicely encapsulated.
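
The two styles under discussion can be put side by side. The
following is only a sketch under assumptions, not the code from the
original posting: it is written for Python 3 (html.entities instead
of htmlentitydefs, chr instead of unichr), entity2uc follows the
docstring contract quoted above, and entity2uc_strict is a
hypothetical wrapper showing the exception-raising alternative.

```python
import re
from html.entities import name2codepoint

# One complete reference: &name; or &#123; or &#x7b;
_ENTITY_RE = r'&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);'


def entity2uc(entity):
    """Return (character, True) on success, else (entity, False)."""
    match = re.fullmatch(_ENTITY_RE, entity)
    if match is None:
        return entity, False
    body = match.group(1)
    try:
        if body[:2] in ('#x', '#X'):
            codepoint = int(body[2:], 16)      # hexadecimal reference
        elif body[0] == '#':
            codepoint = int(body[1:])          # decimal reference
        else:
            codepoint = name2codepoint[body]   # named entity
        return chr(codepoint), True
    except (KeyError, ValueError, OverflowError):
        # Unknown name or code point outside the Unicode range.
        return entity, False


def entity2uc_strict(entity):
    """Exception-raising variant built on top of the tuple version."""
    result, ok = entity2uc(entity)
    if not ok:
        raise ValueError('cannot convert entity: %r' % (entity,))
    return result
```

With the tuple version the caller checks a flag; with the strict
wrapper the caller needs a try/except. Either one hides how the
lookup is actually done.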

Have a nice day

Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
