Martin v. Löwis wrote:

>> In trying to parse html files using ElementTree running under Python
>> 3.0a1, and using htmlentitydefs.py to add "character entities" to the
>> parser, I found that I needed to create a customized version of
>> htmlentitydefs.py to make things work properly.
>
> Can you please state what precise problem you were seeing? The original
> code looks fine to me as it stands.
from what I can tell, his problem is that htmlentitydefs.entitydefs maps to
*either* character strings or HTML character references, depending on the
character value. he needs a dictionary that maps from entity names to
characters for *all* names; something like (untested):

    entity_map = htmlentitydefs.entitydefs.copy()
    for name, entity in entity_map.items():
        if len(entity) != 1:
            entity_map[name] = unichr(int(entity[2:-1]))

(entitydefs is pretty unusable as it is, but it was added to Python before
Python got Unicode strings, and changing it would break ancient code...)

</F>

--
http://mail.python.org/mailman/listinfo/python-list
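For readers on Python 3, where `unichr` no longer exists (plain `chr` covers the full Unicode range) and the module is named `html.entities`, the same normalization might look like the sketch below. `build_entity_map` is a hypothetical helper name, and, like the snippet above, it assumes every multi-character value is a decimal reference of the form `&#NNNN;` (hex references such as `&#x20AC;` are not handled). `name2codepoint` exists in both `htmlentitydefs` and `html.entities` and gives a simpler route to the same table.

```python
# Sketch for Python 3; the module was called htmlentitydefs in Python 2.
from html.entities import entitydefs, name2codepoint


def build_entity_map(defs):
    """Hypothetical helper: return a copy of an entitydefs-style dict in
    which every value is a single character.

    One-character values are kept as-is; any longer value is assumed to be
    a decimal character reference like '&#8364;' and is decoded.
    """
    entity_map = dict(defs)
    for name, entity in defs.items():
        if len(entity) != 1:
            # strip the leading '&#' and trailing ';', then decode
            entity_map[name] = chr(int(entity[2:-1]))
    return entity_map


# Normalize the stdlib table (already one character per name in Python 3).
entity_map = build_entity_map(entitydefs)

# Alternative: build the same mapping directly from the code points.
from_codepoints = {name: chr(cp) for name, cp in name2codepoint.items()}
```

Going through `name2codepoint` sidesteps the mixed-value problem entirely, since that table holds integers for every entity name regardless of their value.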