According to The Sgmlop Module Handbook [1], the handle_entityref() callback is called for "malformed character entities". What does that mean, exactly? What is a malformed character entity? I've tried misspelling them (e.g., dropping the semicolon), but then they're (quite naturally) treated as text/data and passed to handle_data(). I've tried using a number that is too large, or (equivalently, it turns out) using a name instead of a number, such as &#foo;. In these cases, I only get an exception, because the number is too high...
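
To make the cases above concrete, here is a minimal stand-alone sketch that labels each kind of input. It does not use sgmlop at all and says nothing about what sgmlop actually does with these strings; the pattern CHARREF and the function classify() are hypothetical names made up for illustration, and the upper limit 0x10FFFF is an assumption based on the Unicode range that chr() accepts in current Python.

```python
import re

# Hypothetical pattern: "&#", an alphanumeric body, and an optional ";".
# Hexadecimal references such as "&#x41;" are not handled specially here;
# they simply end up in the "non-numeric body" bucket.
CHARREF = re.compile(r"&#([0-9A-Za-z]*)(;?)")

MAX_CODE_POINT = 0x10FFFF  # assumption: largest value chr() accepts


def classify(ref):
    """Return a short label describing one candidate character reference."""
    m = CHARREF.fullmatch(ref)
    if m is None:
        return "not a character reference"
    body, semicolon = m.groups()
    if not semicolon:
        return "missing semicolon"
    if not body.isdigit():
        return "non-numeric body"
    if int(body) > MAX_CODE_POINT:
        return "number too large"
    return "ok: %r" % chr(int(body))


if __name__ == "__main__":
    for sample in ["&#65;", "&#65", "&#99999999;", "&#foo;", "&amp;"]:
        print(sample, "->", classify(sample))
```

The first sample is a well-formed reference; the other four are the dropped semicolon, the too-large number, the name in place of a number, and a plain named entity.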

So -- how can I produce a malformed character entity? I've tried to read the C code, but I can't say that left me any wiser on the subject; it doesn't seem to have any special-casing for this that I can find.

And another thing... For the case where a numeric reference is too high (i.e. it can't be translated into a Unicode character) -- is it possible to ignore it (or replace it, as with encode/decode)? I'm trying to write a parser that will accept *any* input text without complaining -- but simply trapping this exception would seem to disrupt the parsing process...

Thanks,

- Magnus

--
Magnus Lie Hetland
http://hetland.org

Time flies like the wind. Fruit flies like bananas. -- Groucho Marx