In article <[EMAIL PROTECTED]>, David M. Cooke <[EMAIL PROTECTED]> wrote:
> anon <[EMAIL PROTECTED]> writes:
>
> > So I've encountered a strange behavior that I'm hoping someone can fill
> > me in on. I've written a simple handler that works with one small
> > exception: when the parser encounters a line with '&' in it, it
> > only returns the portion that follows the occurrence.
> >
> > For example, parsing a file with the line:
> > <key>mykey</key><value>some%20&%20value</value>
> >
> > results in getting "%20value" back from the characters method, rather
> > than "some%20&%20value".
> >
> > After looking into this a bit, I found that SAX supports entities and
> > that it probably believes the & to be an entity and is processing
> > it in some way that I'm unaware of. I'm using the default
> > EntityResolver.
>
> Are you sure you're not actually getting three chunks: "some%20", "&",
> and "%20value"? The xml.sax.handler.ContentHandler.characters method
> (which I presume you're using for SAX, as you don't mention!) is not
> guaranteed to get all contiguous character data in one call. Also check
> if .skippedEntity() methods are firing.

Ya, skippedEntity() wasn't firing, but you are correct about receiving
three chunks. The characters handler routine is fired 3 times for a
single text block. Why does it do this? Is there a way to prevent it?

Much thanks.
gh
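
For what it's worth, here is a minimal sketch of the usual workaround. A SAX parser is allowed to deliver contiguous character data in several characters() calls, and in practice the splits tend to fall at entity references or buffer boundaries, so as far as I know there is no switch in xml.sax to turn that off. Instead, the handler collects the chunks and joins them when the element closes. BufferingHandler and parse_values are made-up names for illustration, and the sample input assumes the ampersand is written as &amp; in the file, since a bare & is not well-formed XML.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Hypothetical handler that joins character chunks per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # text pieces seen since the last start tag
        self.values = {}    # element name -> complete text content

    def startElement(self, name, attrs):
        # A new element begins, so start collecting from scratch.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text; just collect.
        self._chunks.append(content)

    def endElement(self, name):
        # The element is closed, so the text is complete: join the pieces.
        self.values[name] = "".join(self._chunks)
        self._chunks = []


def parse_values(xml_bytes):
    """Parse a small document and return {element name: text}."""
    handler = BufferingHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.values


if __name__ == "__main__":
    doc = b"<entry><key>mykey</key><value>some%20&amp;%20value</value></entry>"
    print(parse_values(doc)["value"])
```

This simple version only suits leaf elements like <key> and <value>. For mixed content or repeated element names you would want a stack of buffers, or a list of results, rather than a single dict.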