anon <[EMAIL PROTECTED]> writes:

> So I've encountered a strange behavior that I'm hoping someone can fill
> me in on.  I've written a simple handler that works with one small
> exception: when the parser encounters a line with '&#38;' in it, it
> only returns the portion that follows the occurrence.
>
> For example, parsing a file with the line :
> <key>mykey</key><value>some%20&#38;%20value</value>
>
> results in getting "%20value" back from the characters method, rather
> than "some%20&#38;%20value".
>
> After looking into this a bit, I found that SAX supports entities and
> that it is probably believing the &#38; to be an entity and processing
> it in some way that I'm unaware of.  I'm using the default
> EntityResolver.

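Are you sure you're not actually getting three chunks: "some%20", "&",
and "%20value"? The xml.sax.handler.ContentHandler.characters method
(which I presume you're using, since you don't say which SAX
implementation this is!) is not guaranteed to receive all contiguous
character data in one call. The parser is free to split the text, and
a character reference like &#38; is a likely place for it to do so. If
your handler keeps only the most recent chunk, "%20value" is all you
will see. Also check whether your handler's skippedEntity() method is
being called.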
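
If chunking is indeed the cause, the usual workaround is to accumulate
the pieces in characters() and join them in endElement(). Here is a
minimal sketch, assuming Python's standard xml.sax module; the names
BufferingHandler and parse_text, and the <entry> wrapper element used
to make the sample line well-formed, are made up for illustration and
are not from the original poster's code.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class BufferingHandler(ContentHandler):
    """Hypothetical handler: joins all character chunks per element."""

    def __init__(self):
        super().__init__()
        self._chunks = []   # text pieces seen since the last start tag
        self.values = {}    # element name -> complete text content

    def startElement(self, name, attrs):
        # Start collecting afresh for each element.
        self._chunks = []

    def characters(self, content):
        # May be called several times for one run of text, so append
        # instead of overwriting.
        self._chunks.append(content)

    def endElement(self, name):
        # Only here is the element's text known to be complete.
        self.values[name] = "".join(self._chunks)
        self._chunks = []


def parse_text(xml_text):
    """Parse a string and return the {element name: text} mapping."""
    handler = BufferingHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.values


sample = "<entry><key>mykey</key><value>some%20&#38;%20value</value></entry>"
result = parse_text(sample)
print(result["key"], result["value"])
```

Note that the joined text contains a literal "&" rather than "&#38;",
because the parser resolves the character reference before handing the
data to the handler. This sketch only tracks leaf elements; mixed
content would need a stack of buffers.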

-- 
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke
|cookedm(at)physics(dot)mcmaster(dot)ca
-- 
http://mail.python.org/mailman/listinfo/python-list
