Re: xmlrpclib and decoding entity references

Bengt Richter Wed, 04 May 2005 09:05:07 -0700

On 3 May 2005 08:07:06 -0700, "Chris Curvey" <[EMAIL PROTECTED]> wrote:


>I'm writing an XMLRPC server, which is receiving a request (from a
>non-Python client) that looks like this (formatted for legibility):
>
><?xml version="1.0"?>
><methodCall>
><methodName>echo</methodName>
><params>
><param>
><value>
><string>Le Martyre de Saint Andr&#xe9; &lt;BR&gt; avec inscription
>&apos;Le Dominiquain.&apos; et &apos;Le tableau fait par le dominicain,
>d&apos;apr&#xe8;s son dessein &#xe0;... est &#xe0; Rome, &#xe0;
>l&apos;&#xe9;glise Saint Andr&#xe9; della Valle&apos; sur le
>cadre&lt;BR&gt; craie noire, plume et encre brune, lavis brun
>rehauss&#xe9; de blanc sur papier brun&lt;BR&gt; 190 x 228 mm. (7 1/2 x
>9 in.)</string>
></value>
></param>
></params>
></methodCall>
>
>But when my "echo" method is invoked, the value of the string is:
>
>Le Martyre de Saint Andr; <BR> avec inscription 'Le Dominiquain.' et
>'Le tableau fait par le dominicain, d'apr:s son dessein 2... est 2
>Rome, 2 l';glise Saint Andr; della Valle' sur le cadre<BR> craie noire,
>plume et encre brune, lavis brun rehauss; de blanc sur papier brun<BR>
>190 x 228 mm. (7 1/2 x 9 in.)
>
>Can anyone give me a lead on how to convert the entity references into
>something that will make it through to my method call?
>
I haven't used XMLRPC, but superficially this looks like a quoting and/or
encoding problem. IOW, your "request" is XML, and the <string>...</string>
part is also XML that is part of the whole, not encapsulated in e.g.
<![CDATA[...stuff...]]> (which would tell an XML parser to suspend markup
interpretation of ...stuff...).

So IWT you would at least need the <string>...</string> content to be
converted to unicode to preserve all the represented characters. It wouldn't
surprise me if the whole request is routinely converted to unicode, and the
"value" you are showing above is the result of converting from unicode to an
encoding that can't represent everything, and that maybe just drops
conversion errors. What do you get if you print repr(value)? (assuming value
is what gets passed to your echo method)
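
To make the repr(value) check concrete, here is a minimal sketch. It assumes
a current Python 3, where the xmlrpclib module is named xmlrpc.client, and it
uses a shortened, made-up version of the request quoted above. It only shows
what a conforming parser hands to the method; it does not reproduce the
mangled output.

```python
import xmlrpc.client

# Shortened stand-in for the request quoted above (assumption: same
# structure, less text). Character references are left exactly as sent.
request = b"""<?xml version="1.0"?>
<methodCall>
<methodName>echo</methodName>
<params>
<param>
<value>
<string>Saint Andr&#xe9; &lt;BR&gt; d&apos;apr&#xe8;s son dessein &#xe0; Rome</string>
</value>
</param>
</params>
</methodCall>"""

# loads() returns the parameter tuple and the method name.
params, method = xmlrpc.client.loads(request)
value = params[0]

# repr() shows the exact characters, independent of the terminal encoding.
print(method, repr(value))
```

If repr(value) shows the accented characters (or their \xe9-style escapes),
the parser did its job and the damage happens later, when the string is
encoded for output.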

If it is a unicode string, you will just have to choose an appropriate codec
from those available and use
    value.encode('appropriate')
If it looks like e.g. a utf-8 encoding of unicode, you could try
    value.decode('utf-8').encode('appropriate')
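
A rough illustration of how the choice of codec and error handling changes
the result. This is a sketch in Python 3 syntax, where str plays the role of
the unicode type discussed here; the sample text is made up, and 'latin-1'
is only one possible choice for 'appropriate'.

```python
# Python 3: str here corresponds to Python 2's unicode type.
value = "Saint Andr\u00e9, d'apr\u00e8s, \u00e0 Rome"

# An encoding that cannot represent the accents, with errors ignored,
# silently drops those characters.
dropped = value.encode("ascii", errors="ignore")

# Encodings that can represent every character keep them.
latin1 = value.encode("latin-1")
utf8 = value.encode("utf-8")

# Alternative: keep the output ASCII but emit XML character references.
charrefs = value.encode("ascii", errors="xmlcharrefreplace")

# The decode-then-encode chain, applied to utf-8 bytes.
roundtrip = utf8.decode("utf-8").encode("latin-1")

for name, data in [("dropped", dropped), ("latin1", latin1),
                   ("utf8", utf8), ("charrefs", charrefs)]:
    print(name, repr(data))
```

Which target encoding is right depends on where the bytes end up (terminal,
file, database), so that has to be decided at the output end.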

I'm just guessing here. But something is interpreting the basic XML, since
&lt;BR&gt; is being converted to <BR>. Seems not unlikely that the rest are
also being converted, and to unicode. You just wouldn't notice a glitch when
unicode <BR> is converted to any usual western text encoding.

OTOH, if the intent (which I doubt) of the non-python client were to pass
through a block of pre-formatted XML as such (possibly for direct pasting
into e.g. web page XHTML?) then a way to avoid escaping every & and < would
be to use CDATA to encapsulate it. That would have to be fixed on that end.
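
For comparison, a small sketch of the two quoting styles. It uses the
standard library's xml.etree.ElementTree rather than xmlrpclib, purely to
show that an XML parser yields the same text either way; the sample strings
are made up.

```python
import xml.etree.ElementTree as ET

# Markup characters escaped one by one, as the client currently does.
escaped = "<string>craie noire &lt;BR&gt; plume &amp; encre</string>"

# The same text wrapped in a CDATA section, with nothing escaped.
cdata = "<string><![CDATA[craie noire <BR> plume & encre]]></string>"

text_from_escaped = ET.fromstring(escaped).text
text_from_cdata = ET.fromstring(cdata).text

# Both forms parse to the identical character data.
print(repr(text_from_escaped))
print(text_from_escaped == text_from_cdata)
```

Either way the receiving side sees plain text containing a literal <BR>, so
CDATA only saves the sender some escaping; it does not change what the
method receives.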

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list
