On 3 May 2005 08:07:06 -0700, "Chris Curvey" <[EMAIL PROTECTED]> wrote:
>I'm writing an XMLRPC server, which is receiving a request (from a
>non-Python client) that looks like this (formatted for legibility):
>
><?xml version="1.0"?>
><methodCall>
><methodName>echo</methodName>
><params>
><param>
><value>
><string>Le Martyre de Saint André &lt;BR&gt; avec inscription
>'Le Dominiquain.' et 'Le tableau fait par le dominicain,
>d'après son dessein à... est à Rome, à
>l'église Saint André della Valle' sur le
>cadre&lt;BR&gt; craie noire, plume et encre brune, lavis brun
>rehaussé de blanc sur papier brun&lt;BR&gt; 190 x 228 mm. (7 1/2 x
>9 in.)</string>
></value>
></param>
></params>
></methodCall>
>
>But when my "echo" method is invoked, the value of the string is:
>
>Le Martyre de Saint Andr; <BR> avec inscription 'Le Dominiquain.' et
>'Le tableau fait par le dominicain, d'apr:s son dessein 2... est 2
>Rome, 2 l';glise Saint Andr; della Valle' sur le cadre<BR> craie noire,
>plume et encre brune, lavis brun rehauss; de blanc sur papier brun<BR>
>190 x 228 mm. (7 1/2 x 9 in.)
>
>Can anyone give me a lead on how to convert the entity references into
>something that will make it through to my method call?
>

I haven't used XMLRPC, but superficially this looks like a quoting
and/or encoding problem. IOW, your "request" is XML, and the
<string>...</string> part is also XML that is part of the whole, not
encapsulated in e.g. <![CDATA[...stuff...]]> (which would tell an XML
parser to suspend markup interpretation of ...stuff...).

So I would think you would at least need the <string>...</string>
content to be converted to unicode to preserve all the represented
characters. It wouldn't surprise me if the whole request is routinely
converted to unicode, and the "value" you are showing above is the
result of converting from unicode to an encoding that can't represent
everything, and that maybe just drops conversion errors.

What do you get if you print repr(value)?
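
To make the repr(value) check concrete, here is a minimal sketch in
present-day Python 3 (where str is already unicode, unlike the Python of
this thread). The sample request is a shortened stand-in for yours, and
it assumes the accented characters arrive as numeric character
references such as &#233;, which is a guess on my part. It only shows
that the XML layer hands back unicode and that a later lossy encode is
where characters can go missing; it does not reproduce the exact
mangling you are seeing.

```python
import xmlrpc.client

# Shortened, hypothetical stand-in for the real request. The numeric
# character references (&#233;, &#232;) are an assumption about how the
# client escapes accented characters.
request = (
    '<?xml version="1.0"?>'
    "<methodCall><methodName>echo</methodName><params><param><value>"
    "<string>Saint Andr&#233; &lt;BR&gt; d'apr&#232;s</string>"
    "</value></param></params></methodCall>"
)

# loads() parses the XML and returns (params, methodname); character
# references and &lt;/&gt; are already decoded at this point.
params, method = xmlrpc.client.loads(request)
value = params[0]

print(method)
print(repr(value))

# Encoding to a codec that cannot represent the accents loses them...
lossy = value.encode("ascii", errors="replace")
print(lossy)

# ...while a codec that covers them keeps everything.
kept = value.encode("latin-1")
print(kept)
```

If repr(value) on your server shows the accents intact, the damage is
happening after the XML parse, at whatever step encodes the string for
output.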
(That assumes value is what gets passed to your echo method.)

If it is a unicode string, you will just have to choose an appropriate
value.encode('appropriate') from the available codecs. If it looks like
e.g. a utf-8 encoding of unicode, you could try
value.decode('utf-8').encode('appropriate').

I'm just guessing here. But something is interpreting the basic XML,
since &lt;BR&gt; is being converted to <BR>. It seems likely that the
rest are also being converted, and to unicode. You just wouldn't notice
a glitch when a unicode <BR> is converted to any usual western text
encoding.

OTOH, if the intent (which I doubt) of the non-Python client were to
pass through a block of pre-formatted XML as such (possibly for direct
pasting into e.g. web page XHTML?), then a way to avoid escaping every
& and < would be to use CDATA to encapsulate it. That would have to be
fixed on that end.

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list
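
For what it's worth, the CDATA remark can be sketched the same way, in
present-day Python 3 with the standard xml.etree.ElementTree parser. The
payload is a made-up fragment of the description, not the client's
actual output. It only illustrates that escaping the markup characters
and wrapping the text in a CDATA section give the receiver the same
string.

```python
import xml.etree.ElementTree as ET

# Same hypothetical payload twice: once with the markup characters
# escaped, once wrapped in a CDATA section so no escaping is needed.
escaped = ET.fromstring("<string>sur le cadre&lt;BR&gt; craie noire</string>")
wrapped = ET.fromstring("<string><![CDATA[sur le cadre<BR> craie noire]]></string>")

# Either way the parser hands back the literal text, <BR> included.
print(repr(escaped.text))
print(repr(wrapped.text))
```

So CDATA would only save the sender some escaping; it would not change
what arrives at the echo method.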