Martin v. Löwis wrote: > Sébastien Boisgérault schrieb: > > I am trying to embed an *arbitrary* (unicode) strings inside > > an XML document. Of course I'd like to be able to reconstruct > > it later from the xml document ... If the naive way to do it does > > not work, can anyone suggest a way to do it ? > > XML does not support arbitrary Unicode characters; a few control > characters are excluded. See the definiton of Char in > > http://www.w3.org/TR/2004/REC-xml-20040204 > > [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | > [#xE000-#xFFFD] | [#x10000-#x10FFFF] > > Now, one might thing you could use a character reference > (e.g. �) to refer to the "missing" characters, but this is not so: > > > [66] CharRef ::= '&#' [0-9]+ ';' > | '&#x' [0-9a-fA-F]+ '; > > Well-formedness constraint: Legal Character > Characters referred to using character references must match the > production for Char. > > As others have explained, if you want to transmit arbitrary characters, > you need to encode it as text in some way. One obvious solution > would be to encode the Unicode data as UTF-8 first, and then encode > the UTF-8 bytes using base64. The receiver of the XML document then > must do the reverse. > > Regards, > Martin
OK ! Thanks a lot for this helpful information. Cheers, SB -- http://mail.python.org/mailman/listinfo/python-list