Can someone confirm that the issue here is that I need to encode the xml data as UTF-8 and then post the encoded string to the server?

    # encode as UTF-8
    utf8_string = xml.encode('utf-8')
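For what it's worth, a minimal sketch of that encode-then-parse round trip might look like the following. The folder names, the element names (`folders`, `folder`) and the helper `build_xml` are made up for illustration. The HTTP POST itself is left out: the bytes are handed straight to minidom.parseString(), which stands in for the Django side. It is written so that it should also run on Python 3.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

# Hypothetical folder names as unicode strings; the second one is Japanese.
folder_names = [u"docs", u"\u30b9\u30bf\u30fc\u30c8 \u30e1\u30cb\u30e5\u30fc"]


def build_xml(names):
    """Build the XML document as a unicode string (element names are assumptions)."""
    items = u"".join(u"<folder>%s</folder>" % escape(n) for n in names)
    return u'<?xml version="1.0" encoding="utf-8"?><folders>%s</folders>' % items


xml = build_xml(folder_names)

# Client side: encode once, right before sending, so the bytes on the wire
# match the encoding="utf-8" declaration in the XML prolog.
utf8_string = xml.encode("utf-8")

# Server side (stand-in for the Django view): parse the raw UTF-8 bytes.
doc = minidom.parseString(utf8_string)
parsed = [node.firstChild.data for node in doc.getElementsByTagName("folder")]
```

This only helps if the folder names are correct unicode strings before the XML is built. If they already contain U+FFFD, encoding as UTF-8 just transmits the damaged names faithfully.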
Laurent

----- Original Message ----
From: Laurent Luce <laurentluc...@yahoo.com>
To: Mark Tolonen <metolone+gm...@gmail.com>; python-list@python.org
Sent: Monday, September 7, 2009 10:50:22 PM
Subject: Re: unicode + xml

The xml data is generated on Windows (python 2.6.2) and sent using a post request to a Django server. The django server is running on Ubuntu server with python 2.6.2. The post data is passed to minidom for parsing.

Laurent

----- Original Message ----
From: Mark Tolonen <metolone+gm...@gmail.com>
To: python-list@python.org
Sent: Monday, September 7, 2009 9:15:15 PM
Subject: Re: unicode + xml

"Laurent Luce" <laurentluc...@yahoo.com> wrote in message
news:255473.44957...@web54203.mail.re2.yahoo.com...
> Hello,
>
> I am trying to do the following:
>
> - read list of folders in a specific directory: os.listdir() - some folders
> have Japanese characters
> - post list of folders as xml to a web server: I used content-type 'text/xml'
> and I use '<?xml version="1.0" encoding="utf-8"?>' to start the xml data.
> - on the server side (Django), I get the data using post_data and I use
> minidom.parseString() to parse it. I get an exception because of the
> following in the xml for one of the folder name:
> '/ufffdX/ufffd^/ufffd[/ufffdg /ufffd/ufffd/ufffdj/ufffd/ufffd/ufffd['
>
> The weird thing is that I see 5 bytes for each unicode character: ie: /ufffdX
>
> Should I format the data differently inside the xml so minidom is happy ?

You aren't seeing 5 bytes for each unicode character. You are seeing '\ufffd' (the code point REPLACEMENT_CHARACTER) intermixed with other characters. The wrong encoding was probably used to decode the filename byte strings to Unicode. We can give more specific help if you specify your operating system and version of Python used.

-Mark

--
http://mail.python.org/mailman/listinfo/python-list
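To illustrate Mark's point about the wrong encoding, here is a small sketch of how U+FFFD can end up mixed in with ordinary ASCII characters. The folder name and the cp932 code page are assumptions, not something stated in the thread. They were picked because the cp932 bytes of this name, wrongly decoded as UTF-8 with error replacement, should give the same pattern as the string in the report.

```python
# Assumed folder name ("Start Menu" in Japanese); a guess, not from the thread.
name = u"\u30b9\u30bf\u30fc\u30c8 \u30e1\u30cb\u30e5\u30fc"

# Bytes as a Japanese Windows code page (cp932, an assumption) would store it.
raw = name.encode("cp932")

# Wrong codec: every byte that is invalid as UTF-8 is replaced by U+FFFD,
# while bytes that happen to be ASCII ('X', '^', '[', 'g', 'j') survive.
wrong = raw.decode("utf-8", "replace")

# Matching codec: the original name comes back intact.
right = raw.decode("cp932")

# Number of replacement characters introduced by the wrong decode.
replaced = wrong.count(u"\ufffd")
```

If something like this is what happened, the fix belongs on the client: decode the byte strings with the codec they were actually written in, or (on Python 2) pass a unicode path to os.listdir() so that it returns unicode names directly.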