Hi, I am using mosquitto.py as the server-side client to build a messaging service, with Python 2.7.3. Sorry, I am quite new to Python, and this is the most difficult issue I've run into in the past few months. I hope some of the Python experts here can help. :)
When I try to pass a UTF-8 text message in the payload, it works perfectly with English and ASCII. If I add Chinese to the payload text, though, I get a lot of errors like this:

    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data

Here is what I have already done:

1. I saved my Python source as UTF-8.
2. I set the default encoding to 'utf-8' by adding the following code to my source:

    import sys
    reload(sys)
    sys.setdefaultencoding('utf8')

I added the following test code to my client code, and it works perfectly:

    # testing decoding
    c = '中国人'  # some Chinese text here
    print "Chinese = ", c, "repr = ", repr(c), "type = ", type(c), len(c)
    d = c.decode('utf8')
    print "Decoded = ", d, "repr = ", repr(d), "type = ", type(d), len(d)

FYI, the print output is:

    Chinese = 中国人 repr = '\xe4\xb8\xad\xe5\x9b\xbd\xe4\xba\xba' type = <type 'str'> 9
    Decoded = 中国人 repr = u'\u4e2d\u56fd\u4eba' type = <type 'unicode'> 3

This means the decoding works fine here: the 9-byte UTF-8 str decodes to a 3-character unicode string.

I added the following code to decode the payload:

    print "Payload = ", msg.payload, "repr = ", repr(msg.payload), "type = ", type(msg.payload), len(msg.payload)
    text = msg.payload.decode('utf8')

When the payload is pure English or numbers, everything works perfectly, and the output looks like this:

    Payload = hi repr = 'hi' type = <type 'str'> 2
    Text = hi repr = u'hi' type = <type 'unicode'> 2

If I use '中国人' as the payload text, the output looks like this:

    Payload = 中 repr = '\xe4\xb8\xad' type = <type 'str'> 3
    Text = 中 repr = u'\u4e2d' type = <type 'unicode'> 1

Only the first Chinese character, 中, shows up. The other two characters are cut off, and the payload is only 3 bytes long instead of 9. Why is that?

If I try two different characters, '你好', in the payload, the message doesn't go through at all: the payload is printed as a question mark, and decoding fails. So the output differs depending on which Chinese characters I choose. The output and error message look like this:

    Payload = ? repr = '\xe4\xb8' type = <type 'str'> 2
    Traceback (most recent call last):
      File "messenger.py", line 181, in <module>
        main_loop()
      File "messenger.py", line 173, in main_loop
        while mqttc.loop() == 0:
      File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 670, in loop
        rc = self.loop_read(max_packets)
      File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 840, in loop_read
        rc = self._packet_read()
      File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1151, in _packet_read
        rc = self._packet_handle()
      File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1531, in _packet_handle
        return self._handle_pubrel()
      File "/usr/local/lib/python2.7/dist-packages/mosquitto.py", line 1682, in _handle_pubrel
        self.on_message(self, self._userdata, self._messages[i])
      File "messenger.py", line 129, in on_message
        text = msg.payload.decode('utf8')
      File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: unexpected end of data

I have already spent two days trying to fix this and digging through all kinds of solutions. I really hope someone can help me with this. Many thanks!

-Horace
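A possible explanation, offered only as an unconfirmed guess because the publishing code is not shown: in every output above, the number of payload bytes equals the number of characters sent (2 for 'hi', 3 for '中国人', 2 for '你好'), whereas '中国人' is 9 bytes in UTF-8. That pattern would appear if the publisher sized the payload by character count instead of by the length of the UTF-8-encoded bytes. The sketch below reproduces the symptom under that assumption. `simulate_truncated_payload` and `decode_payload` are hypothetical helpers and are not part of mosquitto.py. The sketch runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch under an ASSUMPTION (not confirmed): the publisher sends only
# len(text) bytes of the UTF-8 data, i.e. it counts characters, not bytes.
# The helper names are hypothetical, not mosquitto.py API.
# Works on Python 2.7 and Python 3.

TEXT = u'\u4e2d\u56fd\u4eba'  # u'中国人': 3 characters, 9 bytes in UTF-8


def simulate_truncated_payload(text):
    """Encode text as UTF-8, then keep only len(text) bytes,
    mimicking the suspected character-count length bug."""
    data = text.encode('utf-8')
    return data[:len(text)]


def decode_payload(payload):
    """Decode a UTF-8 payload without raising.

    Returns (text, ok). ok is False when the bytes are not valid UTF-8
    (for example, a multi-byte character cut off at the end); the bad
    bytes are then replaced with U+FFFD.
    """
    try:
        return payload.decode('utf-8'), True
    except UnicodeDecodeError:
        return payload.decode('utf-8', 'replace'), False


if __name__ == '__main__':
    # 'hi': ASCII, so bytes == characters and nothing is lost.
    # '中国人': only the first 3 of 9 bytes survive, i.e. just u'中'.
    # '你好': 2 of 6 bytes form an incomplete character, so strict
    # decoding fails with "unexpected end of data".
    for sample in (u'hi', TEXT, u'\u4f60\u597d'):
        payload = simulate_truncated_payload(sample)
        print(repr((payload, decode_payload(payload))))
```

If the guess is right, the fix belongs on the publishing side: encode the text to UTF-8 first and make sure the payload length is `len(text.encode('utf-8'))`, not `len(text)`. On the receiving side, `decode_payload` (or `msg.payload.decode('utf8', 'replace')`) only keeps `on_message` from raising inside `mqttc.loop()`. It cannot recover the missing bytes.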
-- 
Mailing list: https://launchpad.net/~mosquitto-users
Post to     : mosquitto-users@lists.launchpad.net
Unsubscribe : https://launchpad.net/~mosquitto-users
More help   : https://help.launchpad.net/ListHelp