Re: can't get utf8 / unicode strings from embedded python

David M. Cotter Sun, 25 Aug 2013 15:32:47 -0700

Fair enough. I can provide further proof of strangeness.
Here is my latest script. It is saved on disk as a UTF-8 encoded file, and 
when viewed as UTF-8 it shows the correct characters.


==================
# -*- coding: utf-8 -*- 
import time, kjams, kjams_lib

def log_success(msg, successB, result):
        if successB:
                print msg + " worked: " + result
        else:
                print msg + " failed: " + result

def do_test(orig_str):
        cmd_enum = kjams.enum_cmds()
        
        print "---------------"
        print "Original string: " + orig_str
        print "converting..."

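        # orig_str is a byte string holding UTF-8 (the source file is UTF-8)
        oldstr = orig_str
        newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
        log_success("first", oldstr == newstr, newstr)

        # pass a unicode object, then decode the returned byte string as UTF-8
        oldstr = unicode(orig_str, "UTF-8")
        newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
        newstr = unicode(newstr, "UTF-8")
        log_success("second", oldstr == newstr, newstr)

        oldstr = unicode(orig_str, "UTF-8")
        # note: encode() returns a new byte string and its result is discarded
        # here, so oldstr is still unicode (the same input as the second test)
        oldstr.encode("UTF-8")
        newstr = kjams_lib.do_command(cmd_enum.kScriptCommand_Unicode_Test, oldstr)
        newstr = unicode(newstr, "UTF-8")
        log_success("third", oldstr == newstr, newstr)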

        print "---------------"
        
def main():
        do_test("frøânçïé")
        do_test("控件")

#-----------------------------------------------------
if __name__ == "__main__":
        main()

==================
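For reference, here is what the two test strings contain as bytes. Because the file is saved as UTF-8 with a coding declaration, a plain "..." literal in Python 2 holds the UTF-8 bytes, so the first test already hands UTF-8 to do_command(). This is a Python 3 sketch (the script itself is Python 2), where str.encode("utf-8") produces the same byte sequence; TEST_STRINGS is just an illustrative name.

```python
# Python 3 sketch: the bytes a Python 2 "..." literal holds when the source
# file is saved as UTF-8 with a "# -*- coding: utf-8 -*-" line.
# In Python 3, str.encode("utf-8") yields the same byte sequence.

TEST_STRINGS = ["frøânçïé", "控件"]

for text in TEST_STRINGS:
    data = text.encode("utf-8")
    # Each non-ASCII character becomes a multi-byte UTF-8 sequence.
    print(f"{text!r}: {len(text)} characters, {len(data)} bytes: {data.hex(' ')}")
```

For example, "控件" is 2 characters but 6 bytes of UTF-8, and "frøânçïé" is 8 characters but 13 bytes.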
And the latest results:

   20: ---------------
   20: Original string: frøânçïé
   20: converting...
   20: first worked: frøânçïé
   20: second worked: frøânçïé
   20: third worked: frøânçïé
   20: ---------------
   20: ---------------
   20: Original string: 控件
   20: converting...
   20: first worked: 控件
   20: second worked: 控件
   20: third worked: 控件
   20: ---------------

Now, given the C++ source code, this should NOT work, because I'm doing some 
crazy re-coding of the bytes.

So, you see, it does not matter whether I pass "unicode" strings or regular 
"strings"; they all translate to the same weird MacRoman.
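One possible explanation, offered only as a hypothesis: if the incoming UTF-8 bytes are decoded as MacRoman and later encoded back as MacRoman, the round trip is lossless, because MacRoman assigns exactly one character to every byte value. The intermediate text is garbled, but the same bytes come back, so the equality checks pass. A Python 3 sketch; misread_as_macroman() and write_back_as_macroman() are hypothetical stand-ins, not the real conversion code.

```python
# Hypothesis sketch (not the actual application code): UTF-8 bytes are
# misread as MacRoman, then written back out as MacRoman.
# Both helper names are hypothetical.

def misread_as_macroman(data: bytes) -> str:
    """Decode bytes as MacRoman; UTF-8 input yields mojibake."""
    return data.decode("mac_roman")


def write_back_as_macroman(text: str) -> bytes:
    """Encode text with the same MacRoman codec it was decoded with."""
    return text.encode("mac_roman")


for original_text in ["frøânçïé", "控件"]:
    utf8_bytes = original_text.encode("utf-8")
    garbled = misread_as_macroman(utf8_bytes)
    round_trip = write_back_as_macroman(garbled)
    # The intermediate text is wrong, but every byte maps to one MacRoman
    # character and back, so the bytes survive unchanged.
    print(f"{original_text!r} -> {garbled!r} -> same bytes: {round_trip == utf8_bytes}")
```

If this hypothesis holds, the "re-coding" never changes the bytes, which would explain why all three tests report success.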

For completeness, here is the C++ code that the script calls:

===================
                        case kScriptCommand_Unicode_Test: {
                                pyArg = iterP.NextArg_OrSyntaxError();
                                
                                if (pyArg.get()) {
                                        SuperString str = pyArg.GetAs_String();
                                        
                                        resultObjP = PyString_FromString(str);
                                }
                                break;
                        }

===================
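PyString_FromString() builds a Python 2 str (a byte string) by copying a NUL-terminated C string, so whatever goes in, the script gets raw bytes back, and printing those to a console can make differently encoded values look alike. One way to see exactly what the round trip returns is to log the type and raw bytes instead of the rendered text. A minimal Python 3 sketch, where describe() is a hypothetical helper and bytes stands in for the Python 2 str that PyString_FromString() returns:

```python
def describe(value) -> str:
    """Return the value's type name and its bytes in hex.

    bytes are shown as-is; str (text) is shown as its UTF-8 encoding.
    """
    data = value if isinstance(value, bytes) else value.encode("utf-8")
    return f"{type(value).__name__}: {data.hex(' ')}"


print(describe("é"))          # str: c3 a9
print(describe(b"\xc3\xa9"))  # bytes: c3 a9
print(describe(b"\x8e"))      # bytes: 8e  (MacRoman "é", not valid UTF-8 alone)
```

The last line shows why the bytes matter: the same character is one byte in MacRoman but two in UTF-8.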
-- 
http://mail.python.org/mailman/listinfo/python-list
