Petr Jakeš wrote:
> John, thanks for your extensive answer.
>
> >> Hi,
> >> I am using Python 2.4.3 on Fedora Core4 and "Eric3" Python IDE
> >> .
> >> Below mentioned code works fine in the Eric3 environment. While trying
> >> to start it from the command line, it returns:
> >>
> >> Traceback (most recent call last):
> >>   File "pokus_1.py", line 5, in ?
> >>     print str(a)
> >> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in
> >> position 6: ordinal not in range(128)
>
> JM> So print a works, but print str(a) crashes.
>
> JM> Instead, insert this:
> JM> import sys
> JM> print "default", sys.getdefaultencoding()
> JM> print "stdout", sys.stdout.encoding
> JM> and run your script at the command line. It should print:
> JM> default ascii
> JM> stdout x
> **** in the command line it prints: *****
> default ascii
> stdout UTF-8
> JM> here, and crash at the later use of str(a).
> JM> Step 2: run your script under Eric3. It will print:
> JM> default y
> JM> stdout z
>
> **** in the Eric3 it prints: ****
> if the # -*- Encoding: utf_8 -*- is set then:
>
> default utf_8
> stdout
> unhandled AttributeError, "AsyncFile instance has no attribute
> 'encoding' "
>
> if the encoding is not set then it prints:
>
> DeprecationWarning: Non-ASCII character '\xc3' in file
> /root/eric/analyza_dat_TPC/pokus_1.py on line 26, but no encoding
> declared; see http://www.python.org/peps/pep-0263.html for details
> execfile(sys.argv[0], self.debugMod.__dict__)
>
> default latin-1
> stdout
> unhandled AttributeError, "AsyncFile instance has no attribute
> 'encoding' "
>
> JM> and then should work properly. It is probable that x == y == z ==
> JM> 'utf-8'
> JM> Step 3: see below.
>
> >>
> >> ========== 8< =============
> >> #!/usr/bin python
> >> # -*- Encoding: utf_8 -*-
>
> JM> There is no UTF8-encoded text in this short test script.
> JM> Is the above encoding comment merely a carry-over from your real
> JM> script, or do you believe it is necessary or useful in this test script?
> Generally, I am working with strings like u'DISKOV\xc1 POLE' (I am
> getting them from the database).
>
> My intention in using # -*- Encoding: utf_8 -*- was to suppress
> DeprecationWarnings if I use utf_8 in the code (like u'DISKOV\xc1 POLE')
>
> >>
> >> a= u'DISKOV\xc1 POLE'
> >> print a
> >> print str(a)
> >> ========== 8< =============
> >>
> >> Even though it looks strange, I have to use the str(a) syntax even
> >> though I know the "a" variable is a string.
>
> JM> Some concepts you need to understand:
> JM> (a) "a" is not a string, it is a reference to a string.
> JM> (b) It is a reference to a unicode object (an implementation of a
> JM> conceptual Unicode string) ...
> JM> (c) which must be distinguished from a str object, which represents a
> JM> conceptual string of bytes.
> JM> (d) str(a) is trying to produce a str object from a unicode object. Not
> JM> being told what encoding to use, it uses the default encoding
> JM> (typically ascii) and naturally this will crash if there are non-ascii
> JM> characters in the unicode object.
>
> >> I am trying to use ChartDirector for Python (charts for Python) and the
> >> method "layer.addDataSet()" needs the above-mentioned syntax, otherwise it
> >> returns an Error.
>
> JM> Care to tell us which error???
> you can see the Error description and author comments here:
> http://tinyurl.com/ezohe

You have two different episodes on that website; adding the one we have
been discussing gives *three* different stories:

Episode 1:
The error description: "TypeError: Error converting argument 1 to type
PCc" -- you should ask him "What is type PCc???" If arg 1 is an arbitrary
str object, which byte values could it possibly be objecting to?
The author comments: "The error code usually means the filename is not a
text string, ..."
(1) Input file or output file? Is it possible that one or more bytes are
not allowable in a filename?
(2) Is it possible for you to give him the exact args that you are passing
in (use print repr(arg) before the call), and for him to tell you the
*exact* reason, not the "usual" reason?

Episode 2:
Evidently arg is a str object, but passing in str(arg) and just plain arg
give different results??? I doubt it. Print repr(arg) and type(arg) and see
what you've actually got there.

>
> >>
> >> layer.addDataSet(data, colour, str(dataName))
> I have tried to experiment with the code a bit.
> the simplest code where I can demonstrate my problems:
> #!/usr/bin python
> import sys
> print "default", sys.getdefaultencoding()
> print "stdout", sys.stdout.encoding
>
> a=['P\xc5\x99\xc3\xad','Petr Jake\xc5\xa1']
> b="my nice try %s" % ''.join(a).encode("utf-8")

So ''.join(a) is a str object, encoded in utf-8 *already*.

Please try to understand:
(1) unicode_object.encode('utf-8') produces a str_object # in utf-8 encoding
(2) str_object.decode('utf-8') produces a unicode object # if str_object
contains valid utf-8.
(3) str_object.encode('anything') is nonsense; it is the equivalent of
str_object.decode('ascii').encode('anything') and will typically fail, as
your next error message shows.

What were you trying to do?? I don't understand the relationship between
this little exercise and Episodes 1, 2, & 3.
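
Points (1) to (3) can be sketched roughly as follows. This is only an
illustration under some assumptions: the helper name encode_byte_string is
made up (hypothetical), and the byte strings are built with encode() purely
so that the same code runs on both Python 2 and Python 3; they stand in for
the two items of your list a.

```python
import sys

# Byte strings that are *already* UTF-8 encoded, standing in for the list "a".
parts = [u'P\u0159\xed'.encode('utf-8'), u'Petr Jake\u0161'.encode('utf-8')]
joined = parts[0] + parts[1]          # still a byte string, still UTF-8

# (2) Decoding valid UTF-8 bytes produces a unicode object.
text = joined.decode('utf-8')

# (1) Encoding that unicode object produces the same UTF-8 bytes again.
assert text.encode('utf-8') == joined


def encode_byte_string(raw, encoding):
    """Hypothetical helper spelling out point (3): calling encode() on a
    byte string first decodes it as ascii, and only then encodes it."""
    return raw.decode('ascii').encode(encoding)


failure = None
try:
    encode_byte_string(joined, 'utf-8')
except UnicodeDecodeError:
    # The ascii decode step fails at the first non-ASCII byte (0xc5, index 1).
    failure = sys.exc_info()[1]

print(repr(text))
print(failure)
```

If the package really wants a str object, the thing to pass is the result of
an explicit unicode_object.encode(...) with an encoding you have chosen, not
an encode() call on bytes that are already encoded.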
Try to concentrate on what your data is (u"DISKOetcetc" is a unicode
string, but then you say that str(x) should be unnecessary because x is
already a str object!?) and what you need to have to get it passed through
to that package's methods.

> print b
>
> When I run it from the command line I am getting:
> sys:1: DeprecationWarning: Non-ASCII character '\xc3' in file pokus_1.py on
> line 26, but no encoding declared; see
> http://www.python.org/peps/pep-0263.html for details
>
> default ascii
> stdout UTF-8
>
> Traceback (most recent call last):
>   File "pokus_1.py", line 8, in ?
>     b="my nice try %s" % ''.join(a).encode("utf-8")
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1:
> ordinal not in range(128)
>

As expected.

Regards,
John
--
http://mail.python.org/mailman/listinfo/python-list