On Apr 13, 5:57 am, "Ben" <[EMAIL PROTECTED]> wrote:
> I'm left with some legacy code using plain old str, and I need to make
> sure it works with unicode input/output. I have a simple plan to do
> this:
>
> - Run the code with "python -U" so all the string literals
> become unicode literals.
Requiring that the code is always run with a non-default argument
doesn't seem very robust/portable to me.

> - Add this statement
>
> str=unicode
>
> to all .py files so the type comparison (e.g., type('123') == str)
> would work.

IMVHO (1) doing that merely changes "legacy code" to "kludged legacy
code", and (2) there is no substitute for reading the code and trying
to nut out what it is doing.

Do you mean that those two things are the ONLY changes you plan to
make?

> Did I miss anything? Does this sound like a workable plan?

Do you need to make sure it still works with ASCII input? With input
in some other encoding, e.g. cp1252?

What do you mean by "unicode input"? Bear in mind that if you want to
work with Python unicode objects internally, input from a file /
socket / whatever will need to be decoded, i.e. you will have to read
the code and make appropriate changes. Data stored in (say) utf_16_le
encoding is not "unicode" in the sense that you need; it still has to
be decoded.

What do you mean by "unicode output"? You are going to need to encode
your output.

This doesn't work; the output is not "unicode" in any meaningful
sense:

>>> f = open(u'uout', u'w')
### Warning: you need to hope that all builtins etc. that you are
### calling cope with unicode arguments as well as the above one does.
>>> f.write(u'abcde\n')
>>> f.close()
>>> open(u'uout', u'rb').read()
'abcde\r\n'

This doesn't work; it crashes:

>>> f = open('uout2', u'w')
>>> f.write(u'abcde\xff\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xff' in position 5: ordinal not in range(128)
>>>

Some object methods work differently with unicode, e.g.

(1) str.translate and unicode.translate (the former takes a
256-character table plus optional deletechars; the latter takes a
dict mapping ordinals to ordinals, unicode strings or None).
(2) str.split and unicode.split (and isspace), which disagree about
whether '\xA0' (NO-BREAK SPACE) is whitespace:

>>> 'abc\xA0def'.split()
['abc\xa0def']
>>> u'abc\xA0def'.split()
[u'abc', u'def']
>>> '\xA0'.isspace()
False
>>> u'\xA0'.isspace()
True
>>>

HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list
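A minimal sketch of the decode-on-input, encode-on-output approach described above. It assumes, purely for illustration, that the legacy input is cp1252 and that the output should be UTF-8; the helper names read_text, write_text and demo are hypothetical and not part of Ben's code. It uses io.open (Python 2.6 and later), which decodes and encodes with an explicit codec, so writing u'\xff' does not fall back to the ASCII codec the way the builtin open does in the traceback above.

```python
import io
import os
import shutil
import tempfile

# Assumed encodings, for illustration only: use whatever your input
# really arrives in and whatever your consumers expect.
INPUT_ENCODING = 'cp1252'
OUTPUT_ENCODING = 'utf-8'


def read_text(path, encoding=INPUT_ENCODING):
    """Decode bytes from disk into a unicode object (input boundary)."""
    with io.open(path, 'r', encoding=encoding) as f:
        return f.read()


def write_text(path, text, encoding=OUTPUT_ENCODING):
    """Encode a unicode object into bytes on disk (output boundary)."""
    with io.open(path, 'w', encoding=encoding) as f:
        f.write(text)


def demo():
    """Round-trip cp1252 input to UTF-8 output; return (text, output bytes)."""
    tmpdir = tempfile.mkdtemp()
    try:
        src = os.path.join(tmpdir, 'in.txt')
        dst = os.path.join(tmpdir, 'out.txt')

        # Simulate legacy input: byte 0xFF is LATIN SMALL LETTER Y WITH
        # DIAERESIS in cp1252.
        with open(src, 'wb') as f:
            f.write(b'abcde\xff')

        text = read_text(src)      # a unicode object, decoded once on the way in
        write_text(dst, text)      # encoded once, on the way out

        with open(dst, 'rb') as f:
            return text, f.read()  # the raw bytes that were written
    finally:
        shutil.rmtree(tmpdir)


if __name__ == '__main__':
    text, data = demo()
    print(repr(text))
    print(repr(data))
```

The only places an encoding is named are the two boundary helpers; everything in between handles unicode objects, which is why each existing read and write in the legacy code has to be found and changed by hand.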