>>> How do I get a printable unicode version of these path strings if they
>>> contain non-unicode data?
>>
>> Define "printable". One way would be to use a regular expression,
>> replacing all codes in a certain range with a question mark.
>
> What I mean by printable is that the string must be valid unicode
> that I can print to a UTF-8 console or place as text in a UTF-8
> web page.
>
> I think your PEP gives me a string that will not encode to
> valid UTF-8 that the outside of python world likes. Did I get this
> point wrong?

You are right. However, if your *only* requirement is that it should
be printable, then this is fairly underspecified. One way to get
a printable string would be this function

  def printable_string(unprintable):
      return ""

This will always return a printable version of the input string...

> In our application we are running Fedora with the assumption that the
> filenames are UTF-8. When Windows systems FTP files to our system
> the files are in CP-1251(?) and not valid UTF-8.

That would be a bug in your FTP server, no? If you want all file names
to be UTF-8, then your FTP server should arrange for that.

> Having an algorithm that says if it's a string no problem, if it's
> a byte deal with the exceptions seems simple.
>
> How do I do this detection with the PEP proposal?
> Do I end up using the byte interface and doing the utf-8 decode
> myself?

No, you should encode using the "strict" error handler, with the
locale encoding. If the file name encodes successfully, it's correct;
otherwise, it's broken.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
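
For illustration, the strict-encode check could look roughly like the
sketch below. It assumes the PEP under discussion is the one that
introduced the "surrogateescape" error handler (available in Python 3.1
and later), so that undecodable bytes show up in the file name string as
lone surrogates. The helper names is_clean_filename and
printable_filename are hypothetical, and replacing the bad bytes with
U+FFFD is only one possible definition of "printable".

```python
import sys


def is_clean_filename(name, encoding=None):
    """Return True if name encodes with the "strict" error handler.

    Assumption: undecodable bytes were smuggled into the string as lone
    surrogates, which "strict" refuses to encode.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    try:
        name.encode(encoding, "strict")
    except UnicodeEncodeError:
        return False
    return True


def printable_filename(name, encoding=None):
    """Return a copy of name that is safe to print or put in a web page.

    The original bytes are recovered with "surrogateescape" and decoded
    again with "replace", so each undecodable sequence becomes U+FFFD.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    raw = name.encode(encoding, "surrogateescape")
    return raw.decode(encoding, "replace")


if __name__ == "__main__":
    # Simulate a file name whose bytes are not valid UTF-8.
    broken = b"caf\xe9.txt".decode("utf-8", "surrogateescape")
    print(is_clean_filename(broken, "utf-8"))
    print(printable_filename(broken, "utf-8").encode("utf-8"))
```

A file name that fails the check can still be passed back to the
operating system unchanged; only the copy used for display needs the
replacement.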