Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Marko Rauhamaa
Random832:
> On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote:
>> I think Windows also gets it almost right: NTFS uses UTF-16, and (I
>> think) only allow valid Unicode file names.
>
> Nope. Windows allows any sequence of 16-bit units (except for a dozen or
> so ASCII characters) in filename

Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Random832
On Sun, Jun 12, 2016, at 12:50, Steven D'Aprano wrote:
> I think Windows also gets it almost right: NTFS uses UTF-16, and (I
> think) only allow valid Unicode file names.
Nope. Windows allows any sequence of 16-bit units (except for a dozen or so ASCII characters) in filenames. Of course, you're
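To illustrate the point about arbitrary 16-bit units, here is a minimal Python sketch. The string `name` is a made-up example, not a real file name from this thread. It shows that a lone surrogate is rejected by the strict UTF-16 codec but can still be written out as raw 16-bit units with the `surrogatepass` error handler.

```python
# U+D800 is a high surrogate with no low surrogate after it: a legal
# sequence of 16-bit units, but not valid Unicode text.
# (Hypothetical name, chosen only for illustration.)
name = "bad\ud800name"

# The strict UTF-16 codec refuses to encode a lone surrogate.
try:
    name.encode("utf-16-le")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# "surrogatepass" writes the surrogate out as an ordinary 16-bit unit,
# and reads it back the same way.
units = name.encode("utf-16-le", "surrogatepass")
roundtrip = units.decode("utf-16-le", "surrogatepass")

print(strict_ok, len(units), roundtrip == name)
```

A database column or a terminal that insists on well-formed Unicode would reject such a name in the same way the strict codec does.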

Re: how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Steven D'Aprano
On Sun, 12 Jun 2016 10:09 pm, Peter Volkov wrote:
> Hi, everybody.
>
> What is a best practice to deal with filenames in python3? The problem is
> that os.walk(src_dir), os.listdir(src_dir), ... return "surrogate" strings
> as filenames.
Can you give an example?
> It is impossible to assume t

how to handle surrogate encoding: read from fs write to database

2016-06-12 Thread Peter Volkov
Hi, everybody. What is a best practice to deal with filenames in python3? The problem is that os.walk(src_dir), os.listdir(src_dir), ... return "surrogate" strings as filenames. It is impossible to assume that they are normal strings that could be print()'ed on a Unicode terminal or saved as stri
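As a rough sketch of the situation described in this question (not a recommendation taken from the thread), the snippet below assumes a POSIX-style setup where file names are bytes decoded as UTF-8 with the `surrogateescape` error handler. The name `raw_name` and the three helper functions are hypothetical. On a real system, `os.fsencode()` and `os.fsdecode()` perform the equivalent conversion using the file system encoding.

```python
# A byte file name that is not valid UTF-8 (hypothetical example;
# 0xE9 is "é" in Latin-1).
raw_name = b"caf\xe9.txt"

# Decode the way Python 3 decodes file names on POSIX: each undecodable
# byte becomes a lone surrogate in the range U+DC80..U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")


def has_escaped_bytes(s):
    """Return True if s contains surrogate-escaped (undecodable) bytes."""
    return any(0xDC80 <= ord(ch) <= 0xDCFF for ch in s)


def original_bytes(s):
    """Recover the exact bytes of the name, e.g. to store in a binary column."""
    return s.encode("utf-8", "surrogateescape")


def printable(s):
    """Lossy form that is safe to print: bad bytes become U+FFFD."""
    return original_bytes(s).decode("utf-8", "replace")


print(has_escaped_bytes(name), original_bytes(name), printable(name))
```

One possible design, under these assumptions, is to store `original_bytes(name)` as the authoritative key and keep `printable(name)` only for display, since the printable form cannot be mapped back to the file.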