"Amos Anderson" <amosander...@gmail.com> wrote in message
news:a073a9cf0906242007k5067314dn8e9d7b1c6da62...@mail.gmail.com...
I've run into a bit of an issue iterating through files in Python 3.0 and
3.1rc2. When it comes to a file with '\u200b' in its name, it gives the
error...
Traceback (most recent call last):
  File "ListFiles.py", line 19, in <module>
    f.write("file:{0}\n".format(i))
  File "c:\Python31\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u200b' in position 30: character maps to <undefined>
Code is as follows...
import os
f = open("dirlist.txt", 'w')
for root, dirs, files in os.walk("C:\\Users\\Filter\\"):
    f.write("root:{0}\n".format(root))
    f.write("dirs:\n")
    for i in dirs:
        f.write("dir:{0}\n".format(i))
    f.write("files:\n")
    for i in files:
        f.write("file:{0}\n".format(i))
f.close()
input("done")
The file it's choking on happens to be a link that Internet Explorer
created. There are two files that appear in Explorer to have the same name,
but one actually has a zero-width space ('\u200b') just before the .url
extension. In playing around with this I've found several files with the
same character throughout my file system. OS: Vista SP2, Language: US
English.
Am I doing something wrong or did I find a bug? It's worth noting that
Python 2.6 just displays this character as a ?, just as it appears if you
type dir at the Windows command prompt.
In Python 3.x, strings are Unicode by default. Unless you specify an
encoding, open() uses the system's default encoding (the one reported by
locale.getpreferredencoding()) to encode the strings written to a text
file. On Windows, the filesystem stores names in Unicode and supports the
full character set, but the default text file encoding on your system is
cp1252, which has no mapping for the zero-width space. So this is not a bug
in Python. Specify an encoding for the output file, such as UTF-8:
>>> f = open('blah.txt', 'w', encoding='utf8')
>>> f.write('\u200b')
1
>>> f.close()
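Applied to the script from the question, the same fix might look roughly
like the sketch below. The helper name write_listing and the sample name
"link\u200b.url" are made up for illustration; only the encoding argument
matters. The last few lines reproduce the underlying failure without
touching the filesystem: cp1252 cannot encode '\u200b', while UTF-8 can.

```python
import os


def write_listing(top, out_path, encoding="utf-8"):
    """Walk `top` and write the directory and file names to `out_path`.

    write_listing is a hypothetical helper; it mirrors the loop from the
    question but opens the output file with an explicit encoding.
    """
    # An explicit encoding avoids falling back to the locale default
    # (cp1252 on the system in the question).
    with open(out_path, "w", encoding=encoding) as f:
        for root, dirs, files in os.walk(top):
            f.write("root:{0}\n".format(root))
            f.write("dirs:\n")
            for name in dirs:
                f.write("dir:{0}\n".format(name))
            f.write("files:\n")
            for name in files:
                f.write("file:{0}\n".format(name))


# The failure itself, shown on a made-up file name.
name = "link\u200b.url"
try:
    name.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    # cp1252 has no mapping for the zero-width space.
    cp1252_ok = False

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = name.encode("utf-8")
```

With that in place, write_listing("C:\\Users\\Filter\\", "dirlist.txt")
should get past the .url file. Whatever reads dirlist.txt afterwards needs
to open it as UTF-8 too.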
-Mark