Antoon Pardon <antoon.par...@rece.vub.ac.be> writes:

> On 14-05-14 18:24, Akira Li wrote:
>> Antoon Pardon <antoon.par...@rece.vub.ac.be> writes:
>>
>>> This is the code I run (python 3.3)
>>>
>>> host = ...
>>> user = ...
>>> passwd = ...
>>>
>>> from ftplib import FTP
>>>
>>> ftp = FTP(host, user, passwd)
>>> ftp.mkd(b'NewDir')
>>> ftp.rmd(b'NewDir')
>>>
>>> This is the traceback
>>>
>>> Traceback (most recent call last):
>>>   File "ftp-problem", line 9, in <module>
>>>     ftp.mkd(b'NewDir')
>>>   File "/usr/lib/python3.3/ftplib.py", line 612, in mkd
>>>     resp = self.voidcmd('MKD ' + dirname)
>>> TypeError: Can't convert 'bytes' object to str implicitly
>>>
>>> The problem is that I do something like this in a backup program.
>>> I don't know the locales that other people use. So I manipulate
>>> all file and directory names as bytes.
>>>
>>> Am I doing something wrong?
>>
>> The error message shows that ftplib expects a string here, not bytes.
>> You could use `ftp.mkd(some_bytes.decode(ftp.encoding))` as a
>> workaround.
>
> Sure but what I like to know: Can this be considered a failing of
> ftplib. Since python3 generally allows paths to be strings as
> well as bytes can't we expect the same of ftplib?
>
> Especially as I assume that path will be converted to bytes anyway
> in order to send it over the network.
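For reference, the workaround quoted above could be wrapped in a small helper. This is only a sketch: `as_ftp_str` and `mkd_bytes` are hypothetical names, not part of ftplib, and it assumes that `ftp.encoding` is the codec ftplib uses to put the command on the wire. As far as I know that is latin-1 by default in Python 3.3, in which case the decode step cannot fail and the original bytes are sent unchanged.

```python
from ftplib import FTP  # used only in the usage comment at the bottom


def as_ftp_str(name, encoding):
    """Return name as str; bytes are decoded with `encoding` (hypothetical helper)."""
    if isinstance(name, bytes):
        # Raises UnicodeDecodeError if the bytes are not valid in `encoding`.
        return name.decode(encoding)
    return name


def mkd_bytes(ftp, dirname):
    """Create a directory whose name may be bytes or str (hypothetical wrapper)."""
    return ftp.mkd(as_ftp_str(dirname, ftp.encoding))


# Usage, with host, user and passwd defined as in the original script:
#     ftp = FTP(host, user, passwd)
#     mkd_bytes(ftp, b'NewDir')
```

With a multi-byte `ftp.encoding` such as utf-8 the decode step can raise UnicodeDecodeError for arbitrary bytes, so the helper does not make the names any more portable.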
Bytes are supported for local filenames because POSIX systems provide a
bytes-based interface, e.g., on my system any byte except / and NUL can
be used in a filename. There you can get away with passing opaque bytes
filenames around for some time.

FTP is different. RFC 959 expects ASCII filenames. RFC 2640 recommends
UTF-8 (if the FEAT command reports it). RFC 3659: pathnames could be
sent as UTF-8 *and* "raw" (plus the handling of CR LF, CR NUL, IAC and
other telnet control codes).

Using UTF-8 might have security implications, and some firewalls might
interfere with the OPTS command and the FEAT response.

Popular clients such as FileZilla may break on non-UTF-8 filenames.
Compared with a local filesystem, it is less likely that all ftp
clients use the same character encoding, and more likely that an ftp
server performs some unexpected character encoding conversion, even
though doing so is not standards-compliant.

You could try to post on the python-ideas mailing list anyway, to
suggest the enhancement (support bytes where filenames are expected)
for your backup application use case:

- you might not be able to avoid undecodable filenames -- the current
  implementation raises UnicodeEncodeError if you pass ftplib a Unicode
  string created using os.fsdecode(undecodable_bytes)
- non-Python ftp clients should be able to access the content -- so
  Python-specific error handlers such as surrogateescape or
  backslashreplace are not allowed
- you could set ftp.encoding to utf-8 and pass non-UTF-8 filenames as
  bytes -- to avoid '\U0001F604'.encode('utf-8').decode(ftp.encoding)

--
akira

--
https://mail.python.org/mailman/listinfo/python-list
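
P.S. A minimal sketch of the encoding points in the list above, runnable without an FTP server. `encodable` is a hypothetical helper, and decoding with the surrogateescape handler is assumed to mirror what os.fsdecode() does on a POSIX system whose filesystem encoding is UTF-8. The last part assumes ftp.encoding is latin-1.

```python
def encodable(name, encoding="utf-8"):
    """Return True if name can be encoded strictly (hypothetical helper)."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


raw = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8

# Assumption: this is what os.fsdecode(raw) amounts to on a UTF-8 POSIX system.
name = raw.decode("utf-8", "surrogateescape")

# The undecodable byte became a lone surrogate, which a strict encode rejects.
print(encodable(name))  # False

# Only the Python-specific error handler can recover the original bytes.
print(name.encode("utf-8", "surrogateescape") == raw)  # True

# The re-decoding trick that bytes support would make unnecessary:
# UTF-8 bytes disguised as a latin-1 str so that they survive the encode step.
smuggled = "\U0001F604".encode("utf-8").decode("latin-1")
print(smuggled.encode("latin-1") == "\U0001F604".encode("utf-8"))  # True
```

The first two results are why surrogateescape is no help here: the name round-trips inside Python, but the str cannot be sent with a strict encode, and a non-Python client has no way to interpret the surrogates.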