STINNER Victor <vstin...@python.org> added the comment:
I'm in favor of changing the default encoding to UTF-8, but it requires good documentation, especially to provide a solution working on Python 3.8 and 3.9 to change the encoding (see below).

--

The encoding is used to encode commands sent to the FTP server and to decode the server replies. I expect that most replies are basically letters, digits and spaces. I guess that the most problematic commands are:

* send user and password
* decode filenames of LIST command reply
* encode filename in STOR command

I expect that the original FTP protocol doesn't specify any encoding and so FTP server implementations took some freedom. I would not be surprised to see ANSI code pages used by servers running on Windows.

Currently, encoding is a class attribute: it's not convenient to override it depending on the host. I would prefer to have a new parameter for the constructor.

Giampaolo:
> some servers may enable UTF-8 only if client explicitly sends "OPTS UTF-8 ON"
> first, but that is based on an draft RFC. Server implementors usually treat
> this command as a no-op and simply assume UTF-8 as the default.

> With that said, I am -1 about implementing logic based on FEAT/OPTS: that
> should be done before login, and at that point some servers may erroneously
> reject any command other than USER, PASS and ACCT.

Oh. In this case, always sending "OPTS UTF-8 ON" just after the socket is connected sounds like a bad idea.

Sebastian:
> Since RFC 2640, the industry standard within FTP Clients is UTF-8 (see e.g.
> FileZilla here: https://wiki.filezilla-project.org/Character_Encoding, or
> WinSCP here: https://winscp.net/eng/docs/faq_utf8).

"Internationalization of the File Transfer Protocol" (RFC 2640) was published in 1999. It recommends UTF-8. Following an RFC recommendation is a good argument to change the default encoding to UTF-8.
https://tools.ietf.org/html/rfc2640

Giampaolo:
> Personally I think it makes more sense to just use UTF-8 without going
> through a deprecation period

I concur.
Deprecation is usually used for features which are going to be removed (module, function or function parameter). Here it's just about a default parameter value.

I expect to have encoding="utf-8" as the default in the constructor.

The annoying part is that Python 3.8 only has a class attribute. The simplest option seems to be creating an FTP object, modifying its encoding attribute and *then* logging in. Another option is to subclass the FTP class. IMO the worst option is to modify the ftplib.FTP.encoding attribute (monkey-patching the module).

I expect that most users use usernames, passwords and filenames encodable to ASCII and so will not notice the change to UTF-8. We can document a solution working on all Python versions to use a different encoding name.

----------
nosy: +vstinner

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39380>
_______________________________________
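
As a rough sketch of the simplest option described above (create the FTP object, set its encoding, then log in), something like the following should work on Python 3.8 as well as on versions where the constructor accepts an encoding. The helper names new_ftp and open_ftp and the "cp1252" encoding are made up for illustration. The encoding is set before connect() rather than only before login(), on the assumption that connect() opens the reply stream using the encoding that is current at that point.

```python
import ftplib


def new_ftp(encoding):
    """Return an unconnected FTP object using `encoding` (hypothetical helper)."""
    # No host is passed, so the constructor does not connect or log in yet.
    ftp = ftplib.FTP()
    # Per-instance override: the ftplib.FTP class itself is not modified.
    ftp.encoding = encoding
    return ftp


def open_ftp(host, user, passwd, encoding="utf-8"):
    """Connect and log in with an explicit encoding (hypothetical helper)."""
    ftp = new_ftp(encoding)
    # Assumption: the encoding must be in place before connect(), because
    # the file object used to read server replies is created there.
    ftp.connect(host)
    ftp.login(user, passwd)
    return ftp


# Building the object requires no network access:
ftp = new_ftp("cp1252")
print(ftp.encoding)
```

Because the attribute is set on the instance, ftplib.FTP is left untouched, which avoids the monkey patching mentioned above, and a different encoding can be chosen per host.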