STINNER Victor <vstin...@python.org> added the comment:

I'm in favor of changing the default encoding to UTF-8, but it requires good 
documentation, especially to provide a solution working on Python 3.8 and 3.9 
to change the encoding (see below).

--

The encoding is used to encode commands sent to the FTP server and to decode 
the server replies. I expect that most replies are basically letters, digits 
and spaces. I guess that the most problematic operations are:

* sending the user name and password
* decoding filenames in the LIST command reply
* encoding the filename in the STOR command
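
To make the filename case concrete, here is a small illustration of how the 
same STOR command differs on the wire. It assumes latin-1 as the "old" 
encoding (as far as I know, the current ftplib default) and uses a made-up 
filename; nothing here talks to a real server.

```python
# Illustration only: how one non-ASCII filename is encoded on the wire.
cmd = "STOR café.txt"  # hypothetical filename

as_latin1 = cmd.encode("latin-1")  # "é" becomes a single byte
as_utf8 = cmd.encode("utf-8")      # "é" becomes two bytes

print(as_latin1)
print(as_utf8)

# A peer that decodes UTF-8 bytes as latin-1 gets a garbled filename.
garbled = as_utf8.decode("latin-1")
print(garbled)

# Pure-ASCII commands are byte-for-byte identical in both encodings.
ascii_cmd = "STOR report.txt"
same = ascii_cmd.encode("latin-1") == ascii_cmd.encode("utf-8")
print(same)
```

So only names outside ASCII are affected by the choice of encoding.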

I expect that the original FTP protocol doesn't specify any encoding and so 
that FTP server implementations took some freedom. I would not be surprised if 
servers running on Windows use ANSI code pages.

Currently, encoding is a class attribute: it's not convenient to override it 
depending on the host. I would prefer to have a new parameter for the 
constructor.

Giampaolo:
> some servers may enable UTF-8 only if client explicitly sends "OPTS UTF-8 ON" 
> first, but that is based on an draft RFC. Server implementors usually treat 
> this command as a no-op and simply assume UTF-8 as the default.
> With that said, I am -1 about implementing logic based on FEAT/OPTS: that 
> should be done before login, and at that point some servers may erroneously 
> reject any command other than USER, PASS and ACCT. 

Oh. In this case, always sending "OPTS UTF-8 ON" just after the socket is 
connected sounds like a bad idea.


Sebastian:
> Since RFC 2640, the industry standard within FTP Clients is UTF-8 (see e.g. 
> FileZilla here: https://wiki.filezilla-project.org/Character_Encoding, or 
> WinSCP here: https://winscp.net/eng/docs/faq_utf8).

"Internationalization of the File Transfer Protocol" was published in 1999. It 
recommends UTF-8. Following an RFC recommendation is a good argument for 
changing the default encoding to UTF-8.
https://tools.ietf.org/html/rfc2640


Giampaolo:
> Personally I think it makes more sense to just use UTF-8 without going 
> through a deprecation period

I concur. Deprecation is usually used for features which are going to be 
removed (module, function or function parameter). Here it's just about a 
default parameter value. I expect encoding="utf-8" to be the default in the 
constructor.

The annoying part is that Python 3.8 only has a class attribute. The simplest 
option seems to be creating an FTP object, modifying its encoding attribute and 
*then* logging in. Another option is to subclass the FTP class. IMO the worst 
is to modify the ftplib.FTP.encoding attribute (monkey-patching the module).

I expect that most users use usernames, passwords and filenames encodable to 
ASCII and so will not notice the change to UTF-8. We can document a solution 
working on all Python versions to use a different encoding.

----------
nosy: +vstinner

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39380>
_______________________________________