On Dec 10 11:15, Nikolay Ilychev wrote:
> Hello!
>
> When using cygwin, i can't list, copy, remove files and directories
> with 128 utf-8 symbol long names.
>
> useless examples that illustrates the problem:
> [...]
> same problem with other tools - find, perl, rsync from cygwin repo.
>
> Please, make the MAX_PATH not for 260 bytes, but 260 utf-8 symbols.
Easier said than done.

First of all, this is NOT about MAX_PATH. MAX_PATH (260 chars) is the number of characters allowed in the Win32 ANSI file API for a complete path, including the terminating null. Cygwin uses the native NT API and, occasionally, the Win32 UNICODE file API, which allows paths of up to 32767 chars.

The problem here is about NAME_MAX. Per POSIX[1], NAME_MAX is the "maximum number of bytes in a filename (not including the terminating null)." Note the word *bytes*. Not characters, bytes. UTF-8 chars are 1 to 4 bytes in length. Thus, the maximum number of UTF-8 chars in a filename is potentially less than NAME_MAX: a filename consisting only of chars from the basic latin charset (1 byte in UTF-8) may consist of NAME_MAX characters, a filename constructed solely from chars of the latin-1 supplement (2 byte chars) may consist of NAME_MAX / 2 characters, and a filename constructed only from emoticons (4 byte chars) may consist of NAME_MAX / 4 chars.

Ok, so we all know that Windows does not use a byte representation of filenames; rather, the OS uses UTF-16 to store and handle filenames internally. Filenames on Windows filesystems may consist of up to 255 UTF-16 chars[2]. How do you represent this in a byte-oriented POSIX system? What do you set NAME_MAX to? You can't get it right due to the unfortunate multibyte vs. UTF-16 encoding issue. To cover all UTF-8 chars, NAME_MAX would have to be 1020. But then, applications relying on NAME_MAX will be surprised by ENAMETOOLONG errors for perfectly valid POSIX filenames, e.g. a name of 300 ASCII chars, which fits into NAME_MAX but not into 255 UTF-16 chars. If you make it 255, applications will be surprised by ENAMETOOLONG errors for perfectly valid Windows filenames. If you make it 255 on the application level but then return filenames longer than 255 bytes in their multibyte representation to the application, they will crash due to buffer overflow issues. After all, NAME_MAX is a contractual obligation.

There was also the backward compatibility issue. Back in the pre-Cygwin 1.7 days, when Cygwin used the ANSI file API, NAME_MAX was already 255.
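A quick way to see the bytes vs. characters arithmetic is the following Python sketch. It is only an illustration, not Cygwin code: it assumes a UTF-8 locale and NAME_MAX = 255, and check_name() is a hypothetical helper standing in for an application that sizes its buffers by NAME_MAX. It also assumes the 128 "utf-8 symbols" from the report are Cyrillic letters (2 bytes each in UTF-8): such a name needs 256 bytes, one more than NAME_MAX, while Windows only sees 128 UTF-16 chars.

```python
import errno

# Assumptions for this sketch: a UTF-8 locale and Cygwin's NAME_MAX of 255
# bytes. Windows counts a filename component in UTF-16 code units instead.
NAME_MAX = 255


def utf8_bytes(name: str) -> int:
    """Bytes the name occupies in UTF-8, which is what NAME_MAX counts."""
    return len(name.encode("utf-8"))


def utf16_units(name: str) -> int:
    """UTF-16 code units the name occupies, which is what Windows counts."""
    return len(name.encode("utf-16-le")) // 2


def check_name(name: str) -> None:
    """Hypothetical check an application sizing buffers by NAME_MAX might do."""
    if utf8_bytes(name) > NAME_MAX:
        raise OSError(errno.ENAMETOOLONG, "File name too long", name)


if __name__ == "__main__":
    # Maximum filename length in chars when every char comes from one block.
    for label, ch in [("basic latin", "a"),
                      ("latin-1 supplement", "\u00e9"),
                      ("emoticon", "\U0001F600")]:
        print(f"{label}: {utf8_bytes(ch)} byte(s) per char, "
              f"at most {NAME_MAX // utf8_bytes(ch)} chars")

    # 128 Cyrillic letters: fine for Windows, one byte too long for NAME_MAX.
    name = "\u0436" * 128
    print(utf16_units(name), utf8_bytes(name))  # 128 256
    try:
        check_name(name)
    except OSError as e:
        print(errno.errorcode[e.errno])  # ENAMETOOLONG
```

The mismatch is the whole point: utf16_units() is what the filesystem enforces, utf8_bytes() is what NAME_MAX promises to applications, and no single constant satisfies both.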
Changing NAME_MAX to a bigger value might have resulted in the aforementioned application crashes due to buffer overflows as well. So we decided to keep NAME_MAX at the value it always had, 255. This restricts the actual filename length when using multibyte characters, just as on any other POSIX system, with the downside that, occasionally, a Windows filename will be too long to handle.

Sorry if that is frustrating in your current situation, but this isn't something we can just change on a whim. It would break compatibility with all existing Cygwin executables.


Corinna

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/limits.h.html
[2] However, this does *not* cover NFS or other filesystems using a byte representation for storing filenames.

-- 
Corinna Vinschen                  Please, send mails regarding Cygwin to
Cygwin Maintainer                 cygwin AT cygwin DOT com
Red Hat