Abstract
--------

I recently had a look at Bug 173097 (Cannot delete a file with "invalid" characters in its name), and unfortunately, this seems to be a surprisingly difficult issue to fix with how KIO is currently designed.
The following should document the current state to the best of my understanding, and ideally spur discussion on how to improve the situation. The root of the issue is basically the way Qt handles file paths, but this leads to two kinds of issues: a) how we use Qt for file operations inside of KIO and b) restrictions inherent to KIO's API.

Qt and file paths
-----------------

Now, why is Qt causing issues? Well, paths are in general represented as QStrings. For instance, QFile's constructor takes a QString as an argument, and QDir::entryList returns a QStringList. According to the documentation “QString stores a string of 16-bit QChars, where each QChar corresponds one Unicode 4.0 character.”

However, for file paths, this is overly restrictive: Most Unix file systems allow arbitrary byte sequences as file names, as long as they do not contain '\0' or '/'. NTFS has more restrictions, but still fewer than “valid Unicode string” as far as I can tell.

Note that, as far as I can tell, functions like QFile::encodeName don't help at all here, because QString has already replaced invalid sequences with a “replacement character” upon construction, and there is no way to get the original byte sequence back.

This leads me to the conclusion that Qt is currently inadequate for handling arbitrary file names, as opposed to e.g. Boost.Filesystem or the new std::filesystem, or other languages' functionality like Python's os module and Rust's std::path.

How this affects KIO
--------------------

Implementation-wise, KIO naturally uses Qt functionality for the most part, except in some low-level layers like the new polkit integration, where platform-native functionality is used.

As an example, we can look at the Trash KIO Slave. Paths are generally stored as QStrings, and TrashImpl::listDir uses QDir::entryList internally. To fix this, we would need to replace the usage of QString with something that preserves arbitrary data, like a QByteArray.
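
To illustrate what "preserves arbitrary data" means in practice, here is a minimal sketch that lists and deletes directory entries using the POSIX calls directly, so that names never pass through a Unicode string type. The helpers listDirRaw and removeRaw are hypothetical names made up for this example; they are not part of KIO or Qt, and std::string is used purely as a byte container (a QByteArray would serve the same purpose). Error handling is deliberately thin.

```cpp
#include <dirent.h>  // POSIX: opendir, readdir, closedir
#include <unistd.h>  // POSIX: unlink

#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not KIO/Qt API): return the entries of `dirPath`
// as the exact byte sequences reported by the file system.
std::vector<std::string> listDirRaw(const std::string &dirPath)
{
    std::vector<std::string> names;
    DIR *dir = opendir(dirPath.c_str());
    if (!dir) {
        std::perror("opendir"); // sketch only: real code would propagate errno
        return names;
    }
    while (const dirent *entry = readdir(dir)) {
        const std::string name(entry->d_name); // bytes copied as-is, no decoding
        if (name != "." && name != "..") {
            names.push_back(name);
        }
    }
    closedir(dir);
    return names;
}

// Hypothetical helper (not KIO/Qt API): delete the file `name` inside
// `dirPath`, addressing it by its raw bytes.
bool removeRaw(const std::string &dirPath, const std::string &name)
{
    assert(name.find('/') == std::string::npos); // must be a single path component
    const std::string fullPath = dirPath + "/" + name;
    return unlink(fullPath.c_str()) == 0;
}
```

Because the name returned by listDirRaw is byte-identical to what is on disk, passing it back to removeRaw should work even for names that are not valid UTF-8, which is exactly the round trip that gets lost once a name has been converted to a QString.
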
Furthermore, Qt's file handling functionality would need to be replaced with something else, like the platform-native functions, or some abstraction provided by another library. As you can guess, this alone already creates quite a lot of code churn.

However, even if we did this, it wouldn't actually help all that much. The reason for this is KIO's API. If, for instance, we take a look at KIO::SlaveBase::listDir, we see that it takes a QUrl as its argument. And, surprise, surprise, QUrl's constructor takes a QString. So even if a slave internally only works with a byte-preserving representation of paths, clients like Dolphin currently cannot tell it which path to operate on.

Without changing the API, the only way out I see here would be to use something like base64 encoding to transmit the path and for usage in UDS_NAME, and to put the decoded string in UDS_DISPLAY_NAME.

Why we should care
------------------

Some might argue that file names which are not valid Unicode are a rare edge case, and in the greater scheme of things, this might even be true. However, Bug 206761 had 101 votes, and 173097 has accumulated 45 votes over its lifetime, which IMHO indicates that some of our users are affected by this. (Nota Bene: Some of the issues in 206761 might already have been mitigated by the usage of QFile::encodeName in appropriate places, but this only covers a subset of cases.)

What can we do
--------------

Well, we can complain about Qt (I'm not sure if there already is a bug report about Qt's path handling), but even if they care and want to change this, we can assume that this probably won't happen before Qt 6, considering that this touches an essential part of Qt Core.

Besides that, I actually have no idea what the correct course of action is. As outlined above, porting KIO away from Qt's file handling is a large-scale task, for which I alone realistically have neither the time nor the energy.
Also, even then there's still the question of how to maintain API and ABI compatibility regarding KIO's current QUrl-based interface.

So, does anyone else have brilliant ideas or provocative thoughts, has anyone found a flaw or mistake in what I've written, or does anyone want to tell us their favourite bike shed colour?
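
For reference, the base64 workaround mentioned above could look roughly like the following. This is only a sketch under my own assumptions: the function names encodeRawName and decodeRawName are hypothetical, and I picked the unpadded URL-safe alphabet from RFC 4648 (with '-' and '_' instead of '+' and '/') so that the encoded form contains neither '/' nor characters that need escaping in a URL. The encoded string would be what travels in the QUrl and in UDS_NAME, while a lossy, human-readable version of the name would go into UDS_DISPLAY_NAME.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// URL-safe base64 alphabet (RFC 4648, section 5); padding is omitted.
static const char kAlphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Hypothetical helper: encode an arbitrary byte sequence as ASCII text.
std::string encodeRawName(const std::string &raw)
{
    std::string out;
    std::uint32_t buffer = 0; // only the low `bits` bits are meaningful
    int bits = 0;
    for (unsigned char byte : raw) {
        buffer = (buffer << 8) | byte;
        bits += 8;
        while (bits >= 6) {
            bits -= 6;
            out.push_back(kAlphabet[(buffer >> bits) & 0x3F]);
        }
    }
    if (bits > 0) { // flush the remaining 2 or 4 bits, zero-filled on the right
        out.push_back(kAlphabet[(buffer << (6 - bits)) & 0x3F]);
    }
    return out;
}

// Hypothetical helper: inverse of encodeRawName. Returns false if `encoded`
// contains a character outside the alphabet.
bool decodeRawName(const std::string &encoded, std::string &raw)
{
    raw.clear();
    std::uint32_t buffer = 0;
    int bits = 0;
    for (char c : encoded) {
        const char *pos = (c == '\0') ? nullptr : std::strchr(kAlphabet, c);
        if (!pos) {
            return false;
        }
        buffer = (buffer << 6) | static_cast<std::uint32_t>(pos - kAlphabet);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            raw.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return true; // leftover fill bits are ignored
}
```

The obvious downside is that every client would have to know about the encoding, otherwise users end up seeing the encoded form in places that do not consult UDS_DISPLAY_NAME.
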