On Sat, 28 Mar 2020 00:32:24 +0100 Mattias Andrée <maand...@kth.se> wrote:

Dear Mattias,

> This sounds absolutely horrible. Non-pre-composed characters are not
> widely well support and are often rendered terribly, some software
> (like the Linux VT) cannot even rendering them.

yes, the Linux VT is a good example. To do the rendering properly, you
need a font library that has essentially unlimited context for drawing
characters. This is not possible in a terminal, but you can at least
"reserve" one cell for one grapheme cluster. To put it another way,
though: it's not a problem of the application, but of the font
renderer, and the complexities of TTF only make it harder.

> Why is even the kernel getting into encoding issues?, that should be
> an application issue, not a kernel issue. A kernel should only know
> bytes. Is it really a security issue?

I like to compare it to IDN homograph attacks, where you replace
characters like the letters a and e with homographs, in this case the
Cyrillic letters а and е (they are not the same, even though they look
like it!). It doesn't take much creativity to see that it's enough to
register the domain https://аmаzon.com/ and trick people into visiting
it. Firefox can be configured to display all URLs containing non-ASCII
characters in their expanded form, in this case as
https://xn--mzon-43db.com/, and I can only recommend that to people.

In the case of file systems, I would probably go with a comparable
approach: work only byte-wise in the kernel, but when listing a
directory that contains two equivalent file names, print both in such
an expanded form. This would keep both sides happy.

With best regards

Laslo
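
PS: A minimal Python sketch of both ideas, under a few assumptions. The
helper names expand_domain, expand_name and equivalent_groups are
hypothetical, "equivalent" is taken to mean canonically equivalent
(identical after NFC normalization), and the expanded form for file
names is assumed to be a plain escape of every non-ASCII code point.
It is only meant to show the principle, not how any kernel or browser
actually does it.

```python
import unicodedata
from collections import defaultdict


def expand_domain(domain: str) -> str:
    """Return the ASCII ("xn--") form of a domain, label by label."""
    labels = []
    for label in domain.split("."):
        if label.isascii():
            labels.append(label)
        else:
            # Python's built-in punycode codec; the "xn--" prefix is the IDNA marker.
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def expand_name(name: str) -> str:
    """Escape every non-ASCII code point so look-alikes become visible."""
    return name.encode("unicode_escape").decode("ascii")


def equivalent_groups(names):
    """Group names that are canonically equivalent (assumption: same NFC form)."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Cyrillic U+0430 in place of the Latin letter a.
    print(expand_domain("\u0430m\u0430zon.com"))  # xn--mzon-43db.com

    # Precomposed e-acute versus e followed by a combining acute accent.
    names = ["caf\u00e9.txt", "cafe\u0301.txt", "notes.txt"]
    for group in equivalent_groups(names):
        for name in group:
            print(expand_name(name))
```

The names stay untouched as byte strings; only the listing changes, and
only for entries that would otherwise be indistinguishable on screen.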