New submission from sogom <so...@outlook.jp>:
On Windows file system, U+03A9 (Greek capital letter Omega) and U+2126 (Ohm sign) are distinguished. In fact, two distinct files "\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. But os.path.normcase() transforms both U+03A9 and U+2126 to U+03C9 (Greek small letter omega). MSDN reads they use CompareStringOrdinal() to compare NTFS file names: https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally . This document also says "the function maps case using the operating system *uppercasing* table." But I made an experiment and found that at least in the Basic Multilingual Plane, "lowercase two strings by means of LCMapStringEx() and then wcscmp the two" always gives the same result as "compare the two strings with CompareStringOrdinal()". Though this fact is not explicitly mentioned in MSDN https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex , the description of LCMAP_LINGUISTIC_CASING in this page implies that casing rules conform to file system's unless LCMAP_LINGUISTIC_CASING is used. Therefore, I believe that os.path.normcase() should probably call LCMapStringEx(), with the first argument LOCALE_NAME_INVARIANT and the second argument LCMAP_LOWERCASE. ---------- components: Windows messages: 383163 nosy: paul.moore, sogom, steve.dower, tim.golden, zach.ware priority: normal severity: normal status: open title: os.path.normcase() is inconsistent with Windows file system type: behavior versions: Python 3.9 _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue42658> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com