New submission from sogom <so...@outlook.jp>:

On the Windows file system, U+03A9 (Greek capital letter Omega) and U+2126
(Ohm sign) are treated as distinct characters. In fact, two separate files
"\u03A9.txt" and "\u2126.txt" can exist side by side in the same folder. But
os.path.normcase() maps both U+03A9 and U+2126 to U+03C9 (Greek small letter
omega).
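
A minimal sketch of the collision, runnable on any platform. It assumes that
normcase() on Windows in Python 3.9 amounts to replacing "/" with "\" and
calling str.lower(); normcase_39 is a hypothetical stand-in written for this
illustration, not the library function itself.

```python
def normcase_39(path: str) -> str:
    # Assumption: mirrors how ntpath.normcase() treats str input in
    # Python 3.9 (slash replacement followed by str.lower()).
    return path.replace("/", "\\").lower()


omega = "\u03A9.txt"  # Greek capital letter Omega
ohm = "\u2126.txt"    # Ohm sign

# The two names are different strings before normalization ...
assert omega != ohm

# ... but both collapse to "\u03c9.txt" afterwards.
print(normcase_39(omega) == normcase_39(ohm))  # prints True
```

The collision comes from the Unicode lowercase mapping itself: both U+03A9
and U+2126 lowercase to U+03C9.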

According to MSDN, NTFS file names are compared with CompareStringOrdinal():
https://docs.microsoft.com/en-us/windows/win32/intl/handling-sorting-in-your-applications#sort-strings-ordinally
That document also says "the function maps case using the operating system
*uppercasing* table." However, I ran an experiment and found that, at least in
the Basic Multilingual Plane, "lowercase the two strings with LCMapStringEx()
and then wcscmp them" always gives the same result as "compare the two
strings with CompareStringOrdinal()". This equivalence is not stated
explicitly in MSDN:
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-lcmapstringex
Still, the description of LCMAP_LINGUISTIC_CASING on that page implies that
the casing rules match the file system's unless LCMAP_LINGUISTIC_CASING is
used.

Therefore, I believe that os.path.normcase() should probably call 
LCMapStringEx(), with the first argument LOCALE_NAME_INVARIANT and the second 
argument LCMAP_LOWERCASE.
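
The proposed call could be sketched with ctypes roughly as follows. This is
only an illustration under stated assumptions, not a proposed patch:
lcmap_lower is a hypothetical helper name, the constant values are taken from
the Windows SDK headers, and the function raises OSError on non-Windows
platforms.

```python
import ctypes
import sys

# Constant values as defined in the Windows SDK headers.
LOCALE_NAME_INVARIANT = ""
LCMAP_LOWERCASE = 0x00000100


def lcmap_lower(s: str) -> str:
    """Hypothetical helper: lowercase s with LCMapStringEx (Windows only)."""
    if sys.platform != "win32":
        raise OSError("LCMapStringEx is only available on Windows")
    if not s:
        return s

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.LCMapStringEx
    func.argtypes = [
        ctypes.c_wchar_p,  # lpLocaleName
        ctypes.c_uint32,   # dwMapFlags
        ctypes.c_wchar_p,  # lpSrcStr
        ctypes.c_int,      # cchSrc
        ctypes.c_wchar_p,  # lpDestStr
        ctypes.c_int,      # cchDest
        ctypes.c_void_p,   # lpVersionInformation (NULL)
        ctypes.c_void_p,   # lpReserved (NULL)
        ctypes.c_void_p,   # sortHandle (0)
    ]
    func.restype = ctypes.c_int

    # cchSrc counts UTF-16 code units, not Python code points.
    src_len = len(s.encode("utf-16-le", "surrogatepass")) // 2

    # A first call with no destination buffer returns the required size.
    size = func(LOCALE_NAME_INVARIANT, LCMAP_LOWERCASE, s, src_len,
                None, 0, None, None, None)
    if size == 0:
        raise ctypes.WinError(ctypes.get_last_error())

    # The second call performs the actual mapping into the buffer.
    buf = ctypes.create_unicode_buffer(size)
    written = func(LOCALE_NAME_INVARIANT, LCMAP_LOWERCASE, s, src_len,
                   buf, size, None, None, None)
    if written == 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return buf[:written]
```

A normcase() built on this would replace "/" with "\" first and then pass the
result to lcmap_lower(). If the experiment described above holds, the names
"\u03A9.txt" and "\u2126.txt" would be expected to stay distinct after such a
mapping.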

----------
components: Windows
messages: 383163
nosy: paul.moore, sogom, steve.dower, tim.golden, zach.ware
priority: normal
severity: normal
status: open
title: os.path.normcase() is inconsistent with Windows file system
type: behavior
versions: Python 3.9

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42658>
_______________________________________