[issue32040] Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
New submission from QbLearningPython:

While testing a module, I found a weird behaviour in the pathlib package. I have a list of pathlib.Paths and I sorted() it. I assumed that the order obtained by sorting a list of Paths would be the same as the order obtained by sorting the list of their corresponding (string) filenames. But that is not the case.

I ran the following example:

==
from pathlib import Path

# order string filenames
filenames_for_testing = (
    '/spam/spams.txt',
    '/spam/spam.txt',
    '/spam/another.txt',
    '/spam/binary.bin',
    '/spam/spams/spam.ttt',
    '/spam/spams/spam01.txt',
    '/spam/spams/spam02.txt',
    '/spam/spams/spam03.ppp',
    '/spam/spams/spam04.doc',
)

sorted_filenames = sorted(filenames_for_testing)

# output ordered list of string filenames
print()
print("Ordered list of string filenames:")
print()
for element in sorted_filenames:
    print(f'\t{element}')
print()

# order paths (built from the same string filenames)
paths_for_testing = [
    Path(filename)
    for filename in filenames_for_testing
]

sorted_paths = sorted(paths_for_testing)

# output ordered list of pathlib.Paths
print()
print("Ordered list of pathlib.Paths:")
print()
for element in sorted_paths:
    print(f'\t{element}')
print()

# compare
print()
if sorted_filenames == [str(path) for path in sorted_paths]:
    print('Ordered lists of string filenames and pathlib.Paths are EQUAL.')
else:
    print('Ordered lists of string filenames and pathlib.Paths are DIFFERENT.')
    for element in range(0, len(sorted_filenames)):
        if sorted_filenames[element] != str(sorted_paths[element]):
            print()
            print('First different element:')
            print(f'\tElement #{element}')
            print(f'\t{sorted_filenames[element]} != {sorted_paths[element]}')
            break
print()
==

The output of this script was:

==
Ordered list of string filenames:

    /spam/another.txt
    /spam/binary.bin
    /spam/spam.txt
    /spam/spams.txt
    /spam/spams/spam.ttt
    /spam/spams/spam01.txt
    /spam/spams/spam02.txt
    /spam/spams/spam03.ppp
    /spam/spams/spam04.doc

Ordered list of pathlib.Paths:

    /spam/another.txt
    /spam/binary.bin
    /spam/spam.txt
    /spam/spams/spam.ttt
    /spam/spams/spam01.txt
    /spam/spams/spam02.txt
    /spam/spams/spam03.ppp
    /spam/spams/spam04.doc
    /spam/spams.txt

Ordered lists of string filenames and pathlib.Paths are DIFFERENT.

First different element:
    Element #3
    /spam/spams.txt != /spam/spams/spam.ttt
==

As you can see, '/spam/spams.txt' ends up in a different position when sorting by pathlib.Paths than when sorting by string filenames.

I think it is weird that sorting pathlib.Paths yields a different result than sorting their string filenames. I think that pathlib.Paths should be ordered alphabetically by their corresponding filenames.

Thank you.

--
components: Extension Modules
messages: 306304
nosy: QbLearningPython
priority: normal
severity: normal
status: open
title: Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
type: behavior
versions: Python 3.6

___
Python tracker <https://bugs.python.org/issue32040>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
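For reference, the difference reported above comes down to how the two kinds of object are compared. What follows is a minimal sketch, assuming (as the output above suggests) that pathlib compares paths component by component rather than character by character. PurePosixPath, the two sample names and the variable names are chosen here only for illustration, so that the result does not depend on the platform:

```python
from pathlib import PurePosixPath

# Two of the filenames from the report.
names = ['/spam/spams.txt', '/spam/spams/spam.ttt']

# Plain strings compare character by character: after the common prefix
# '/spam/spams', '.' (0x2E) sorts before '/' (0x2F).
by_string = sorted(names)

# Paths compare component by component: the component 'spams' sorts
# before the component 'spams.txt' because it is a prefix of it.
by_path = [str(p) for p in sorted(PurePosixPath(n) for n in names)]

# Sorting the strings by their tuple of components gives the Path order.
by_parts = sorted(names, key=lambda n: PurePosixPath(n).parts)

print(by_string)
print(by_path)
print(by_parts == by_path)
```

In this sketch only the position of '/spam/spams.txt' changes between the two orderings, which matches the first differing element printed by the original script.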
[issue32040] Sorting pathlib.Paths does not give the same order as sorting the (string) filenames of those pathlib.Paths
QbLearningPython added the comment:

Thanks, serhiy.storchaka, for your answer.

I am not fully convinced. You have described the current behaviour of the pathlib package. But let me ask: should this be the desired behaviour?

Since string filenames and pathlib.Paths are different ways to refer to the same object (a path in a filesystem), should they not behave in the same way when sorting?

You pointed out that the current behaviour is a "more natural order" for pathlib.Paths. I am not truly sure about that. Can you please provide any citation or additional information about that?

Thank you.
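If string ordering is what a caller wants, it can be requested explicitly without any change to pathlib. This is a hedged sketch, not something proposed in the thread: the sample paths and variable names are illustrative, and `key=str` is just the standard `sorted()` key mechanism applied to the string form of each path.

```python
from pathlib import PurePosixPath

paths = [
    PurePosixPath('/spam/spams/spam.ttt'),
    PurePosixPath('/spam/spams.txt'),
    PurePosixPath('/spam/another.txt'),
]

# Default ordering: whatever comparison pathlib defines for paths.
default_order = [str(p) for p in sorted(paths)]

# Explicit string ordering: compare the str() form of each path instead.
string_order = [str(p) for p in sorted(paths, key=str)]

print(default_order)
print(string_order)
```

With `key=str` the result agrees with sorting the plain filenames, so code that depends on that order does not have to rely on how paths compare by default.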
[issue33660] pathlib.Path.resolve() returns path with double slash when resolving a relative path in root directory
New submission from QbLearningPython:

I have recently found a weird behaviour while trying to resolve a relative path located in the root directory on macOS.

I tried to resolve a Path('spam') and the interpreter answered PosixPath('//spam') (double slash for root) instead of the PosixPath('/spam') I expected.

I think that this is a bug.

I ran the interpreter from the root directory (cd /; python). Once the interpreter was running, this is what I did:

>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/')    # since the interpreter has been launched from root
>>> p = pathlib.Path('spam')
>>> p
PosixPath('spam')    # just for checking
>>> p.resolve()
PosixPath('//spam')    # beware of double slash instead of single slash

I also checked the behaviour of Path.resolve() in a non-root directory (in my case launching the interpreter from /Applications).

>>> import pathlib
>>> pathlib.Path.cwd()
PosixPath('/Applications')
>>> p = pathlib.Path('eggs')
>>> p
PosixPath('eggs')
>>> p.resolve()
PosixPath('/Applications/eggs')    # just one slash as root in this case (as it should be)

So it seems that double slashes only appear while resolving relative paths in the root directory.

More examples are:

>>> pathlib.Path('spam/egg').resolve()
PosixPath('//spam/egg')
>>> pathlib.Path('./spam').resolve()
PosixPath('//spam')
>>> pathlib.Path('./spam/egg').resolve()
PosixPath('//spam/egg')

but

>>> pathlib.Path('').resolve()
PosixPath('/')
>>> pathlib.Path('.').resolve()
PosixPath('/')

Intriguingly,

>>> pathlib.Path('spam').resolve().resolve()
PosixPath('/spam')
# 'spam'.resolve = '//spam'
# '//spam'.resolve = '/spam'!!!
>>> pathlib.Path('//spam').resolve()
PosixPath('/spam')

I have found the same behaviour in several Python versions:

Python 3.6.5 (default, May 15 2018, 08:20:57)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin

Python 3.4.8 (default, Mar 29 2018, 16:18:25)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

Python 3.5.5 (default, Mar 29 2018, 16:22:58)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin

Python 3.7.0b4 (default, May 4 2018, 22:01:49)
[Clang 9.1.0 (clang-902.0.39.1)] on darwin

All running on: macOS High Sierra 10.13.4 (17E202)

The same issue has also been confirmed on Ubuntu 16.04 (Python 3.5.2) and openSUSE Tumbleweed (Python 3.6.5).

I have searched for information on this issue but did not find anything useful.

The Python docs (https://docs.python.org/3/library/pathlib.html) talk about "UNC shares", but that is not the case here (I am using a macOS HFS+ filesystem).

PEP 428 (https://www.python.org/dev/peps/pep-0428/) says:

    Multiple leading slashes are treated differently depending on the path flavour. They are always retained on Windows paths (because of the UNC notation):

    >>> PureWindowsPath('//some/path')
    PureWindowsPath('//some/path/')

    On POSIX, they are collapsed except if there are exactly two leading slashes, which is a special case in the POSIX specification on pathname resolution [8] (this is also necessary for Cygwin compatibility):

    >>> PurePosixPath('///some/path')
    PurePosixPath('/some/path')
    >>> PurePosixPath('//some/path')
    PurePosixPath('//some/path')

I do not think that this is related to the aforementioned issue.

However, I also checked the POSIX specification link (http://pubs.opengroup.org/onlinepubs/009...#tag_04_11) and found:

    A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.
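The PEP 428 rule quoted above can be checked directly with pure paths, which never touch the filesystem. A minimal sketch; the variable names are illustrative only, and the example only shows how leading slashes are parsed, not how resolve() builds its result:

```python
from pathlib import PurePosixPath

three_slashes = PurePosixPath('///some/path')
two_slashes = PurePosixPath('//some/path')
one_slash = PurePosixPath('/some/path')

# Three or more leading slashes collapse to a single one.
print(three_slashes == one_slash)

# Exactly two leading slashes are kept, and the root is reported as '//'.
print(str(two_slashes), two_slashes.root)

# Consequently '//some/path' and '/some/path' are distinct paths for pathlib.
print(two_slashes == one_slash)
```

Because exactly two leading slashes are preserved, a string such as '//spam' that reaches the path constructor stays '//spam' rather than being normalised to '/spam'.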
I do not really think that this can cause a double slash while resolving a relative path on macOS.

So, I think that this issue could be a real bug in the pathlib.Path.resolve() method, specifically in the POSIX flavour.

A user of Python Forum (killerrex) and I have traced the bug to Lib/pathlib.py:319 in the Python 3.6 repository https://github.com/python/cpython/blob/3...pathlib.py. Specifically, in line 319:

newpath = path + sep + name

For pathlib.Path('spam').resolve() in the root directory, newpath is '//spam' since:

path is '/'
sep is '/'
name is 'spam'

killerrex has suggested two solutions:

1) from line 345

base = '' if path.is_absolute() else os.getcwd()
if base == sep