> On 27 Jun 2021, at 12:07, Zbigniew Jędrzejewski-Szmek <[email protected]>
> wrote:
>
> [this is a continuation of https://bugs.python.org/issue44452]
>
> pathlib.Path() has a concatenation operator "/" that allows the
> right-hand-side argument to be an absolute path, which causes the
> left-hand-side argument to be ignored:
>
> >>> pathlib.Path('/foo') / '/bar'
> PosixPath('/bar')
> >>> pathlib.Path('/var/tmp/instroot') / '/some/path' / '/suffix'
> PosixPath('/suffix')
>
> This follows the precedent set by os.path.join(), and probably makes
> sense in the scenario of simulating a user typing 'cd' commands in a
> shell.
>
> But it doesn't work nicely in the case of combining paths from
> two different "namespaces", where we never want to go "up".
>
> For example: a web server takes a URL, strips the host, and wants
> to look up a file:
> https://example.com/some/path → "/some/path" → /src/www/root + /some/path →
> /src/www/root/some/path
>
> or we are constructing a container image and need to refer to a file
> in the container:
> <container foo> + /etc/shadow → /var/lib/machines/foo + /etc/shadow →
> /var/lib/machines/foo/etc/shadow
>
> To do this kind of operation correctly with pathlib.Path, the user
> needs to do two operations: verify that the rhs argument contains
> no '..' [*], and strip leading slashes:
>
> >>> lhs = pathlib.Path('/some/namespace/')
> >>> rhs = '/some/path/to/add'
> >>> if '..' in pathlib.Path(rhs).parts: raise ValueError
> >>> path = lhs / rhs.lstrip('/')
>
> Those last two lines are rather verbose, non-obvious. Also the .lstrip()
> operation attaches on the right side, but operates on the left side, earlier
> than the "/", which is overall not very nice.
>
> Proposal:
>
> add "//"-operator to pathlib.PosixPath() that means "concatenate a rhs path
> that is underneath the lhs". It would disallow paths with '..', and
> concatenate paths as relative to the specified lhs:
>
> >>> lhs = pathlib.Path('/some/namespace/')
> >>> lhs // "a/b/c"
> PosixPath('/some/namespace/a/b/c')
> >>> lhs // "/a/b/c"
> PosixPath('/some/namespace/a/b/c')
> >>> lhs // "a/../b/c"
> ValueError: cannot use // with a path with '..' on the right
>
> This would be useful for operations on containers, combining paths from
> namespaces like fs paths and URL components, looking up files
> underneath an unpacked archive, etc.
>
> [*] Why completely disallow '..' ? Components with '..' cannot be
> correctly resolved without access to the filesystem, because a
> component may be a symlink, and then "a/b/../." may not be "a/.", but
> something completely different. Thus, since the goal is to have a path
> underneath lhs, I think it's best to forbid '..'. In principle '..' at
> the beginning can be resolved reliably, by simply ignoring it,
> '/../../../whatever' is the same as '/whatever/'. But it's a tiny
> corner case, and I think it's better to disallow that too.
There are two ideas here.
1. Allow Path() to join a pair of absolute paths.
2. Prevent '..' from escaping into the first absolute path.
For (1) you can do this today:
>>> from pathlib import Path
>>> root=Path('/var/www')
>>> y=Path('/a/b')
>>> root / y.relative_to('/')
PosixPath('/var/www/a/b')
>>>
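The same step can be wrapped in a small helper. This is only a sketch: the
name join_under is made up for illustration and is not part of pathlib. It
strips the anchor from the right-hand path before joining, so absolute and
relative inputs end up below root in the same way. It does nothing about
'..', which is idea (2).

```python
from pathlib import PurePosixPath


def join_under(root, path):
    """Join path below root, treating an absolute path as relative to root.

    Hypothetical helper, not a pathlib API. It does not reject or
    resolve '..' components.
    """
    root = PurePosixPath(root)
    path = PurePosixPath(path)
    if path.is_absolute():
        # Drop the leading '/' so that the '/' operator does not
        # discard root when the right-hand side is absolute.
        path = path.relative_to(path.anchor)
    return root / path


print(join_under('/var/www', '/a/b'))
print(join_under('/var/www', 'a/b'))
```

PurePosixPath is used so the sketch behaves the same on any platform and
never touches the filesystem.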
I can think of a number of rules that might apply for (2).
(a) raise an error if there is a '..' or '.' in any path component.
(b) resolve() '..' and '.' as pathlib already does
- I'm not sure that use of the filesystem is always needed to validate
the use of '..':
>>> y=Path('/a/b/../v.html')
>>> y.relative_to('/')
PosixPath('a/b/../v.html')
>>> root / y.relative_to('/')
PosixPath('/var/www/a/b/../v.html')
>>> root / y.resolve().relative_to('/')
PosixPath('/var/www/a/v.html')
And to show that no escape out of root happens:
>>> y=Path('/../a//v.html')
>>> root / y.resolve().relative_to('/')
PosixPath('/var/www/a/v.html')
>>>
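For comparison, here is a rough sketch of both rules as functions. The names
join_strict and join_lexical are invented for this example. Rule (a) simply
refuses '..'. The second function is a purely lexical take on rule (b) that
uses posixpath.normpath instead of resolve(), so it never looks at the
filesystem; that also means it ignores symlinks, which is the concern raised
in the footnote of the original mail.

```python
import posixpath
from pathlib import PurePosixPath


def join_strict(root, path):
    """Rule (a): refuse any '..' component, then join below root."""
    path = PurePosixPath(path)
    if '..' in path.parts:
        raise ValueError("'..' is not allowed in %r" % str(path))
    if path.is_absolute():
        # Strip the anchor so root is kept by the '/' operator.
        path = path.relative_to(path.anchor)
    return PurePosixPath(root) / path


def join_lexical(root, path):
    """Rule (b), lexical only: collapse '..' textually, clamped at '/'."""
    # Prefixing '/' makes normpath drop any '..' that would climb
    # above the top of the right-hand path.
    norm = posixpath.normpath('/' + str(path))
    return PurePosixPath(root) / norm.lstrip('/')


print(join_strict('/var/www', '/a/b'))
print(join_lexical('/var/www', '/a/b/../v.html'))
print(join_lexical('/var/www', '/../a//v.html'))
```

Which rule is right probably depends on whether the right-hand path comes
from something with filesystem semantics or from a namespace like a URL.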
Barry
>
> Zbyszek
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/IXYPKVINLD57BOV6VHU4U4ZJCQCQPAHT/
> Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________
Message archived at
https://mail.python.org/archives/list/[email protected]/message/6HR4IAUAUIQXK5SJAWKVFVOFZ374C4W3/