Based on responses to my previous proposal, I am convinced that it was
over-ambitious and not appropriate for inclusion in the Python standard
library, so I am starting over with a more narrowly scoped suggestion.
Proposal:
Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for
use as a single component of a path. In the default case, the value must not
contain invalid characters, must not be a reference to the current or parent
directory ("." or ".."), and must not contain control characters.
When an invalid character is encountered, `ValueError` is raised by default;
alternatively, the character may be replaced or escaped.
When an invalid name is encountered, `ValueError` is raised by default;
alternatively, its first character may be replaced or escaped, or the name may
be prefixed.
Control characters (those in the Unicode general category of "C") are treated
as invalid by default.
If the result is still invalid after all transformations have been applied,
an exception is raised.
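A minimal sketch of the default (strict) behavior described above. The helper names `check_part` and `_is_invalid_char` are hypothetical, and the set of invalid characters is an assumption: only the platform's path separators plus anything in Unicode general category "C". A real implementation would need platform-specific rules (Windows, for example, reserves several more characters such as `<`, `>`, `:` and `|`).

```python
import os
import unicodedata

# Assumption: the only "invalid characters" are the platform's path
# separators plus control characters (Unicode general category "C").
_SEPARATORS = frozenset(c for c in (os.sep, os.altsep) if c)


def _is_invalid_char(ch: str) -> bool:
    """Return True for a path separator or any category-"C" code point."""
    return ch in _SEPARATORS or unicodedata.category(ch).startswith("C")


def check_part(name: str) -> str:
    """Hypothetical strict (default) mode: return name unchanged or raise."""
    for ch in name:
        if _is_invalid_char(ch):
            raise ValueError(f"invalid character {ch!r} in {name!r}")
    # "." and ".." are invalid names; treating "" as invalid is an assumption.
    if name in ("", ".", ".."):
        raise ValueError(f"invalid path component {name!r}")
    return name
```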
Proposed function signature:
`sanitizepart(name, replace=None, escape=None, prefix=None, flags=0)`
When `replace` is supplied, it is used as a replacement for any invalid
characters. When `prefix` is not also supplied, it is also used to replace the
first character of a name that is invalid in itself (such as "." or ".."),
rather than merely because it contains invalid characters.
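A sketch of the `replace` behavior, under the same assumptions as a strict check would use (the function name `sanitize_with_replace` is hypothetical, and invalid characters are assumed to be path separators plus category "C"). Invalid characters are substituted first; only then is the name itself checked for "." or "..", so that a result which is still invalid after replacement raises, as the proposal requires.

```python
import os
import unicodedata
from typing import Optional

# Assumption: invalid characters are path separators and category-"C" code points.
_SEPARATORS = frozenset(c for c in (os.sep, os.altsep) if c)


def _is_invalid_char(ch: str) -> bool:
    return ch in _SEPARATORS or unicodedata.category(ch).startswith("C")


def sanitize_with_replace(
    name: str, replace: str, prefix: Optional[str] = None
) -> str:
    """Hypothetical replace mode of the proposed sanitizepart()."""
    # Step 1: substitute every invalid character.
    result = "".join(replace if _is_invalid_char(ch) else ch for ch in name)
    # Step 2: fix a name that is invalid in itself ("." or "..").
    if result in (".", ".."):
        if prefix is not None:
            result = prefix + result
        else:
            result = replace + result[1:]
    # Step 3: if the result is still invalid (e.g. replace="."), raise.
    if result in ("", ".", "..") or any(_is_invalid_char(ch) for ch in result):
        raise ValueError(f"cannot sanitize {name!r} with replace={replace!r}")
    return result
```

For example, `sanitize_with_replace("..", "_")` gives `"_."`, while passing `prefix="_"` gives `"_.."` instead.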
When `escape` is supplied (typically "%"), it is used as the escape character
in the same way that "%" is used in URL encoding. When a non-ASCII character is
escaped, it is represented as the sequence of its encoded bytes (octets), each
escaped separately. When `prefix` is not also supplied, `escape` is also used
to escape the first character of a name that is invalid in itself, rather than
merely because it contains invalid characters.
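A sketch of the `escape` behavior, with hypothetical names (`sanitize_with_escape`, `unescape_part`) and the same assumed set of invalid characters. It assumes UTF-8 as the encoding for non-ASCII characters, which the proposal does not specify, and also escapes the escape character itself (as "%" becomes "%25" in URL encoding) so that the result can be decoded unambiguously. With `escape="%"`, the standard `urllib.parse.unquote` then reverses the transformation.

```python
import os
import unicodedata
from typing import Optional
from urllib.parse import unquote

# Assumption: invalid characters are path separators and category-"C" code points.
_SEPARATORS = frozenset(c for c in (os.sep, os.altsep) if c)


def _is_invalid_char(ch: str) -> bool:
    return ch in _SEPARATORS or unicodedata.category(ch).startswith("C")


def _escape_char(ch: str, escape: str) -> str:
    # Assumption: UTF-8 encoding, one escape sequence per octet.
    # "surrogatepass" keeps lone surrogates (category "Cs") encodable.
    data = ch.encode("utf-8", "surrogatepass")
    return "".join(f"{escape}{octet:02X}" for octet in data)


def sanitize_with_escape(
    name: str, escape: str = "%", prefix: Optional[str] = None
) -> str:
    """Hypothetical escape mode of the proposed sanitizepart()."""
    if len(escape) != 1 or _is_invalid_char(escape):
        raise ValueError(f"unusable escape character {escape!r}")
    # Escape invalid characters and the escape character itself.
    result = "".join(
        _escape_char(ch, escape) if ch == escape or _is_invalid_char(ch) else ch
        for ch in name
    )
    # Fix a name that is invalid in itself ("." or "..").
    if result in (".", ".."):
        if prefix is not None:
            result = prefix + result
        else:
            result = _escape_char(result[0], escape) + result[1:]
    # Anything still invalid (e.g. an empty name or a bad prefix) raises.
    if result in ("", ".", "..") or any(_is_invalid_char(ch) for ch in result):
        raise ValueError(f"cannot sanitize {name!r}")
    return result


def unescape_part(part: str) -> str:
    """Hypothetical inverse for escape="%", reusing the standard URL decoder."""
    return unquote(part, errors="surrogatepass")
```

The round trip only holds when `prefix` is not used, since a prefixed name such as `"_."` cannot be told apart from a name that genuinely starts with `"_"`.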
`replace` and `escape` are mutually exclusive.
When `prefix` is supplied (typically "_"), it is prepended to a name that is
invalid in itself, rather than merely because it contains invalid characters.
Flags:
- path.PERMIT_RELATIVE (1): Permit relative path references ("." and "..").
- path.PERMIT_CTRL (2): Permit characters in the Unicode general category "C".
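A sketch of how the two flags might relax the strict checks, using a hypothetical `PartFlag` `enum.IntFlag` in place of the proposed `path.PERMIT_*` constants (same bit values). As elsewhere, treating path separators as invalid regardless of the flags is an assumption.

```python
import enum
import os
import unicodedata


class PartFlag(enum.IntFlag):
    """Hypothetical stand-ins for the proposed path.PERMIT_* constants."""
    PERMIT_RELATIVE = 1  # allow "." and ".."
    PERMIT_CTRL = 2      # allow code points in Unicode general category "C"


_SEPARATORS = frozenset(c for c in (os.sep, os.altsep) if c)


def check_part(name: str, flags: int = 0) -> str:
    """Strict-mode check that honors the flags; returns name or raises."""
    flags = PartFlag(flags)
    for ch in name:
        # Assumption: separators are never permitted, whatever the flags.
        if ch in _SEPARATORS:
            raise ValueError(f"path separator in {name!r}")
        is_ctrl = unicodedata.category(ch).startswith("C")
        if is_ctrl and not flags & PartFlag.PERMIT_CTRL:
            raise ValueError(f"control character {ch!r} in {name!r}")
    if name in (".", "..") and not flags & PartFlag.PERMIT_RELATIVE:
        raise ValueError(f"relative reference {name!r}")
    if not name:  # assumption: an empty component is always invalid
        raise ValueError("empty path component")
    return name
```

Because `IntFlag` members are plain integers, callers could pass either `PartFlag.PERMIT_RELATIVE | PartFlag.PERMIT_CTRL` or the literal `3`.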
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at
https://mail.python.org/archives/list/python-ideas@python.org/message/LRIKMG3G4I4YQNK6BTU7MICHT7X67MEF/