Based on responses to my previous proposal, I am convinced that it was 
over-ambitious and not appropriate for inclusion in the Python standard 
library, so starting over with a more narrowly scoped suggestion.

Proposal:

Add a new function (possibly `os.path.sanitizepart`) to sanitize a value for 
use as a single component of a path. In the default case, the value must also 
not be a reference to the current or parent directory ("." or "..") and must 
not contain control characters.

When an invalid character is encountered, then `ValueError` will be raised in 
the default case, or the character may be replaced or escaped.
When an invalid name is encountered, then `ValueError` will be raised in the 
default case, or the first character may be replaced, escaped, or prefixed.
Control characters (those in the Unicode general category of "C") are treated 
as invalid by default.
After applying any transformations, if the result would still be invalid, then 
an exception is raised.

Proposed function signature: `sanitizepart(name, replace=None, escape=None, 
prefix=None, flags=0)`

When `replace` is supplied, it is used as a replacement for any invalid 
characters or for the first character of an invalid name. When `prefix` is not 
also supplied, this is also used as the replacement for the first character of 
the name if it is invalid, not simply due to containing invalid characters.

When `escape` is supplied (typically "%") it is used as the escape character in 
the same way that "%" is used in URL encoding. When a non-ASCII character is 
escaped, it is represented as a sequence of encoded bytes/octets. When `prefix` 
is not also supplied, this is also used to escape the first character of the 
name if it is invalid, not simply due to containing invalid characters.

`replace` and `escape` are mutually exclusive.

When `prefix` is supplied (typically "_"), it is prepended the name if it is 
invalid, not simply due to containing invalid characters.

Flags:
- path.PERMIT_RELATIVE (1): Permit relative path values ("." "..")
- path.PERMIT_CTRL (2): Permit characters in the Unicode general category of 
"C".
_______________________________________________
Python-ideas mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/LRIKMG3G4I4YQNK6BTU7MICHT7X67MEF/
Code of Conduct: http://python.org/psf/codeofconduct/

Reply via email to