New submission from STINNER Victor <victor.stin...@haypocalc.com>:

r72313 (PEP 383) created the PyUnicode_FSConverter() function: encode an object 
to a byte string using the default file system encoding. PyBytes and 
PyByteArray are leaved unchanged (just increment the reference counter), 
PyUnicode is encoded to PyBytes (if the encoder produces something else, an 
error is raised).

In my opinion, a file name is a character string (Windows) or a byte string 
(POSIX) and a bytearray object is unexpected. Only few function support this 
type: no function of os.path accept them. In the Python test suite, no function 
use bytearray for filenames.

It's already complex to support 2 types (bytes and str) for filenames in 
os.path, I think that a third type is too much and has no real world use case 
(the module manipuling filenames is os.path and it doesn't support bytearray 
and nobody noticed that).

Suppport bytearray is complex because we need to acquire a lock (using 
PyObject_GetBuffer()) and release the lock 
(o->ob_type->tp_as_buffer->bf_releasebuffer(o, 0)... that's not really 
intuitive...). posixmodule.c uses functions bytes2str() and realease_bytes() to 
handle bytearray easily. But these functions are static and other modules have 
to reimplement them.

I propose the reject bytearray in PyUnicode_FSConverter(), or to simplify the 
API and fix Python to accept bytearray filename everywhere especially in 
os.path.

***

Reject bytearray in PyUnicode_FSConverter() is trivial only there is only one 
test that have to be changed in the whole Python3 test suite: 
test_empty_bytearray in test_bytes.py. This function shows that bytearray is 
complex and introduce subtle bugs (the lock/GIL issue). All code using 
PyUnicode_FSConverter() would become simpler.

Example:

----
/* Release the lock, decref the object. */
static void
release_bytes(PyObject* o)
{
        if (PyByteArray_Check(o))
                o->ob_type->tp_as_buffer->bf_releasebuffer(o, 0);
        Py_DECREF(o);
}

...
realease_byte(path);
----

becomes "Py_DECREF(path);" and release_bytes() can be removed.

And
------------
static char*
bytes2str(PyObject* o, int lock)
{
        if(PyBytes_Check(o))
                return PyBytes_AsString(o);
        else if(PyByteArray_Check(o)) {
                if (lock && PyObject_GetBuffer(o, NULL, 0) < 0)
                        /* On a bytearray, this should not fail. */
                        PyErr_BadInternalCall();
                return PyByteArray_AsString(o);
        } else {
                /* The FS converter should have verified that this
                   is either bytes or bytearray. */
                Py_FatalError("bad object passed to bytes2str");
                /* not reached. */
                return "";
        }
}

...
path = bytes2str(opath);
------------

becomes "path = PyBytes_AS_STRING(opath);" (or maybe "path = 
PyBytes_AsString(opath);" if you don't trust PyUnicode_FSConverter)  and 
bytes2str() can be removed.

***

Simplify the API means that bytes2str() and release_bytes() should become 
protected functions (not static, but prefixed by "_Py_") or maybe public ("Py_" 
prefix).

But the most complex part is to modify os.path to support bytearray, and this 
task would not be funny :-)

----------
components: Interpreter Core, Library (Lib), Unicode
messages: 103829
nosy: haypo
severity: normal
status: open
title: Don't accept bytearray as filenames, or simplify the API
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8485>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to