[issue13538] Docstring of str() and/or behavior
New submission from Guillaume Bouchard:

The docstring associated with str() says::

    str(string[, encoding[, errors]]) -> str

    Create a new string object from the given encoded string.
    encoding defaults to the current default string encoding.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.

whereas the online documentation states::

    When only object is given, this returns its nicely printable representation.

My issue came up when I tried to convert bytes to str. As stated in the documentation, and to avoid implicit behavior, str cannot be converted to bytes without giving an encoding (using bytes(my_str, encoding=...) or my_str.encode(...)); bytes(my_str) raises a TypeError. But if you try to convert bytes to str using str(my_bytes), Python silently returns the so-called nicely printable representation of the bytes object. For example::

    >>> bytes("foo")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: string argument without an encoding
    >>> str(b"foo")
    "b'foo'"

For the sake of consistency, and to avoid silent errors, I suggest that str() of a bytes object without an encoding raise an exception. I think that is usually what people want. If someone wants a *nicely printable representation* of their bytes object, they can call repr() explicitly and will quickly see that what they just printed is wrong. But if they want to convert a bytes object to its Unicode representation, they will prefer an exception to a silently failing conversion that produces a Unicode string starting with b' and ending with '.

--
components: Interpreter Core
messages: 148914
nosy: Guillaume.Bouchard
priority: normal
severity: normal
status: open
title: Docstring of str() and/or behavior
versions: Python 3.2

___
Python tracker <http://bugs.python.org/issue13538>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
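The asymmetry described above can be reproduced in a few lines of Python 3. This is a minimal sketch; the variable names are illustrative only. It also exercises the `errors` handlers mentioned in the quoted docstring on a deliberately invalid UTF-8 input.

```python
# Python 3: the two conversion directions discussed in the report.

# str -> bytes: an encoding is mandatory, so this fails loudly.
try:
    bytes("foo")
    str_to_bytes_failed = False
except TypeError:
    str_to_bytes_failed = True

data = b"foo"

# bytes -> str WITHOUT an encoding: silently returns the printable
# representation (the repr), not the decoded text.
as_repr = str(data)              # the 6-character string b'foo'

# bytes -> str WITH an encoding: actually decodes the bytes.
decoded = str(data, "utf-8")     # "foo"
same = data.decode("utf-8")      # "foo", the method-based spelling

# The errors argument controls what happens on undecodable input.
bad = b"caf\xe9"                 # Latin-1 encoded "café", not valid UTF-8
replaced = str(bad, "utf-8", "replace")  # "caf\ufffd" (U+FFFD replacement char)
ignored = str(bad, "utf-8", "ignore")    # "caf"
try:
    str(bad, "utf-8", "strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

print(str_to_bytes_failed, as_repr, decoded, same, replaced, ignored, strict_failed)
```

Note that only the explicit-encoding calls produce the text "foo"; the bare str(data) call succeeds but yields a different string, which is exactly the silent failure the report describes.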
Guillaume Bouchard added the comment:

> str always falls back to the repr; in general str(obj) should always return
> some value, otherwise the assumptions of a *lot* of Python code would be
> broken.

Perhaps it could raise a warning instead? After all, the only reason the encoding argument exists is to convert bytes (or something that looks like bytes) to str. Do you think it would be possible to special-case str() for bytes (and bytearray) with something like this::

    import builtins
    import warnings

    def str(object, encoding=None, errors=None):
        if encoding is not None:
            # usual work: decode with the given encoding
            return builtins.str(object, encoding, errors or 'strict')
        if isinstance(object, (bytes, bytearray)):
            warnings.warn('Converting bytes/bytearray to str without encoding, '
                          'it may not be what you expect')
        return object.__str__()

That said, adding warnings and special cases everywhere does not seem very Pythonic.

> Do you want to propose a doc patch?

The docstring for str() could read something like this (please excuse my Frenglish)::

    Create a new string object from the given encoded string. If object is
    bytes, bytearray or a buffer-like object, encoding and errors can be set.
    errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
    WARNING: if encoding is not set, the object is converted to its nicely
    printable representation, which is entirely different from what you
    may expect.

Perhaps a warning could also be added to the online documentation, such as::

    .. warning::
       When str() converts a bytes/bytearray or buffer-like object and
       *encoding* is not specified, the result is the object's nicely
       printable representation, which is entirely different from the
       Unicode text obtained by decoding the object with a specified
       encoding.

Would you like a .diff against the current Mercurial repository?
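For comparison, current CPython 3 releases already offer an opt-in form of the warning suggested above: running the interpreter with -b emits a BytesWarning when str() is called on a bytes or bytearray object without an encoding, and -bb turns that warning into an error. The sketch below assumes Python 3.7+ (for capture_output) and starts a child interpreter to show the difference; older releases such as 3.2 may behave differently.

```python
import subprocess
import sys

# The same one-liner, run by a fresh interpreter with and without -bb.
snippet = "str(b'foo')"

# Default: str() on bytes silently returns the repr, so the child exits with 0.
plain = subprocess.run([sys.executable, "-c", snippet],
                       capture_output=True, text=True)

# -bb: the BytesWarning for str() on bytes is raised as an exception,
# so the child exits with a non-zero status and a traceback on stderr.
strict = subprocess.run([sys.executable, "-bb", "-c", snippet],
                        capture_output=True, text=True)

print("default exit code:", plain.returncode)
print("-bb exit code:", strict.returncode)
print("-bb stderr mentions BytesWarning:", "BytesWarning" in strict.stderr)
```

Because the check is opt-in at interpreter level, existing code that relies on str(obj) always returning a value keeps working by default, which addresses the compatibility concern quoted at the top of this message.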