On 9 November 2012 11:08, Helmut Jarausch <jarau...@igpm.rwth-aachen.de> wrote:
> On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
>
>> Helmut Jarausch, 09.11.2012 10:18:
>>> probably I'm missing something.
>>>
>>> Using str(Arg) works just fine if Arg is a list.
>>> But
>>> str([],encoding='latin-1')
>>>
>>> gives the error
>>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
>>> list found
>>>
>>> If this isn't a bug how can I use str(Arg,encoding='latin-1') in general.
>>> Do I need to flatten any data structure which is normally excepted by str()
>>> ?
>>
>> Funny idea to call this a bug in Python. What your code is asking for is to
>> decode the object you pass in using the "latin-1" encoding. Since a list is
>> not something that is "encoded", let alone in latin-1, you get an error,
>> and actually a rather clear one.
>>
>> Note that this is not specific to Python3.3 or even 3.x. It's the same
>> thing in Py2 when you call the equivalent unicode() function.
>>
>
> For me it's not funny, at all.

I think the problem is that the str constructor does two fundamentally
different things depending on whether you have supplied the encoding
argument. From help(str) in Python 3.2:

 |  str(object[, encoding[, errors]]) -> str
 |
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.

So str(obj) returns obj.__str__(), but str(obj, encoding='xxx') decodes a
byte string (or a similar object) using the given encoding. In most cases
obj will be a byte string and the call will be equivalent to
obj.decode('xxx').

I think the help text is a little confusing. It says that encoding
defaults to sys.getdefaultencoding() but doesn't clarify that this default
only applies if errors is given as a keyword argument, since otherwise no
decoding is performed. Perhaps the help text would be clearer if it listed
the two operations as two separate cases, e.g.:

  str(object)
    Returns a string object from object.__str__() if it is defined or
    otherwise object.__repr__(). Raises TypeError if the returned result
    is not a string object.

  str(bytes, [encoding[, errors]])
    If either encoding or errors is supplied, creates a new string object
    by decoding bytes with the specified encoding. The bytes argument can
    be any object that supports the buffer interface. encoding defaults to
    sys.getdefaultencoding() and errors defaults to 'strict'.

> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string.

Well actually Python 3.3 will happily convert it to a string using
bytes.__repr__ if you don't supply the encoding argument:

    >>> str(b'this is a byte string')
    "b'this is a byte string'"

> If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.

You can always do:

    [str(obj, encoding='xxx') for obj in list_of_byte_strings]

> How can I convert a data structure of arbitrarily complex nature, which
> contains
> bytestrings somewhere, to a string?

Using str(obj) or repr(obj). Of course this relies on the author of
type(obj) defining the appropriate methods and writing the code that
actually converts the object into a string.

> This problem has arisen while converting a working Python2 script to
> Python3.3.
> Since Python2 doesn't have bytestrings it just works.

In Python 2 ordinary strings are byte strings.

> Tell me how to convert str(obj) from Python2 to Python3 if obj is an
> arbitrarily complex data structure containing bytestrings somewhere
> which have to be converted to strings with a given encoding?

The str function, when used to convert a non-string object into a string,
knows nothing about the object you provide except whether it has __str__
or __repr__ methods. The only processing that is done is to check that the
returned result was actually a string:

    >>> class A:
    ...     def __str__(self):
    ...         return []
    ...
    >>> a = A()
    >>> str(a)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: __str__ returned non-string (type list)

Perhaps it would help if you would explain why you want the string object.

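To put the two behaviours of str() side by side, here is a small sketch.
The sample byte string and the variable names are only illustrations, not
anything from your script:

```python
data = b'caf\xe9'  # illustrative sample: 'cafe' with e-acute, encoded as latin-1

# Without an encoding argument nothing is decoded: str() falls back to the
# repr of the bytes object, 'b' prefix and \x escape included.
as_repr = str(data)
assert as_repr == repr(data)

# With an encoding argument the bytes are decoded, which for a bytes
# object is the same as calling data.decode('latin-1').
as_text = str(data, encoding='latin-1')
assert as_text == data.decode('latin-1')

# With an encoding argument and an object that is not bytes-like, str()
# raises TypeError, which is the error from the original report.
try:
    str([], encoding='latin-1')
except TypeError as exc:
    error = exc
else:
    error = None

print(as_repr)
print(type(error).__name__)
```

So the encoding argument only makes sense when the first argument is
itself a bytes-like object; it is never applied to the elements of a
container.
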
I would only use str(complex_object) as something to print for debugging,
so I would actually want it to show me which strings were byte strings by
marking them with a 'b' prefix, and I would also want it to show non-ascii
characters with a \x hex code, as it already does:

    >>> a = [1, 2, b'caf\xe9']
    >>> str(a)
    "[1, 2, b'caf\\xe9']"

If I wanted to convert the object to a string in order to e.g. save it to
a file or database then I would write a function to create the string that
I wanted. I would only use str() to convert elementary types like int and
float into strings.

Oscar
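
P.S. A minimal sketch of the kind of function I mean, assuming the data
structure is built only from dicts, lists, tuples and sets and that every
byte string in it uses the same encoding. The name decode_nested and the
latin-1 default are my own assumptions for illustration:

```python
def decode_nested(obj, encoding='latin-1', errors='strict'):
    """Return a copy of obj with every bytes/bytearray decoded to str.

    Hypothetical helper: it only looks inside dict, list, tuple and set
    (subclasses come back as the plain built-in type). Any other object
    is returned unchanged.
    """
    if isinstance(obj, (bytes, bytearray)):
        return str(obj, encoding=encoding, errors=errors)
    if isinstance(obj, dict):
        return {decode_nested(key, encoding, errors):
                decode_nested(value, encoding, errors)
                for key, value in obj.items()}
    if isinstance(obj, list):
        return [decode_nested(item, encoding, errors) for item in obj]
    if isinstance(obj, tuple):
        return tuple(decode_nested(item, encoding, errors) for item in obj)
    if isinstance(obj, set):
        return {decode_nested(item, encoding, errors) for item in obj}
    return obj


# Illustrative data only: byte strings at several nesting levels.
nested = [1, 2, b'caf\xe9', {'name': (b'na\xefve', 3.5)}]
decoded = decode_nested(nested)
text = str(decoded)  # no b'...' prefixes are left in the result
```

Once everything has been decoded, str(decoded) behaves much like str(obj)
did in your Python 2 script, except that the text is real unicode rather
than raw bytes. Objects of your own classes would need their own handling.
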