Re: inconvenient unicode conversion of non-string arguments

Leo Kislov Wed, 13 Dec 2006 02:07:09 -0800

Holger Joukl wrote:
> Hi there,
>
> I consider the behaviour of unicode() inconvenient wrt to conversion of
> non-string arguments.
> While you can do:
>
> >>> unicode(17.3)
> u'17.3'
>
> you cannot do:
>
> >>> unicode(17.3, 'ISO-8859-1', 'replace')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> TypeError: coercing to Unicode: need string or buffer, float found
> >>>
>
> This is somehow annoying when you want to convert a mixed-type argument list
> to unicode strings, e.g. for a logging system (that's where it bit me) and
> want to make sure that possible raw string arguments are also converted to
> unicode without errors (although by force).
> Especially as this is a performance-critical part in my application so I
> really do not like to wrap unicode() into some custom tounicode() function
> that handles such cases by distinction of argument types.
>
> Any reason why unicode() with a non-string argument should not allow the
> encoding and errors arguments?


There is a reason: encoding is a property of bytes; it is not applicable
to other objects.
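To make the distinction concrete, here is a minimal sketch in Python 3 syntax, where `str` is the text type (the old `unicode`) and `bytes` is the raw byte type (the old byte `str`). Only bytes-like objects have a `decode()` method that takes an encoding and an error handler; every other object becomes text through plain `str()`. The helper name `to_text` is hypothetical, not a standard library function.

```python
def to_text(obj, encoding="iso-8859-1", errors="replace"):
    """Hypothetical helper: convert obj to text.

    The encoding and errors arguments only make sense for bytes, so they
    are used only for bytes-like objects; everything else goes through str().
    """
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding, errors)
    return str(obj)

# Supplying an encoding for a float fails in Python 3 as well,
# because there are no bytes to decode.
try:
    str(17.3, "iso-8859-1", "replace")
except TypeError as exc:
    print("TypeError:", exc)

print(to_text(17.3))         # 17.3
print(to_text(b"caf\xe9"))   # café
```

The type check is exactly the per-argument dispatch the original poster wanted to avoid, which is why the advice below is to get rid of raw bytes before they reach such a list.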

> Or some good solution to work around my problem?

Do not put undecoded bytes in a mixed-type argument list. A rule of
thumb when working with unicode: decode as soon as possible, encode as
late as possible.
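The rule of thumb can be sketched as a three-stage pipeline. This is a minimal illustration in Python 3 syntax (`str` is text, `bytes` is raw data); the function names `read_field`, `format_log` and `write_log` are hypothetical, and ISO-8859-1 input and UTF-8 output are assumptions chosen only for the example.

```python
import io

def read_field(raw, encoding="iso-8859-1"):
    # Input boundary: bytes arriving from a file, socket, etc. are decoded
    # immediately, so the rest of the program only ever sees text.
    return raw.decode(encoding)

def format_log(*args):
    # Inside the program every argument is text or a non-byte object,
    # so plain str() suffices and no encoding argument is needed.
    return " ".join(str(arg) for arg in args)

def write_log(line, sink, encoding="utf-8"):
    # Output boundary: text is encoded only when it leaves the program.
    sink.write((line + "\n").encode(encoding))

sink = io.BytesIO()
name = read_field(b"Jos\xe9")
write_log(format_log("user", name, "paid", 17.3), sink)
print(sink.getvalue())  # b'user Jos\xc3\xa9 paid 17.3\n'
```

Because decoding happens once at the edge, the logging code needs no type distinctions and no per-call encoding arguments.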

  -- Leo

-- 
http://mail.python.org/mailman/listinfo/python-list
