Re: unicode() vs. s.decode()

John Machin Thu, 06 Aug 2009 19:03:10 -0700

Jason Tackaberry <tack <at> urandom.ca> writes:

> On Thu, 2009-08-06 at 01:31 +0000, John Machin wrote:


> > Suggested further avenues of investigation:
> > 
> > (1) Try the timing again with "cp1252" and "utf8" and "utf_8"
> > 
> > (2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c
> 
> Very pedagogical of you. :)  Indeed, it looks like the bigger player in the
> performance difference is the fact that the code path for unicode(s,
> enc) short-circuits the codec registry for common encodings (which
> includes 'utf-8' specifically), whereas s.decode('utf-8') necessarily
> consults the codec registry.
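A minimal timing sketch for avenue (1) follows. It is written for Python 3, which is an assumption: the thread is about Python 2's unicode(s, enc), and the closest Python 3 analog is str(b, enc). The sketch times str(b, enc) against b.decode(enc) for each encoding spelling mentioned above. Whether a modern interpreter still shows the gap Jason describes depends on the version, so treat the output as something to measure rather than a known result. DATA, ENCODINGS and time_decoders are hypothetical names chosen for illustration.

```python
import timeit

# Hypothetical sample payload: ASCII-only text, so every codec below can decode it.
DATA = ("hello world " * 10).encode("ascii")

# Encoding spellings from avenue (1); they name two codecs (cp1252 and UTF-8).
ENCODINGS = ["cp1252", "utf8", "utf_8", "utf-8"]


def time_decoders(data, encodings, number=100_000):
    """Return {encoding: (seconds for str(data, enc), seconds for data.decode(enc))}."""
    results = {}
    for enc in encodings:
        # Each lambda is timed immediately, so it sees the current value of enc.
        via_str = timeit.timeit(lambda: str(data, enc), number=number)
        via_method = timeit.timeit(lambda: data.decode(enc), number=number)
        results[enc] = (via_str, via_method)
    return results


if __name__ == "__main__":
    for enc, (t_str, t_dec) in time_decoders(DATA, ENCODINGS).items():
        print(f"{enc:8}  str(b, enc): {t_str:.4f}s   b.decode(enc): {t_dec:.4f}s")
```

Both forms must produce the same text; only the code path taken to find the decoder can differ, which is what the timings are meant to expose.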

So the next question (the answer to which may benefit all users
of .encode() and .decode()) is:

    Why does consulting the codec registry take so long,
    and can this be improved?
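One possible first step on that question is to time the registry lookup on its own. The sketch below again assumes Python 3 (the thread itself concerns Python 2). It uses codecs.lookup, the public entry point to the codec registry, to show that the different spellings from avenue (1) resolve to the same codec, and then compares the cost of a bare lookup with the cost of a complete decode. It does not explain where the time goes inside the C implementation; DATA, canonical_names and lookup_vs_decode are hypothetical names for illustration.

```python
import codecs
import timeit

# Hypothetical sample payload (ASCII-only, decodable by any codec used here).
DATA = ("hello world " * 10).encode("ascii")


def canonical_names(spellings):
    """Map each encoding spelling to the canonical codec name the registry reports."""
    return {s: codecs.lookup(s).name for s in spellings}


def lookup_vs_decode(data, encoding, number=100_000):
    """Return (seconds for codecs.lookup, seconds for data.decode) over `number` calls."""
    t_lookup = timeit.timeit(lambda: codecs.lookup(encoding), number=number)
    t_decode = timeit.timeit(lambda: data.decode(encoding), number=number)
    return t_lookup, t_decode


if __name__ == "__main__":
    print(canonical_names(["utf8", "utf_8", "utf-8", "UTF-8", "cp1252"]))
    t_lookup, t_decode = lookup_vs_decode(DATA, "utf_8")
    print(f"codecs.lookup: {t_lookup:.4f}s   bytes.decode: {t_decode:.4f}s")
```

Comparing the two timings gives a rough idea of how much of a short decode the registry lookup could account for on a given build.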



-- 
http://mail.python.org/mailman/listinfo/python-list
