Jason Tackaberry <tack <at> urandom.ca> writes:

> On Thu, 2009-08-06 at 01:31 +0000, John Machin wrote:
> > Suggested further avenues of investigation:
> >
> > (1) Try the timing again with "cp1252" and "utf8" and "utf_8"
> >
> > (2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c
>
> Very pedagogical of you. :)  Indeed, it looks like the bigger player in the
> performance difference is the fact that the code path for unicode(s,
> enc) short-circuits the codec registry for common encodings (which
> includes 'utf-8' specifically), whereas s.decode('utf-8') necessarily
> consults the codec registry.

So the next question (the answer to which may benefit all users
of .encode() and .decode()) is:

Why does consulting the codec registry take so long,
and can this be improved?
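
For anyone who wants to reproduce the comparison, here is a rough timing sketch. It is only an illustration, not the benchmark used earlier in this thread: the names SAMPLE, text_type, time_call and compare_spellings are made up for this example, and the numbers will depend on the interpreter version and build. It times the constructor form, the .decode() method, and a bare codecs.lookup() for several spellings of the encoding name, so the cost of the registry lookup can be seen separately from the decoding itself.

```python
import codecs
import timeit

# ASCII-only payload so that every encoding tried below can decode it.
SAMPLE = b"hello world " * 100

# unicode on Python 2, str on Python 3 (hypothetical helper name).
text_type = type(u"")


def time_call(func, number=10000, repeat=3):
    """Return the best per-call time in seconds over `repeat` runs."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def compare_spellings(spellings=("utf-8", "utf8", "utf_8", "cp1252"),
                      number=10000):
    """Time three code paths for each encoding-name spelling."""
    results = {}
    for enc in spellings:
        results[enc] = {
            # Constructor form: unicode(s, enc) / str(b, enc).
            "constructor": time_call(lambda: text_type(SAMPLE, enc), number),
            # Method form: s.decode(enc).
            "decode": time_call(lambda: SAMPLE.decode(enc), number),
            # Registry lookup alone, with no decoding work.
            "lookup": time_call(lambda: codecs.lookup(enc), number),
        }
    return results


if __name__ == "__main__":
    for enc, row in sorted(compare_spellings().items()):
        print("%-8s constructor=%.3g s  decode=%.3g s  lookup=%.3g s"
              % (enc, row["constructor"], row["decode"], row["lookup"]))
```

If the "lookup" column is a large fraction of the "decode" column for a given spelling, that points at the registry (name normalization and cache lookup) rather than at the decoder. Later interpreter versions may short-circuit both forms, so the gap described above may not show up everywhere.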