On Wed, 2009-08-05 at 16:43 +0200, Michael Ströder wrote:
> These both expressions are equivalent but which is faster or should be used
> for any reason?
>
> u = unicode(s,'utf-8')
>
> u = s.decode('utf-8') # looks nicer
It is sometimes non-obvious which constructs are faster than others in
Python. I regularly have these questions too, but it's pretty easy to run
quick (albeit naive) benchmarks to find out.

The first thing to try is to look at the bytecode for each:

>>> import dis
>>> dis.dis(lambda s: s.decode('utf-8'))
  1           0 LOAD_FAST                0 (s)
              3 LOAD_ATTR                0 (decode)
              6 LOAD_CONST               0 ('utf-8')
              9 CALL_FUNCTION            1
             12 RETURN_VALUE
>>> dis.dis(lambda s: unicode(s, 'utf-8'))
  1           0 LOAD_GLOBAL              0 (unicode)
              3 LOAD_FAST                0 (s)
              6 LOAD_CONST               0 ('utf-8')
              9 CALL_FUNCTION            2
             12 RETURN_VALUE

The presence of LOAD_ATTR in the first form hints that it is probably going
to be slower. Next, actually time it:

>>> import timeit
>>> timeit.timeit('"foobarbaz".decode("utf-8")')
1.698289155960083
>>> timeit.timeit('unicode("foobarbaz", "utf-8")')
0.53305888175964355

So indeed, unicode(s, 'utf-8') is faster by a fair margin (roughly three
times faster in this run). On the other hand, unless you need to do this
tens of thousands of times in a tight loop, I'd prefer the slower form,
s.decode('utf-8'), because, as you pointed out, it's cleaner and more
readable.

Cheers,

Jason.

-- 
http://mail.python.org/mailman/listinfo/python-list
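For readers on Python 3, where the unicode() builtin no longer exists, the
same experiment can be repeated with the analogous pair: bytes.decode()
versus the str(bytes_obj, encoding) constructor. The script below is a
minimal sketch under that assumption, not part of the original thread; the
names STATEMENTS, SETUP, show_bytecode and benchmark are hypothetical
helpers chosen for illustration. It prints the bytecode of each spelling and
times both with timeit, mirroring the dis/timeit session above.

```python
import dis
import timeit

# Assumption: a Python 3 translation of the post's Python 2 test.
# Python 3 has no unicode() builtin, so the analogous pair is
# bytes.decode(encoding) versus the str(bytes_obj, encoding) constructor.
STATEMENTS = {
    "method": 'data.decode("utf-8")',
    "constructor": 'str(data, "utf-8")',
}
# Hypothetical setup: the byte string is created once, outside the timed loop.
SETUP = 'data = b"foobarbaz"'


def show_bytecode():
    """Print the bytecode of each spelling, like the dis session in the post."""
    dis.dis(lambda s: s.decode("utf-8"))
    dis.dis(lambda s: str(s, "utf-8"))


def benchmark(number=1_000_000):
    """Return {label: total seconds} for `number` runs of each statement."""
    return {
        label: timeit.timeit(stmt, setup=SETUP, number=number)
        for label, stmt in STATEMENTS.items()
    }


if __name__ == "__main__":
    data = b"foobarbaz"
    # Sanity check: both spellings must produce the same text.
    assert data.decode("utf-8") == str(data, "utf-8") == "foobarbaz"
    show_bytecode()
    for label, seconds in benchmark().items():
        print(f"{label:12s} {seconds:.3f} s")
```

The absolute numbers, and even which form wins, depend on the interpreter
version and the machine, so treat any single run as naive in the same sense
as the post. The timeit documentation suggests timeit.repeat() and taking
the minimum when you want a steadier figure.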