On Tue, Nov 29, 2005 at 09:48:15AM +0100, David Siroky wrote:
> Hi!
>
> I need to enlighten myself in Python unicode speed and implementation.
>
> My platform is AMD [EMAIL PROTECTED] (x86-32), Debian, Python 2.4.
>
> First a simple example (and time results):
>
> x = "a"*50000000
> real    0m0.195s
> user    0m0.144s
> sys     0m0.046s
>
> x = u"a"*50000000
> real    0m2.477s
> user    0m2.119s
> sys     0m0.225s
>
> So my first question is why creation of a unicode string lasts more than 10x
> longer than non-unicode string?

String objects have the optimization described in the log message below.
The same optimization hasn't been made to unicode_repeat, though it would
probably also benefit from it.

------------------------------------------------------------------------
r30616 | rhettinger | 2003-01-06 04:33:56 -0600 (Mon, 06 Jan 2003) | 11 lines

Optimize string_repeat.

Christian Tismer pointed out the high cost of the loop overhead and
function call overhead for 'c' * n where n is large.  Accordingly,
the new code only makes lg2(n) loops.

Interestingly, 'c' * 1000 * 1000 ran a bit faster with old code.  At some
point, the loop and function call overhead became cheaper than invalidating
the cache with lengthy memcpys.  But for more typical sizes of n, the new
code runs much faster and for larger values of n it runs only a bit slower.
------------------------------------------------------------------------

If you're a "C" coder too, consider writing a patch that does this and
submitting it to the patch tracker on http://sf.net/projects/python .
That's the best thing you can do to ensure the optimization is considered
for a future release of Python.

Jeff
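
P.S. To illustrate the "lg2(n) loops" idea from the log message, here is a
rough Python sketch of repetition by doubling. It is not the actual C code
from string_repeat; the function name repeat_by_doubling and the use of a
bytearray as the output buffer are my own assumptions for illustration. The
point is only that each pass copies the already-filled part of the buffer,
so the number of passes grows with log2(count) rather than with count.

```python
def repeat_by_doubling(unit: bytes, count: int) -> bytes:
    """Return unit * count, filling the buffer in about log2(count) copies.

    Hypothetical illustration only; not the CPython implementation.
    """
    if count <= 0 or not unit:
        return b""

    total = len(unit) * count
    buf = bytearray(total)

    # Seed the buffer with a single copy of the unit.
    buf[:len(unit)] = unit
    filled = len(unit)

    # Each pass copies the filled prefix onto the next free region, so the
    # filled length doubles until the final (possibly shorter) copy.
    while filled < total:
        chunk = min(filled, total - filled)
        buf[filled:filled + chunk] = buf[:chunk]
        filled += chunk

    return bytes(buf)
```

A naive version would instead copy the unit once per repetition, which is
where the loop and function call overhead mentioned in the log comes from.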
-- http://mail.python.org/mailman/listinfo/python-list