Hi! I'm trying to get a better understanding of Python's unicode speed and implementation.
My platform is AMD [EMAIL PROTECTED] (x86-32), Debian, Python 2.4.

First, a simple example (and time results):

x = "a"*50000000

real    0m0.195s
user    0m0.144s
sys     0m0.046s

x = u"a"*50000000

real    0m2.477s
user    0m2.119s
sys     0m0.225s

So my first question: why does creating a unicode string take more than
10x longer than creating a byte string?
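In case a self-contained test helps, the comparison can also be reproduced
with the timeit module instead of timing the whole script (a minimal sketch;
the smaller size and the repeat count are just arbitrary choices of mine):

    import timeit

    # time building a byte string vs. a unicode string of the same length
    print timeit.Timer('x = "a" * 5000000').timeit(number=10)
    print timeit.Timer('x = u"a" * 5000000').timeit(number=10)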
Another situation: a speed problem with long strings. I have a simple
function for removing diacritics from a string:

#!/usr/bin/python2.4
# -*- coding: UTF-8 -*-
import unicodedata

def no_diacritics(line):
    # decode byte strings to unicode, decompose, then drop combining marks
    if type(line) != unicode:
        line = unicode(line, 'utf-8')
    line = unicodedata.normalize('NFKD', line)
    output = ''
    for c in line:
        if not unicodedata.combining(c):
            output += c
    return output

Now the calling sequence (and time results):

for i in xrange(1):
    x = u"a"*50000
    y = no_diacritics(x)

real    0m17.021s
user    0m11.139s
sys     0m5.116s

for i in xrange(5):
    x = u"a"*10000
    y = no_diacritics(x)

real    0m0.548s
user    0m0.502s
sys     0m0.004s

In both cases the total amount of data is the same, but with the shorter
strings it is much faster. Maybe it has nothing to do with Python unicode,
but I would like to know the reason.

Thanks for any notes!

David
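P.S. One guess I had (it may be wrong) is that the repeated output += c
makes the loop quadratic in the string length, which would also explain why
five 10000-character strings beat one 50000-character string. A variant that
collects the kept characters and joins them once, sketched below
(no_diacritics_join is just my name for it), should be linear if that guess
is right:

#!/usr/bin/python2.4
# -*- coding: UTF-8 -*-
import unicodedata

def no_diacritics_join(line):
    # same logic as no_diacritics above, but a sketch assuming that
    # repeated += concatenation is the bottleneck: build the result
    # in one pass with join instead of concatenating in a loop
    if type(line) != unicode:
        line = unicode(line, 'utf-8')
    line = unicodedata.normalize('NFKD', line)
    return u''.join(c for c in line if not unicodedata.combining(c))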