On 07/20/2013 01:03 PM, Joshua Landau wrote:
On 20 July 2013 12:57, Serhiy Storchaka <storch...@gmail.com> wrote:
20.07.13 14:16, Joshua Landau wrote:


    <snip>

However, some quick timing shows that translate has a very
high penalty for missing characters and is a tad slower anyway.
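
(For reference, a minimal timeit sketch of the kind of comparison being
described -- the text and table below are made up for illustration and
are not the benchmark Joshua actually ran:)

    import timeit

    text = "the quick brown fox jumps over the lazy dog" * 100
    table = {ord('a'): 'A', ord('e'): 'E', ord('o'): 'O'}

    def with_translate():
        return text.translate(table)

    def with_replace():
        return text.replace('a', 'A').replace('e', 'E').replace('o', 'O')

    print("translate:", timeit.timeit(with_translate, number=1000))
    print("replace:  ", timeit.timeit(with_replace, number=1000))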

Really, though, there should be no reason for .translate() to be
slower than .replace() -- at worst it should just be "reduce(lambda s,
ab: s.replace(*ab), mapping.items(), original_str)" and end up the
*same* speed as iterated replace.


It doesn't work that way. Consider
'ab'.translate({ord('a'): 'b', ord('b'): 'a'}).
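
(Spelling that example out, since it is the crux of the objection:
translate substitutes every character in one simultaneous pass, while
chained replace feeds each result into the next call, so the second
replace undoes the first:)

    s = 'ab'
    print(s.translate({ord('a'): 'b', ord('b'): 'a'}))  # -> 'ba'
    print(s.replace('a', 'b').replace('b', 'a'))        # -> 'aa'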

*sad*

Still, it seems to me that it should be optimizable for the sensible
builtin types so that .translate is significantly faster: there's no
theoretical extra work that .translate *has* to do that .replace does
not, and iterated .replace also has to rebuild the string once per call.


translate is going to be faster (than replace) for Unicode if it has a "large" table. An example is translating from ASCII to EBCDIC, where every character in the string is replaced by a new one. I have no idea what the cutoff is. But of course, for a case like ASCII to EBCDIC, it would be very tricky to do it with replaces, probably taking much more than the expected 96 passes.
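
(A sketch of that "large table" case, using str.maketrans -- the target
alphabet below is a made-up rotation, not a real EBCDIC table:)

    import string

    src = string.printable            # 100 characters
    dst = src[1:] + src[0]            # made-up target alphabet, just for illustration
    table = str.maketrans(src, dst)   # dict of {ordinal: ordinal} covering the alphabet

    text = "Hello, world! 123" * 1000
    out = text.translate(table)       # one pass, every character remapped

    # The replace() version needs roughly one pass per character of the
    # source alphabet, each pass rebuilding the whole string, and a naive
    #     for a, b in zip(src, dst): text = text.replace(a, b)
    # loop is also wrong whenever the targets overlap the sources, for
    # the same reason as the 'ab' example above.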

translate for byte strings is undoubtedly tons faster. For byte strings, the translation table is 256 bytes, and the inner loop is a simple lookup. But for Unicode, the table is a dict (or something very like it, I looked at the C code, not the Python code).

So for every character in the input string, it does a dict-type lookup, before it can even decide if the character is going to change.
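
(For comparison, the 256-byte table used for byte strings is easy to see
from Python, via bytes.maketrans:)

    table = bytes.maketrans(b'abc', b'xyz')
    print(type(table), len(table))     # <class 'bytes'> 256
    print(b'aabbcc'.translate(table))  # b'xxyyzz'

Every input byte indexes directly into that 256-byte table, so the bytes
path never needs a per-character dict lookup.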

Just for reference, the two files I was looking at were:

objects/unicodeobject.c
objects/bytesobject.c

Extracted from the bz2 downloaded from the page:
    http://hg.python.org/cpython


--
DaveA

