On 07/20/2013 01:03 PM, Joshua Landau wrote:
On 20 July 2013 12:57, Serhiy Storchaka <storch...@gmail.com> wrote:
20.07.13 14:16, Joshua Landau wrote:


    <snip>

However, some quick timing shows that translate has a very
high penalty for missing characters and is a tad slower anyway.
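
(For reference, a minimal timeit sketch of the kind of comparison being
described -- the text and table below are made up for illustration and
are not the benchmark Joshua actually ran:)

    import timeit

    text = "the quick brown fox jumps over the lazy dog" * 100
    table = {ord('a'): 'A', ord('e'): 'E', ord('o'): 'O'}

    def with_translate():
        return text.translate(table)

    def with_replace():
        return text.replace('a', 'A').replace('e', 'E').replace('o', 'O')

    print("translate:", timeit.timeit(with_translate, number=1000))
    print("replace:  ", timeit.timeit(with_replace, number=1000))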

Really, though, there should be no reason for .translate() to be
slower than .replace() -- at worst it should just be "reduce(lambda s,
ab: s.replace(*ab), mapping.items(), original_str)" and end up the
*same* speed as iterated replace.


It doesn't work that way. Consider
'ab'.translate({ord('a'): 'b', ord('b'): 'a'}).
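
(Spelling that example out, since it is the crux of the objection:
translate substitutes every character in one simultaneous pass, while
chained replace feeds each result into the next call, so the second
replace undoes the first:)

    s = 'ab'
    print(s.translate({ord('a'): 'b', ord('b'): 'a'}))  # -> 'ba'
    print(s.replace('a', 'b').replace('b', 'a'))        # -> 'aa'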

*sad*

Still, it seems to me that it should be optimizable for the sensible
builtin types so that .translate is significantly faster: there's no
theoretical extra work that .translate *has* to do that .replace does
not, and iterated .replace also has to rebuild the string once per call.


translate is going to be faster (than replace) for Unicode if it has a "large" table. An example is translating from ASCII to EBCDIC, where every character in the string is replaced by a new one. I have no idea what the cutoff is. But of course, for a case like ASCII to EBCDIC, it would be very tricky to do it with replaces, probably taking much more than the expected 96 passes.
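
(A sketch of that "large table" case, using str.maketrans -- the target
alphabet below is a made-up rotation, not a real EBCDIC table:)

    import string

    src = string.printable            # 100 characters
    dst = src[1:] + src[0]            # made-up target alphabet, just for illustration
    table = str.maketrans(src, dst)   # dict of {ordinal: ordinal} covering the alphabet

    text = "Hello, world! 123" * 1000
    out = text.translate(table)       # one pass, every character remapped

    # The replace() version needs roughly one pass per character of the
    # source alphabet, each pass rebuilding the whole string, and a naive
    #     for a, b in zip(src, dst): text = text.replace(a, b)
    # loop is also wrong whenever the targets overlap the sources, for
    # the same reason as the 'ab' example above.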

translate for byte strings is undoubtedly tons faster. For byte strings, the translation table is 256 bytes, and the inner loop is a simple lookup. But for Unicode, the table is a dict (or something very like it, I looked at the C code, not the Python code).

So for every character in the input string, it does a dict-type lookup, before it can even decide if the character is going to change.
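
(For comparison, the 256-byte table used for byte strings is easy to see
from Python, via bytes.maketrans:)

    table = bytes.maketrans(b'abc', b'xyz')
    print(type(table), len(table))     # <class 'bytes'> 256
    print(b'aabbcc'.translate(table))  # b'xxyyzz'

Every input byte indexes directly into that 256-byte table, so the bytes
path never needs a per-character dict lookup.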

Just for reference, the two files I was looking at were:

objects/unicodeobject.c
objects/bytesobject.c

Extracted from the bz2 downloaded from the page:
    http://hg.python.org/cpython


--
DaveA

