Steven D'Aprano wrote:
> On Mon, 31 Oct 2011 22:12:26 -0400, Dave Angel wrote:
>
>> I would claim that a well-written (in C) translate function, without
>> using the delete option, should be much quicker than any python loop,
>> even if it does copy the data.
>
> I think you are selling short the speed of the Python interpreter. Even
> for short strings, it's faster to iterate over a string in Python 3 than
> to copy it with translate:
>
> >>> from timeit import Timer
> >>> t1 = Timer('for c in text: pass', 'text = "abcd"')
> >>> t2 = Timer('text.translate(mapping)',
> ...     'text = "abcd"; mapping = "".maketrans("", "")')
> >>> min(t1.repeat())
> 0.450606107711792
> >>> min(t2.repeat())
> 0.9279451370239258

Lies, damn lies, and benchmarks ;)

Copying is fast:

>>> Timer("text + 'x'", "text='abcde '*10**6").timeit(100)
1.819761037826538
>>> Timer("for c in text: pass", "text='abcde '*10**6").timeit(100)
18.89239192008972

The problem with str.translate() (unicode.translate() in 2.x) is that it
needs a dictionary lookup for every character. However, if, like the OP,
you are going to read data from a file to check whether it's (a subset of)
ASCII, there's no point converting to a string. For bytes, translate() can
use a lookup table indexed directly by the byte value, and the numbers look
quite different:

>>> t1 = Timer("for c in text: pass", "text = b'abcd '*10**6")
>>> t1.timeit(100)
15.818882942199707
>>> t2 = Timer("text.translate(mapping)",
...     "text = b'abcd '*10**6; mapping = b''.maketrans(b'', b'')")
>>> t2.timeit(100)
2.821769952774048
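
For what it's worth, here is a rough sketch of how the bytes approach might be applied to the OP's check. It is only one possible way to do it, not something posted earlier in the thread: the helper name only_ascii, the constant ASCII_BYTES, and the sample data are all made up for illustration. Unlike the timings above, it relies on the delete argument of bytes.translate(): with the table set to None, translate() removes every byte listed in delete, so anything left over lies outside the allowed set.

```python
# Bytes 0..127 form the allowed set; swap in any other subset if needed.
ASCII_BYTES = bytes(range(128))


def only_ascii(data: bytes) -> bool:
    """Return True if every byte in data is in ASCII_BYTES."""
    # With table=None, translate() performs no mapping and only removes the
    # bytes listed in the delete argument; the scan itself runs in C.
    leftover = data.translate(None, ASCII_BYTES)
    return not leftover


# Made-up sample data standing in for a file read in 'rb' mode.
clean = b"abcd " * 10
dirty = clean + b"\xc3\xa9"  # the UTF-8 encoding of a non-ASCII character

print(only_ascii(clean))
print(only_ascii(dirty))
```

The function builds a new bytes object holding the leftovers, so it does copy data in the worst case; whether that beats a Python-level loop on your input is something to measure with timeit rather than assume.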