On Mon, 31 Oct 2011 22:12:26 -0400, Dave Angel wrote:

> I would claim that a well-written (in C) translate function, without
> using the delete option, should be much quicker than any python loop,
> even if it does copy the data.

I think you are selling short the speed of the Python interpreter. Even
for short strings, it's faster to iterate over a string in Python 3 than
to copy it with translate:

>>> from timeit import Timer
>>> t1 = Timer('for c in text: pass', 'text = "abcd"')
>>> t2 = Timer('text.translate(mapping)',
...            'text = "abcd"; mapping = "".maketrans("", "")')
>>> min(t1.repeat())
0.450606107711792
>>> min(t2.repeat())
0.9279451370239258


> Incidentally, on the Pentium family,
> there's a machine instruction for that, to do the whole loop in one
> instruction (with rep prefix).

I'm pretty sure that there isn't a machine instruction for copying an
entire terabyte of data in one step. Since the OP explicitly said he was
checking text up to a TB in size, whatever solution is used has to scale
well.


-- 
Steven
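
P.S. To make "scale well" concrete, here is a rough sketch of scanning
the input in fixed-size chunks, so that memory use stays bounded however
large the input is. The names all_chunks_ok and is_ascii, the 1 MiB
chunk size, and the ASCII test itself are my own placeholders for
illustration; substitute whatever check the OP actually needs.

```python
import io


def all_chunks_ok(stream, is_ok, chunk_size=1 << 20):
    """Return True if is_ok(chunk) holds for every chunk read from stream.

    Only one chunk is held in memory at a time, and the scan stops at
    the first chunk that fails the check.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of input reached without finding a failing chunk.
            return True
        if not is_ok(chunk):
            return False


def is_ascii(chunk):
    # Placeholder check (an assumption, not the OP's real test):
    # every character in the chunk is plain ASCII.
    return all(ord(c) < 128 for c in chunk)


# A StringIO stands in for a real file opened in text mode.
sample = io.StringIO("abcd" * 1000)
result = all_chunks_ok(sample, is_ascii, chunk_size=64)
```

With a real file you would pass the object returned by open() instead of
the StringIO. Whether the per-chunk check is a Python loop or a call
into C is then a separate question that timeit can answer.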