On Wed, 21 Dec 2005 05:15:23 -0800, bonono wrote:

>
> Steven D'Aprano wrote:
>> If you really wanted to waste CPU cycles, you could do this:
>>
>> s = "1579"
>> for c in s:
>>     if not c.isdigit():
>>         print "Not an integer string"
>>         break
>> else:
>>     # if we get here, we didn't break
>>     print "Integer %d" % int(s)
>>
>>
>> but notice that this is wasteful: first you walk the string, checking each
>> character, and then the int() function has to walk the string again,
>> checking each character for the second time.
>>
> Wasteful enough that there is a specific built-in function to do just
> this ?

Well, let's find out, shall we?


from time import time

# create a list of known int strings
L_good = [str(n) for n in range(1000000)]

# and a list of known non-int strings
L_bad = [s + "x" for s in L_good]

# now let's time how long it takes, comparing
# Look Before You Leap vs. Just Do It
def timer_LBYL(L):
    t = time()
    # loop over the list passed in as L, not the global L_good
    for s in L:
        if s.isdigit():
            n = int(s)
    return time() - t

def timer_JDI(L):
    t = time()
    # loop over the list passed in as L, not the global L_good
    for s in L:
        try:
            n = int(s)
        except ValueError:
            pass
    return time() - t

# and now test the two strategies
def tester():
    print "Time for Look Before You Leap (all ints): %f" \
        % timer_LBYL(L_good)
    print "Time for Look Before You Leap (no ints): %f" \
        % timer_LBYL(L_bad)
    print "Time for Just Do It (all ints): %f" \
        % timer_JDI(L_good)
    print "Time for Just Do It (no ints): %f" \
        % timer_JDI(L_bad)


And here are the results from three tests:

>>> tester()
Time for Look Before You Leap (all ints): 2.871363
Time for Look Before You Leap (no ints): 3.167513
Time for Just Do It (all ints): 2.575050
Time for Just Do It (no ints): 2.579374
>>> tester()
Time for Look Before You Leap (all ints): 2.903631
Time for Look Before You Leap (no ints): 3.272497
Time for Just Do It (all ints): 2.571025
Time for Just Do It (no ints): 2.571188
>>> tester()
Time for Look Before You Leap (all ints): 2.894780
Time for Look Before You Leap (no ints): 3.167017
Time for Just Do It (all ints): 2.822160
Time for Just Do It (no ints): 2.569494

There is a consistent pattern: Look Before You Leap is measurably, and
consistently, slower than using try...except, but both are within the same
order of magnitude speed-wise.

I wondered whether the speed difference would change if the strings
themselves were very long.
So I made some minor changes:

>>> L_good = ["1234567890"*200] * 2000
>>> L_bad = [s + "x" for s in L_good]
>>> tester()
Time for Look Before You Leap (all ints): 9.740390
Time for Look Before You Leap (no ints): 9.871122
Time for Just Do It (all ints): 9.865055
Time for Just Do It (no ints): 9.967314

Hmmm... why is converting now slower than checking+converting? That
doesn't make sense... except that the strings are so long that they
overflow ints, and get converted automatically to longs. Perhaps this test
exposes some accident of implementation.

So I changed the two timer functions to use long() instead of int(), and
got this:

>>> tester()
Time for Look Before You Leap (all ints): 9.591998
Time for Look Before You Leap (no ints): 9.866835
Time for Just Do It (all ints): 9.424702
Time for Just Do It (no ints): 9.416610

A small but consistent speed advantage to the try...except block.

Having said all that, the speed differences are absolutely trivial, less
than 0.1 microseconds per digit. Choosing one form or the other purely on
the basis of speed is premature optimization.

But the real advantage of the try...except form is that it generalises to
more complex kinds of data where there is no fast C code to check whether
the data can be converted. (Try re-running the above tests with isdigit()
re-written as a pure Python function.)

In general, it is just as difficult to check whether something can be
converted as it is to actually try to convert it and see whether it fails,
especially in a language like Python where try...except blocks are so
cheap to use.


-- 
Steven.
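
For anyone who wants to try the experiment suggested above (isdigit()
re-written as a pure Python function), here is a minimal sketch. It is not
the code behind the timings in this post. The names py_isdigit,
convert_lbyl and convert_jdi are made up for illustration, the check only
accepts the ASCII digits 0-9, and the sketch assumes Python 3 syntax
(print as a function, timeit from the standard library). No timings are
shown because they depend on the machine and the interpreter.

```python
import timeit


def py_isdigit(s):
    """Pure-Python stand-in for str.isdigit(); ASCII digits only (assumption)."""
    if not s:
        return False
    for c in s:
        if c not in "0123456789":
            return False
    return True


def convert_lbyl(s):
    """Look Before You Leap: check every character, then convert."""
    if py_isdigit(s):
        return int(s)
    return None


def convert_jdi(s):
    """Just Do It: attempt the conversion and catch the failure."""
    try:
        return int(s)
    except ValueError:
        return None


if __name__ == "__main__":
    # smaller lists than in the post, to keep the run short
    good = [str(n) for n in range(100000)]
    bad = [s + "x" for s in good]
    for name, func in (("LBYL", convert_lbyl), ("JDI", convert_jdi)):
        for label, data in (("all ints", good), ("no ints", bad)):
            # time one pass over the whole list
            t = timeit.timeit(lambda: [func(s) for s in data], number=1)
            print("Time for %s (%s): %f" % (name, label, t))
```

Note that the two strategies do not even agree on what counts as
convertible: int() accepts strings such as "-5" or " 12 ", which the
digit check rejects. Writing a check that matches the converter exactly
is part of why checking first is rarely easier than just trying.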