[MTD] > I've been messing about for fun creating a trial division factorizing > function and I'm naturally interested in optimising it as much as > possible. > > I've been told that iteration in python is generally more > time-efficient than recursion. Is that true?
Since you heard it from me to begin with, I suppose saying "yes" again won't really help ;-) You can use time.time() or time.clock(), or the `timeit` module, to measure speed. Try timing a dead-simple factorial functions both ways. BTW, I've worked on Python's implementation for close to 15 years now, so when I say something about Python, it's _usually_ safe to just believe it :-) Questioning is fine, but in cases like this where it's easy to find out for yourself, I probably won't make time to respond. > Here is my algorithm as it stands. Any suggestions appreciated! > > from math import * > > def factorize(x): > """ Return factors of x """ > factors = [] > lim = int(sqrt(x)) Here you're limited to integers for which a floating sqrt is accurate. That's 53 bits on most platforms. For integers larger than that, the code may produce incorrect results in mysterious ways at times (because the square root may be inaccurate). > y = divmod(x,2) Most people would use "unpacking assignment" here instead, like q, r = divmod(x, 2) Then later code gets to use meaningful names instead of messing with mysterious little integer subscripts. It's also quicker to use local names than to build subscripting expressions. > if y[1] == 0: # x is even > factors = factors + [2] + factorize(y[0]) Since `factors` upon entry to this block must be [], it's kinda contorted. More obvious (if you _have_ to use recursion) would be: return [2] + factorize(y[0]) and then unindent the rest of the code a level. > else: # x is odd > i = 3 > while i <= lim: > y = divmod(x,i) Again more commonly written: q, r = divmod(x, i) > if y[1] == 0: > factors = factors + [i] + factorize(y[0]) It hardly matters in this context, but appending to a list is _generally_ much faster than building a brand new list from scratch. At a higher level, your recursions always start from 2 again; e.g., if i happens to be 434349 here, the factorize(y[0]) call starts over from 2, despite that you already know that no divisor less than 434349 can succeed. To do this _efficiently_ with recursion requires passing more knowledge across recursion levels. > i = lim+1 > else: > i += 2 > > if factors == []: # necessary step for correct recursion > factors = [x] > > return factors Recursion is pretty bizarre here. Not _wrong_, just "pretty bizarre", and is leading you into higher-level inefficiences. There are all sorts of ways to repair that, but I'm not going to talk about them because it's easy to write an iterative algorithm instead. Here's one way to do that, avoiding float sqrt (this works correctly for integers of any size, although trial division is hopelessly _inefficient_ for finding "large" factors), and skipping multiples of 3 in the inner loop too: def trial(n): if n <= 1: return [n] factors = [] for tiny in 2, 3: while n % tiny == 0: factors.append(tiny) n //= tiny delta = 2 d = 5 while 1: q, r = divmod(n, d) if r: if q < d: break d += delta delta = 6 - delta else: factors.append(d) n = q if n > 1: factors.append(n) return factors If we call the original input N, the key invariants holding at the top of the loop are: 1. N >= 2, and N == n * the product of the integers in `factors` 2. n is not divisible by any prime < d 3. all elements in `factors` are prime It may take a bit of thought to see that d marches over all integers >= 5 that aren't divisible by either 2 or 3: 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, ... It's more-or-less generally true that establishing and proving loop invariants in an iterative algorithm is "the same thing" as establishing and proving pre- and post-conditions in a recursive algorithm. While it may feel strained at first, it's very much worthwhile learning how to do that. For example, from #2 it's easy to prove that if q < d, n must either be 1 or a prime. That's what justifies the "break" statement inside the loop. Then since that's the only way to get out of the loop, n is either 1 or a prime when the loop finishes, and that explains the rest of the code. -- http://mail.python.org/mailman/listinfo/python-list