On Tue, Jan 19, 2016 at 2:27 AM, Arshpreet Singh <arsh...@gmail.com> wrote:
> I was playing with generators and found that using generators takes a bit
> more time than list comprehensions, or am I doing it wrong?
>
> Function with list comprehensions:
>
> def sum_text(number_range):
>     return sum([i*i for i in xrange(number_range)])
>
> %timeit sum_text(100000000)
> 1 loops, best of 3: 14.8 s per loop
>
> Using generator expressions:
>
> def sum_text(number_range):
>     return sum((i*i for i in xrange(number_range)))
>
> %timeit sum_text(100000000)
> 1 loops, best of 3: 16.4 s per loop

Steven already pointed out the additional overhead of a generator expression
vs. a list comprehension.  In addition to the memory savings you get from
generator expressions, though, you can also get significant time savings
when the function consuming the generator is able to short-circuit.  For
instance, have a look at the following:

In [1]: import random

In [2]: %timeit all(random.random() < 0.5 for i in range(1000))
The slowest run took 4.85 times longer than the fastest. This could mean
that an intermediate result is being cached
100000 loops, best of 3: 3.57 µs per loop

In [3]: %timeit all([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 422 µs per loop

In [4]: %timeit any(random.random() < 0.5 for i in range(1000))
100000 loops, best of 3: 3.18 µs per loop

In [5]: %timeit any([random.random() < 0.5 for i in range(1000)])
1000 loops, best of 3: 408 µs per loop

This is using IPython with Python 3.5.

The difference here is that for functions that short-circuit (like any and
all), the generator expression does not have to produce all of its elements
(particularly since each element here has a 50-50 chance of being True or
False).  In this case the difference is a couple of orders of magnitude, and
the larger the range argument, the bigger that difference becomes.

Also, in Python 2 the loop variable of a generator expression does not leak
into the enclosing namespace, while the loop variable of a list
comprehension does:

Python 2.7.10 (default, Jul 14 2015, 19:46:27)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> list(i for i in range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'i' is not defined
>>> [i for i in range(10)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> i
9

Python 3 does not leak the loop variable in either case.  However, it would
be madness for code to actually rely on this behavior :).

At the end of the day, I use list comprehensions in the following
circumstances:

- I *know* I won't blow memory with a too-large list
- I want to iterate over the object multiple times, or I want (or may want)
  non-sequential access
- I know I want all the elements I'm creating (i.e., no chance of
  short-circuiting)

I use generator expressions when

- I *might* want to

All the best,
Jason

P.S. There is a "cross-over" point where the memory cost of the list
comprehension overtakes the per-item overhead of the generator.  For
instance:

In [17]: %timeit sum(i for i in range(10000000))
1 loops, best of 3: 2.08 s per loop

In [18]: %timeit sum([i for i in range(10000000)])
1 loops, best of 3: 1.86 s per loop

In [19]: %timeit sum(i for i in range(100000000))
1 loops, best of 3: 21.8 s per loop

In [20]: %timeit sum([i for i in range(100000000)])
1 loops, best of 3: 26.1 s per loop

--
Jason M. Swails
BioMaPS, Rutgers University
Postdoctoral Researcher
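
A small sketch (an addition, not from the original message) to make the
memory point concrete: sys.getsizeof shows how a list comprehension
materializes its whole result while a generator expression stays a small,
constant-size object.  The exact numbers are illustrative only and vary by
platform and Python version; Python 3 is assumed here.

    import sys

    # The list comprehension materializes all one million results up front.
    squares_list = [i * i for i in range(1000000)]

    # The generator expression produces nothing until it is iterated over.
    squares_gen = (i * i for i in range(1000000))

    # The list object alone is on the order of 8 MB on 64-bit CPython
    # (not counting the int objects it holds references to).
    print(sys.getsizeof(squares_list))

    # The generator object is a couple of hundred bytes, no matter how
    # large the range is.
    print(sys.getsizeof(squares_gen))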
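
A similar sketch (also an addition, not from the original message) for the
"iterate over the object multiple times" and "non-sequential access"
points: a generator expression is exhausted after a single pass and does
not support indexing, while the list built by a comprehension can be
re-iterated and indexed freely.

    # A generator expression can only be consumed once.
    gen = (i * i for i in range(5))
    print(sum(gen))   # 30
    print(sum(gen))   # 0 -- the generator is already exhausted

    # The list can be iterated as many times as needed and supports
    # random access.
    lst = [i * i for i in range(5)]
    print(sum(lst))   # 30
    print(sum(lst))   # 30
    print(lst[2])     # 4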