On Mon, 11 Oct 2021 at 23:00, Christian Gollwitzer <aurio...@gmx.de> wrote:
>
> On 10.10.21 at 10:49, Steve Keller wrote:
> > I have found the sum() function to be much slower than to loop over the
> > operands myself:
> >
> > def sum_products(seq1, seq2):
> >     return sum([a * b for a, b in zip(seq1, seq2)])
> >
> > def sum_products2(seq1, seq2):
> >     sum = 0
> >     for a, b in zip(seq1, seq2):
> >         sum += a * b
> >     return sum
> >
> > In a program I generate about 14 million pairs of sequences of ints each
> > of length 15 which need to be summed. The first version with sum() needs
> > 44 seconds while the second version runs in 37 seconds.
>
> The first version constructs a list, sums it up and throws the list
> away, while the second version only keeps the running sum in memory. How
> about a generator expression instead, i.e.
>
> sum((a * b for a, b in zip(seq1, seq2)))
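
For anyone who wants to reproduce the comparison locally, here is a minimal,
self-contained sketch of the three variants under discussion. The function
names follow the original post, with sum_products3 added for the suggested
generator expression; the test data (1000 random pairs of length-15 int
sequences) and the repeat count are made up for illustration, so absolute
timings won't match the 44 s / 37 s figures above.

    import random
    import timeit

    def sum_products(seq1, seq2):
        # list comprehension: builds the whole list, then sums it
        return sum([a * b for a, b in zip(seq1, seq2)])

    def sum_products2(seq1, seq2):
        # explicit loop: only the running total is kept in memory
        # (accumulator renamed from `sum` to avoid shadowing the builtin)
        total = 0
        for a, b in zip(seq1, seq2):
            total += a * b
        return total

    def sum_products3(seq1, seq2):
        # generator expression, as suggested above
        return sum(a * b for a, b in zip(seq1, seq2))

    # placeholder data: 1000 random pairs of length-15 integer sequences
    pairs = [([random.randrange(1000) for _ in range(15)],
              [random.randrange(1000) for _ in range(15)])
             for _ in range(1000)]

    for func in (sum_products, sum_products2, sum_products3):
        t = timeit.timeit(lambda: [func(s1, s2) for s1, s2 in pairs],
                          number=100)
        print(f"{func.__name__}: {t:.3f} s for 100 x {len(pairs)} calls")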
What really matters in cases like this is interpreter overhead. Typically a
generator expression has slightly more overhead compared to a list
comprehension. You get a slightly slower per-item speed in exchange for O(1)
memory consumption.

A list comprehension like

    result = sum([expr for foo in bar])

translates internally to something like:

    def _func():
        data = []
        for foo in bar:
            data.append(expr)
        return data

    _stuff = _func()
    result = sum(_stuff)

Running this really does bring in the overhead of a function call _func(),
because it needs to create a frame with locals and so on. However, it is only
the cost of a single function call. Afterwards, if the elements in the list
_stuff are built-in types like int with built-in __add__ methods, then
sum(_stuff) does not involve the interpreter at all: it's a built-in function
operating on a built-in container of built-in objects.

With a generator expression like

    result = sum(expr for foo in bar)

this translates internally to:

    def _genfunc():
        for foo in bar:
            yield expr

    _gen = _genfunc()
    result = sum(_gen)

Now the _genfunc() function call simply creates the generator and sets up its
frame, which I think is more or less equivalent to the overhead of the _func()
function call. However, the sum(_gen) line is no longer a pure built-in
operation. Internally, each time sum calls _gen.__next__(), the execution
frame for _genfunc() is reactivated so that the interpreter can execute the
bytecodes taking the frame from one yield to the next. This is almost the
overhead of a function call for each item in bar, although in practice it is
often unnoticeable and other factors can affect it.

The fastest version eliminates the interpreter altogether, at least when
operating on pure built-in elements:

    In [7]: nums1 = nums2 = list(range(10**5))

    In [10]: %timeit sum([a*b for a, b in zip(nums1, nums2)])
    9.83 ms ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [11]: %timeit sum((a*b for a, b in zip(nums1, nums2)))
    10.3 ms ± 109 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [12]: from operator import mul

    In [13]: %timeit sum(map(mul, nums1, nums2))
    7.25 ms ± 33 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Of course, if the elements being multiplied and summed have pure-Python
__add__/__mul__ methods, or the container has a pure-Python
__iter__/__next__, then any of those methods will typically dominate the
runtime over any of the overheads considered here.

--
Oscar
--
https://mail.python.org/mailman/listinfo/python-list