Peter Otten wrote:
> Michael Rudolf wrote:
>
> > On 09.03.2010 13:02, Peter Otten wrote:
> >> >>> [sum(a for a,b in zip(x,y) if b==c)/y.count(c)for c in y]
> >> [1.5, 1.5, 8.0, 4.0, 4.0, 4.0]
> >> Peter
> >
> > ... pwned.
> > Should be the fastest and shortest way to do it.
>
> It may be short, but it is not particularly efficient. A dict-based approach
> is probably the fastest. If y is guaranteed to be sorted itertools.groupby()
> may also be worth a try.
>
> $ cat tmp_average_compare.py
> from __future__ import division
> from collections import defaultdict
> try:
>     from itertools import izip as zip
> except ImportError:
>     pass
>
> x = [1 ,2, 8, 5, 0, 7]
> y = ['a', 'a', 'b', 'c', 'c', 'c' ]
>
> def f(x=x, y=y):
>     p = defaultdict(int)
>     q = defaultdict(int)
>     for a, b in zip(x, y):
>         p[b] += a
>         q[b] += 1
>     return [p[b]/q[b] for b in y]
>
> def g(x=x, y=y):
>     return [sum(a for a,b in zip(x,y)if b==c)/y.count(c)for c in y]
>
> if __name__ == "__main__":
>     print(f())
>     print(g())
>     assert f() == g()
> $ python3 -m timeit -s 'from tmp_average_compare import f, g' 'f()'
> 100000 loops, best of 3: 11.4 usec per loop
> $ python3 -m timeit -s 'from tmp_average_compare import f, g' 'g()'
> 10000 loops, best of 3: 22.8 usec per loop
>
> Peter
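On the itertools.groupby() remark: for the case where equal names in y are
adjacent (for example because y is sorted), a version along these lines might
work. This is only a rough sketch, not something from the thread. The name h
is my own, I am assuming Python 3 (true division), and I have not timed it
against f() or g().

```python
from itertools import groupby

x = [1, 2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c']

def h(x=x, y=y):
    # Hypothetical helper, not from the thread.
    # Assumes equal names in y sit next to each other (e.g. y is sorted);
    # groupby() only merges consecutive items with the same key.
    result = []
    for name, group in groupby(zip(x, y), key=lambda pair: pair[1]):
        values = [a for a, b in group]
        avg = sum(values) / len(values)
        # Repeat the group's average once per element of the group.
        result.extend([avg] * len(values))
    return result

print(h())
```

If y is not grouped, the same name can show up in several runs and each run
gets its own average, so this is not a drop-in replacement for the dict
version.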
I converged on the same dict-based solution, but with an extra reduction step
in case there are a lot of repeats in the input: each average is computed once
per distinct name rather than once per element. I think it is a good
compromise between efficiency, readability and succinctness.

from __future__ import division  # true division on Python 2 as well
from collections import defaultdict

x = [1 ,2, 8, 5, 0, 7]
y = ['a', 'a', 'b', 'c', 'c', 'c' ]

totdct = defaultdict(int)  # running total per name
cntdct = defaultdict(int)  # number of occurrences per name
for name, num in zip(y,x):
    totdct[name] += num
    cntdct[name] += 1

# reduction step: one division per distinct name
avgdct = {name : totdct[name]/cnts for name, cnts in cntdct.items()}
w = [avgdct[name] for name in y]

-- 
http://mail.python.org/mailman/listinfo/python-list