Bruno Desthuilliers a écrit : > Kent Johnson a écrit : > >> sophie_newbie wrote: >> >>> Hey Bruno, >>> >>> I got an invalid syntax error when i tried using your "str_counts = >>> dict((s, str_list.count(s) for s in set(str_list))" bit of code? Maybe >>> there is a missing bracket or comma? Or maybe I need to import >>> something. >> >> >> >> It should be >> str_counts = dict((s, str_list.count(s)) for s in set(str_list)) > > > Of course, my bad :( > >> or for Python < 2.4 > > > from sets import Set as set > >> str_counts = dict([(s, str_list.count(s)) for s in set(str_list)]) >> >> Note that this solution iterates str_list once for each element of >> str_list > > > once for each *distinct* element or str_list (it iterates over the set > created from str_list). > >> - the call to count traverses the entire list to create the count. I >> expect Paul Rubin's solution will be dramatically faster for large >> lists as it only iterates str_list once. > > Yeps. But I would benchmark it before choosing one or the other solution.
And of course, I was right. My solution seems to be faster than Paul's one (but slower than bearophile's), be it on small, medium or large lists. nb: A is mine, B is Paul's and C is bearophile's, and the number after is the size of the list... A100 (10000 times): 1.5801050663 B100 (10000 times): 1.87287902832 C100 (10000 times): 0.991976976395 A10000 (100 times): 1.083589077 B10000 (100 times): 1.30713891983 C10000 (100 times): 0.988032817841 A1000000 (10 times): 10.5345788002 B1000000 (10 times): 13.094493866 C1000000 (10 times): 9.67438292503 source: def countA(lst): return dict((s, lst.count(s)) for s in set(lst)) def countB(lst): counts = {} for s in lst: counts[s] = counts.get(s, 0)+1 return counts def countC(lst): counts = {} for s in lst: if s in counts: counts[s] += 1 else: counts[s] = 1 return counts def mklst(ln): from random import choice items = ['abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stu', 'vwx', 'yz'] return [choice(items) for i in range(ln)] lst100 = mklst(100) lst10000 = mklst(10000) lst1000000 = mklst(1000000) def run(): from timeit import Timer timers = [ ('A100', Timer('countA(lst100)', 'from __main__ import countA, lst100'), 10000), ('B100', Timer('countB(lst100)', 'from __main__ import countB, lst100'), 10000), ('C100', Timer('countC(lst100)', 'from __main__ import countC, lst100'), 10000), ('A10000', Timer('countA(lst10000)', 'from __main__ import countA, lst10000'), 100), ('B10000', Timer('countB(lst10000)', 'from __main__ import countB, lst10000'), 100), ('C10000', Timer('countC(lst10000)', 'from __main__ import countC, lst10000'), 100), ('A1000000', Timer('countA(lst1000000)', 'from __main__ import countA, lst1000000'), 10), ('B1000000', Timer('countB(lst1000000)', 'from __main__ import countB, lst1000000'), 10), ('C1000000', Timer('countC(lst1000000)', 'from __main__ import countC, lst1000000'), 10), ] for name, timer, repeat in timers: print "%s (%s times): %s" % (name, repeat, timer.timeit(repeat)) -- http://mail.python.org/mailman/listinfo/python-list