Re: collections.Counter surprisingly slow

2013-07-30 Thread Serhiy Storchaka
29.07.13 14:49, Joshua Landau wrote: I find it hard to agree that counter should be optimised for the unique-data case, as surely it's much more oft used when there's a point to counting? Different methods are faster for different data. LBYL approach is best for the mostly unique data ca
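
For readers following the LBYL point, here is a minimal sketch of the two hand-written counting styles being compared. The function names count_lbyl and count_eafp and the sample data are illustrative assumptions, not code from the thread. Both produce the same counts; they differ only in how a first-seen key is handled, which is why their relative speed depends on how many keys are unique.

```python
def count_lbyl(items):
    """Look before you leap: test for the key before touching it."""
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts


def count_eafp(items):
    """Easier to ask forgiveness: assume the key exists, recover if not."""
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:
            # Raised once per distinct key, so this path dominates
            # when the data is mostly unique.
            counts[item] = 1
    return counts


# Hypothetical sample data, only for illustration.
sample = ["a", "b", "a", "c", "a"]
lbyl_result = count_lbyl(sample)
eafp_result = count_eafp(sample)
```

With mostly repeated keys the except branch is rarely taken; with mostly unique keys it is taken on almost every item.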

Re: collections.Counter surprisingly slow

2013-07-29 Thread Stefan Behnel
Stefan Behnel, 30.07.2013 08:39: > Serhiy Storchaka, 29.07.2013 21:37: >> 29.07.13 20:19, Ian Kelly wrote: >>> On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: Also, couldn't Counter just extend from defaultdict? >>> >>> It could, but I expect the C helper function in 3.4 will be fa

Re: collections.Counter surprisingly slow

2013-07-29 Thread Stefan Behnel
Serhiy Storchaka, 29.07.2013 21:37: > 29.07.13 20:19, Ian Kelly wrote: >> On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: >>> Also, couldn't Counter just extend from defaultdict? >> >> It could, but I expect the C helper function in 3.4 will be faster >> since it doesn't even need to ca

Re: collections.Counter surprisingly slow

2013-07-29 Thread Serhiy Storchaka
29.07.13 20:19, Ian Kelly wrote: On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: Also, couldn't Counter just extend from defaultdict? It could, but I expect the C helper function in 3.4 will be faster since it doesn't even need to call __missing__ in the first place. I'm surpris

Re: collections.Counter surprisingly slow

2013-07-29 Thread Ian Kelly
On Mon, Jul 29, 2013 at 5:49 AM, Joshua Landau wrote: > Also, couldn't Counter just extend from defaultdict? It could, but I expect the C helper function in 3.4 will be faster since it doesn't even need to call __missing__ in the first place. And the cost (both in terms of maintenance and run-tim
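
As background for the __missing__ remark, here is a small sketch of when that hook runs. TracingDict is a hypothetical class written for this illustration; it is not how Counter or defaultdict is implemented. It only shows that dict lookup calls __missing__ once per first-seen key, which is the call a get-based loop avoids.

```python
from collections import defaultdict


class TracingDict(dict):
    """Hypothetical dict subclass that records how often __missing__ runs."""

    def __init__(self):
        super().__init__()
        self.missing_calls = 0

    def __missing__(self, key):
        # Called by dict.__getitem__ only when the key is absent.
        self.missing_calls += 1
        self[key] = 0
        return 0


words = ["x", "y", "x", "x"]

traced = TracingDict()
for word in words:
    traced[word] += 1

# defaultdict(int) follows the same pattern: its __missing__ calls the
# factory (int) and stores the result for the new key.
default_counts = defaultdict(int)
for word in words:
    default_counts[word] += 1
```

Here __missing__ fires twice, once for each distinct word, no matter how many times the words repeat.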

Re: collections.Counter surprisingly slow

2013-07-29 Thread Joshua Landau
On 29 July 2013 12:46, Stefan Behnel wrote: > Steven D'Aprano, 28.07.2013 22:51: > > Calling Counter ends up calling essentially this code: > > > > for elem in iterable: > > self[elem] = self.get(elem, 0) + 1 > > > > (although micro-optimized), where "iterable" is your data (lines). > > Calli

Re: collections.Counter surprisingly slow

2013-07-29 Thread Joshua Landau
On 29 July 2013 07:25, Serhiy Storchaka wrote: > 28.07.13 22:59, Roy Smith wrote: > >The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 >> unique strings). >> > > Repeat your tests with totally unique lines. Counter is about ½ the speed of defaultdict in that case (a
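
Anyone wanting to repeat the unique-versus-repeated comparison could start from a sketch like the one below. The data sizes, the helper names count_defaultdict and measure, and the synthetic inputs are assumptions made for illustration; they are not the file or the harness used in the thread, and the timings will vary with the Python version and the machine.

```python
import timeit
from collections import Counter, defaultdict


def count_defaultdict(items):
    """Count items with defaultdict(int), one of the approaches compared."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts


def measure(func, data, repeat=3):
    """Return the best wall-clock time in seconds over a few runs."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


# Synthetic stand-ins for the real input: every line distinct versus
# a small set of lines repeated many times.
unique_lines = [str(i) for i in range(20000)]
repeated_lines = [str(i % 100) for i in range(20000)]

timings = {}
for label, data in (("unique", unique_lines), ("repeated", repeated_lines)):
    for func in (Counter, count_defaultdict):
        timings[(label, func.__name__)] = measure(func, data)

for (label, name), seconds in sorted(timings.items()):
    print(f"{label:9s} {name:18s} {seconds:.6f} s")
```

Taking the minimum over several runs reduces noise from other processes; no profiler is involved, so the measurement itself does not distort the call overhead.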

Re: collections.Counter surprisingly slow

2013-07-29 Thread Stefan Behnel
Steven D'Aprano, 28.07.2013 22:51: > Calling Counter ends up calling essentially this code: > > for elem in iterable: > self[elem] = self.get(elem, 0) + 1 > > (although micro-optimized), where "iterable" is your data (lines). > Calling the get method has higher overhead than dict[key], that
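
The loop Steven quotes can be written out as a standalone function. This is a sketch based on the quoted snippet, with a hypothetical function name count_elements; it is not the actual source of collections.Counter, which is micro-optimized as he notes.

```python
from collections import Counter


def count_elements(mapping, iterable):
    """Tally the elements of iterable into mapping using the get() idiom."""
    mapping_get = mapping.get  # bind the method once, outside the loop
    for elem in iterable:
        # One get() call plus one item assignment per element, whether
        # or not the key has been seen before.
        mapping[elem] = mapping_get(elem, 0) + 1


# Hypothetical input, only for illustration.
lines = ["spam", "eggs", "spam", "ham", "spam"]

tally = {}
count_elements(tally, lines)

reference = Counter(lines)
```

The result matches what Counter(lines) produces; the difference under discussion is the per-element cost of the method call, not the outcome.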

Re: collections.Counter surprisingly slow

2013-07-28 Thread Serhiy Storchaka
28.07.13 22:59, Roy Smith wrote: The input is an 8.8 Mbyte file containing about 570,000 lines (11,000 unique strings). Repeat your tests with totally unique lines. The full profiler dump is at the end of this message, but the gist of it is: Profiler affects execution time. In partic

Re: collections.Counter surprisingly slow

2013-07-28 Thread Roy Smith
In article <51f5843f$0$29971$c3e8da3$54964...@news.astraweb.com>, Steven D'Aprano wrote: > > Why is count() [i.e. collections.Counter] so slow? > > It's within a factor of 2 of test, and 3 of exception or default (give or > take). I don't think that's surprisingly slow. It is for a module whi

Re: collections.Counter surprisingly slow

2013-07-28 Thread Steven D'Aprano
On Sun, 28 Jul 2013 15:59:04 -0400, Roy Smith wrote: [...] > I'm rather shocked to discover that count() is the slowest > of all! I expected it to be the fastest. Or, certainly, no slower than > default(). > > The full profiler dump is at the end of this message, but the gist of it > is: > > n