Dictionary vs binary search lookups [was: Populating a dictionary, fast [SOLVED]]

2007-11-21 Thread Francesc Altet
A Tuesday 20 November 2007, Istvan Albert escrigué: > On Nov 19, 2:33 pm, Francesc Altet <[EMAIL PROTECTED]> wrote: > > Just for the record. I was unable to stop thinking about this, and > > after some investigation, I guess that my rememberings were > > gathered from some benchmarks with code in

Re: Populating a dictionary, fast [SOLVED]

2007-11-20 Thread Istvan Albert
On Nov 19, 2:33 pm, Francesc Altet <[EMAIL PROTECTED]> wrote: > Just for the record. I was unable to stop thinking about this, and > after some investigation, I guess that my rememberings were gathered > from some benchmarks with code in Pyrex (i.e. pretty near to C speed). Pretty interesting an

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-20 Thread harri
On Nov 15, 9:51 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > > Since some people missed the EUREKA!, here's the executive summary: > > Python2.3: about 45 minutes > Python2.4: about 45 minutes > Python2.5: about _30 seconds_ FYI, I tried on two 64 bit SMP machines

Re: Populating a dictionary, fast [SOLVED]

2007-11-19 Thread Francesc Altet
A Wednesday 14 November 2007, Francesc Altet escrigué: > I think I've got messed on some benchmarks that I've done on that > subject some time ago, but either my memory is bad or I've made some > mistake on those experiments. My apologies. Just for the record. I was unable to stop thinking about

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-17 Thread Hendrik van Rooyen
"Michael Bacarella" wrote: > I am so sorry to have ruined the decorum. Oh dear! I confidently predict that this thread will now degenerate to include such things as dignitas and gravitas. - Hendrik -- http://mail.python.org/mailman/listinfo/python-list

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Steven D'Aprano
On Fri, 16 Nov 2007 11:24:24 +0100, Hrvoje Niksic wrote: > Steven D'Aprano <[EMAIL PROTECTED]> writes: > >>> Can you post minimal code that exhibits this behavior on Python 2.5.1? >>> The OP posted a lot of different versions, most of which worked just >>> fine for most people. >> >> Who were tes

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Jeffrey Froman
Steven D'Aprano wrote: > Can you try it running in 64-bit mode? Here are my results using the following test.py: $ cat test.py #!/usr/bin/python import time print "Starting: %s" % time.ctime() v = {} for line in open('keys.txt'): v[long(line.strip())] = True print "Finished: %s" % time.ctime(

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Istvan Albert
On Nov 16, 1:18 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > You're right, it is completely inappropriate for us to be showing our > dirty laundry to the public. you are misinterpreting my words on many levels, (and I of course could have refrained from the "chair-monitor" jab as well)

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Michael Bacarella
> Do you really believe that you cannot create or delete a large > dictionary with python versions less than 2.5 (on a 64 bit or multi- > cpu system)? That a bug of this magnitude has not been noticed until > someone posted on clp? You're right, it is completely inappropriate for us to be showing

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-16 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes: >> Can you post minimal code that exhibits this behavior on Python 2.5.1? >> The OP posted a lot of different versions, most of which worked just >> fine for most people. > > Who were testing it on single-CPU, 32 bit systems. Still, I'd like to see a te

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Istvan Albert
On Nov 15, 4:11 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote: > Unless you're accusing both myself and the original poster of outright > lying, of faking our results, what's your explanation? I don't attribute it to malice, I think you're simply measuring something else. You b

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Terry Reedy
"Steven D'Aprano" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] | (7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1. | | Do we treat this as a solved problem and move on? If the problem was fixed accidentally as an undocumented by product of a patch aimed at something else,

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Paul Rubin
Steven D'Aprano <[EMAIL PROTECTED]> writes: > (7) It occurs in Python 2.3, 2.4 and 2.5, but not 2.5.1. > Do we treat this as a solved problem and move on? I'm not comfortable with this. Better to identify the cause for certain. I'll look at it sometime if I get a chance. I have some 64-bit mult

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 21:51:21 +0100, Hrvoje Niksic wrote: > Steven D'Aprano <[EMAIL PROTECTED]> writes: > Someone please summarize. >>> >>> Yes, that would be good. >> >> On systems with multiple CPUs or 64-bit systems, or both, creating >> and/or deleting a multi-megabyte dictionary in rece

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Michael Bacarella
> On Thu, 15 Nov 2007 15:51:25 -0500, Michael Bacarella wrote: > > > Since some people missed the EUREKA!, here's the executive summary: > > > > Python2.3: about 45 minutes > > Python2.4: about 45 minutes > > Python2.5: about _30 seconds_ > > I'm really happy that upgrading to 2.5 sol

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Chris Mellon
On Nov 15, 2007 2:57 PM, Steven D'Aprano <[EMAIL PROTECTED]> wrote: > On Thu, 15 Nov 2007 10:51:08 -0600, Chris Mellon wrote: > > > I can't duplicate this in a dual CPU (64 bit, but running in 32 bit mode > > with a 32 bit OS) system. > > Can you try it running in 64-bit mode? > > What Python versi

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 15:51:25 -0500, Michael Bacarella wrote: > Since some people missed the EUREKA!, here's the executive summary: > > Python2.3: about 45 minutes > Python2.4: about 45 minutes > Python2.5: about _30 seconds_ I'm really happy that upgrading to 2.5 solved the is

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Hrvoje Niksic
Steven D'Aprano <[EMAIL PROTECTED]> writes: >>> Someone please summarize. >> >> Yes, that would be good. > > On systems with multiple CPUs or 64-bit systems, or both, creating and/or > deleting a multi-megabyte dictionary in recent versions of Python (2.3, > 2.4, 2.5 at least) takes a LONG time

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 10:51:08 -0600, Chris Mellon wrote: > I can't duplicate this in a dual CPU (64 bit, but running in 32 bit mode > with a 32 bit OS) system. Can you try it running in 64-bit mode? What Python version are you running? -- Steven. -- http://mail.python.org/mailman/listinfo/pyth

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Steven D'Aprano
On Thu, 15 Nov 2007 11:11:57 -0800, Istvan Albert wrote: > On Nov 14, 6:26 pm, Steven D'Aprano <[EMAIL PROTECTED] > cybersource.com.au> wrote: > >> On systems with multiple CPUs or 64-bit systems, or both, creating >> and/or deleting a multi-megabyte dictionary in recent versions of >> Python (2.

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Michael Bacarella
> On Nov 15, 2:11 pm, Istvan Albert <[EMAIL PROTECTED]> wrote: > > There is nothing wrong with neither creating nor deleting > > dictionaries. > > I suspect what happened is this: on 64 bit > machines the data structures for creating dictionaries > are larger (because pointers take twice as much s

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 15, 2:11 pm, Istvan Albert <[EMAIL PROTECTED]> wrote: > There is nothing wrong with neither creating nor deleting > dictionaries. I suspect what happened is this: on 64 bit machines the data structures for creating dictionaries are larger (because pointers take twice as much space), so you

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Istvan Albert
On Nov 14, 6:26 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote: > On systems with multiple CPUs or 64-bit systems, or both, creating and/or > deleting a multi-megabyte dictionary in recent versions of Python (2.3, > 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minute

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Chris Mellon
On Nov 14, 2007 5:26 PM, Steven D'Aprano <[EMAIL PROTECTED]> wrote: > On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote: > > > Aaron Watters <[EMAIL PROTECTED]> writes: > > > >> On Nov 12, 12:46 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > >>> > >>> > It takes about 20 seconds for me.

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 14, 6:26 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote: > On systems with multiple CPUs or 64-bit systems, or both, creating and/or > deleting a multi-megabyte dictionary in recent versions of Python (2.3, > 2.4, 2.5 at least) takes a LONG time, of the order of 30+ minute

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-15 Thread Aaron Watters
On Nov 14, 6:26 pm, Steven D'Aprano <[EMAIL PROTECTED] cybersource.com.au> wrote: > >> Someone please summarize. > > > Yes, that would be good. > > On systems with multiple CPUs or 64-bit systems, or both, creating and/or > deleting a multi-megabyte dictionary in recent versions of Python (2.3, > 2

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Steven D'Aprano
On Wed, 14 Nov 2007 18:16:25 +0100, Hrvoje Niksic wrote: > Aaron Watters <[EMAIL PROTECTED]> writes: > >> On Nov 12, 12:46 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: >>> >>> > It takes about 20 seconds for me. It's possible it's related to >>> > int/long >>> > unification - try using Pyth

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Hrvoje Niksic
Aaron Watters <[EMAIL PROTECTED]> writes: > On Nov 12, 12:46 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: >> >> > It takes about 20 seconds for me. It's possible it's related to >> > int/long >> > unification - try using Python 2.5. If you can't switch to 2.5, try >> > using string keys inst

Re: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-14 Thread Aaron Watters
On Nov 12, 12:46 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > > > It takes about 20 seconds for me. It's possible it's related to > > int/long > > unification - try using Python 2.5. If you can't switch to 2.5, try > > using string keys instead of longs. > > Yes, this was it. It ran *very*

Re: Populating a dictionary, fast [SOLVED]

2007-11-14 Thread Francesc Altet
A Wednesday 14 November 2007, Istvan Albert escrigué: > On Nov 13, 11:27 am, Francesc Altet <[EMAIL PROTECTED]> wrote: > > Another possibility is using an indexed column in a table in a DB. > > Lookups there should be much faster than using a dictionary as > > well. > > I would agree with others wh

Re: Populating a dictionary, fast [SOLVED]

2007-11-14 Thread Francesc Altet
A Tuesday 13 November 2007, Steven D'Aprano escrigué: > On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote: > > I don't know exactly why do you need a dictionary for keeping the > > data, but in case you want ultra-fast access to values, there is no > > replacement for keeping a sorted list o

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Istvan Albert
On Nov 13, 11:27 am, Francesc Altet <[EMAIL PROTECTED]> wrote: > Another possibility is using an indexed column in a table in a DB. > Lookups there should be much faster than using a dictionary as well. I would agree with others who state that for a simple key based lookup nothing beats a diction

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Steven D'Aprano
On Tue, 13 Nov 2007 17:27:11 +0100, Francesc Altet wrote: > I don't know exactly why do you need a dictionary for keeping the data, > but in case you want ultra-fast access to values, there is no > replacement for keeping a sorted list of keys and a list with the > original indices to values, and

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Hrvoje Niksic
Francesc Altet <[EMAIL PROTECTED]> writes: > I don't know exactly why do you need a dictionary for keeping the > data, but in case you want ultra-fast access to values, there is no > replacement for keeping a sorted list of keys and a list with the > original indices to values, and the proper list

RE: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Michael Bacarella
> Shouldn't this be: > > id2name[key >> 40][key & 0xff] = name Yes, exactly, I had done hex(pow(2,40)) when I meant hex(pow(2,40)-1) I sent my correction a few minutes afterwards but Mailman queued it for moderator approval (condition with replying to myself?) -- http://mail.pytho

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Paul McGuire
On Nov 12, 11:32 am, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > See end for solution. > > > >> (3) Are you sure you need all eight-million-plus items in the cache > > >> all at once? > > > > Yes. > > > I remain skeptical, but what do I know, I don't even know what you're > > doing with the da

Re: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Francesc Altet
A Monday 12 November 2007, Michael Bacarella escrigué: > As for the solution, after trying a half-dozen different integer > hashing functions > and hash table sizes (the brute force approach), on a total whim I > switched to a > model with two dictionary tiers and got whole orders of magnitude > be

RE: Populating a dictionary, fast [SOLVED]

2007-11-13 Thread Michael Bacarella
See end for solution. > >> (3) Are you sure you need all eight-million-plus items in the cache > >> all at once? > > > > Yes. > > I remain skeptical, but what do I know, I don't even know what you're > doing with the data once you have it :-) It's OK, I'd be skeptical too. ;) > $ cat /proc/cpui

Re: Populating a dictionary, fast

2007-11-12 Thread MrJean1
On MacOS** your version #!/usr/bin/python v = {} for line in open('keys.txt'): v[long(line.strip())] = True using these keys takes 24.9 secs with ActivePython 2.5.1 and 212.3 secs with Apple's Python 2.3.5. However, this version #!/usr/bin/py

Re: Populating a dictionary, fast

2007-11-12 Thread Istvan Albert
On Nov 12, 12:39 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote: > The win32 Python or the cygwin Python? > > What CPU architecture? it is the win32 version, a dual core laptop with T5220 Core 2 -- http://mail.python.org/mailman/listinfo/python-list

Re: Populating a dictionary, fast [SOLVED]

2007-11-12 Thread Paul Rubin
"Michael Bacarella" <[EMAIL PROTECTED]> writes: > > id2name[key >> 40][key & 0x100] = name > Oops, typo. It's actually: > Id2name[key >> 40][key & 0xff] = name Are you saying this is a patch that appeared after python 2.3? Somehow I don't think it's come up in this threa

RE: Populating a dictionary, fast [SOLVED SOLVED]

2007-11-12 Thread Michael Bacarella
> > You can download the list of keys from here, it's 43M gzipped: > > http://www.sendspace.com/file/9530i7 > > > > and see it take about 45 minutes with this: > > > > $ cat cache-keys.py > > #!/usr/bin/python > > v = {} > > for line in open('keys.txt'): > > v[long(line.strip())] = True

RE: Populating a dictionary, fast

2007-11-12 Thread Michael Bacarella
> > and see it take about 45 minutes with this: > > > > $ cat cache-keys.py > > #!/usr/bin/python > > v = {} > > for line in open('keys.txt'): > > v[long(line.strip())] = True > > On my system (windows vista) your code (using your data) runs in: > > 36 seconds with python 2.4 > 25 seconds

RE: Populating a dictionary, fast [SOLVED]

2007-11-12 Thread Michael Bacarella
> id2name[key >> 40][key & 0x100] = name Oops, typo. It's actually: Id2name[key >> 40][key & 0xff] = name -- http://mail.python.org/mailman/listinfo/python-list

Re: Populating a dictionary, fast

2007-11-12 Thread Jeffrey Froman
Hrvoje Niksic wrote: > Note that both machines are x86_64.  Please try your test on a 32-bit > machine and report the results.  I suspect performance won't degrade > there. This theory seems to be supported by my findings. Running the test on a 32-bit machine took 45 seconds, while the same test

Re: Populating a dictionary, fast

2007-11-12 Thread Hrvoje Niksic
Michael Bacarella <[EMAIL PROTECTED]> writes: > $ uname -a > Linux xxx 2.6.9-22.ELsmp #1 SMP Sat Oct 8 21:32:36 BST 2005 x86_64 x86_64 > x86_64 GNU/Linux > > We've also tried it on this version (on a different machine): > > $ uname -a > Linux yyy 2.6.18-8.el5 #1 SMP Thu Mar 15 19:46:53 EDT 2007 x

Re: Populating a dictionary, fast

2007-11-12 Thread Steven D'Aprano
On Sun, 11 Nov 2007 08:25:15 -0800, Michael Bacarella wrote: > Firstly, thank you for all of your help so far, I really appreciate it. > >> > So, you think the Python's dict implementation degrades towards > O(N) >> > performance when it's fed millions of 64-bit pseudo-random longs? >> >> No. >

Re: Populating a dictionary, fast

2007-11-12 Thread Steven D'Aprano
On Sun, 11 Nov 2007 11:42:08 -0700, Arkanes wrote: > It takes about 20 seconds for me. It's possible it's related to int/long > unification - try using Python 2.5. If you can't switch to 2.5, try > using string keys instead of longs. I'm getting similar behaviour to the Original Poster, and I'm u

Re: Populating a dictionary, fast

2007-11-12 Thread Hrvoje Niksic
Paul Rubin writes: > Michael Bacarella <[EMAIL PROTECTED]> writes: >> If only it were so easy. > > I think I know what's going on, the dictionary updates are sending the > GC into quadratic behavior. Try turning off the GC: > > import gc > gc.disable() This is

Re: Populating a dictionary, fast

2007-11-11 Thread Istvan Albert
On Nov 11, 11:51 am, Michael Bacarella <[EMAIL PROTECTED]> wrote: > and see it take about 45 minutes with this: > > $ cat cache-keys.py > #!/usr/bin/python > v = {} > for line in open('keys.txt'): > v[long(line.strip())] = True On my system (windows vista) your code (using your data) runs

Re: Populating a dictionary, fast

2007-11-11 Thread Ben Finney
DouhetSukd <[EMAIL PROTECTED]> writes: > Ah, well, just noticed Ben's suggested this already. Mind you, his > code, while correct in intent, does look a bit fishy (missing those > square brackets) By "missing those square brackets", what would be a list comprehension (allocating an entire list i

Re: Populating a dictionary, fast

2007-11-11 Thread Ricardo Aráoz
Michael Bacarella wrote: >>> This would seem to implicate the line id2name[id] = name as being > excruciatingly slow. >> As others have pointed out there is no way that this takes 45 >> minutes.Must be something with your system or setup. >> >> A functionally equivalent code for me runs in about 4

Re: Populating a dictionary, fast

2007-11-11 Thread Marc 'BlackJack' Rintsch
On Sun, 11 Nov 2007 08:51:37 -0800, Michael Bacarella wrote: >> As others have pointed out there is no way that this takes 45 >> minutes.Must be something with your system or setup. >> >> A functionally equivalent code for me runs in about 49 seconds! >> (it ends up using about twice as much ram a

Re: Populating a dictionary, fast

2007-11-11 Thread Alberto Berti
> "Michael" == Michael Bacarella <[EMAIL PROTECTED]> writes: >> > This would seem to implicate the line id2name[id] = name as >> being Michael> excruciatingly slow. >> >> As others have pointed out there is no way that this takes 45 >> minutes.Must be something with y

Re: Populating a dictionary, fast

2007-11-11 Thread Arkanes
Michael Bacarella wrote: > You can download the list of keys from here, it's 43M gzipped: > http://www.sendspace.com/file/9530i7 > > and see it take about 45 minutes with this: > > $ cat cache-keys.py > #!/usr/bin/python > v = {} > for line in open('keys.txt'): > v[long(line.strip())] =

Re: Populating a dictionary, fast

2007-11-11 Thread Michael Bacarella
> > I tried your code (with one change, time on feedback lines) and got the > > same terrible > > performance against my data set. > > > > To prove that my machine is sane, I ran the same against your generated >> sample file and got _excellent_ performance. Start to finish in under a minute.

Re: Populating a dictionary, fast

2007-11-11 Thread Michael Bacarella
> > This would seem to implicate the line id2name[id] = name as being excruciatingly slow. > > As others have pointed out there is no way that this takes 45 > minutes.Must be something with your system or setup. > > A functionally equivalent code for me runs in about 49 seconds! > (it ends up usi

Re: Populating a dictionary, fast

2007-11-11 Thread Istvan Albert
On Nov 11, 11:25 am, Michael Bacarella <[EMAIL PROTECTED]> wrote: > I tried your code (with one change, time on feedback lines) and got the > same terrible > performance against my data set. > To prove that my machine is sane, I ran the same against your generated > sample file and got _excellen

Re: Populating a dictionary, fast

2007-11-11 Thread Michael Bacarella
Firstly, thank you for all of your help so far, I really appreciate it. > > So, you think the Python's dict implementation degrades towards O(N) > > performance when it's fed millions of 64-bit pseudo-random longs? > > No. Yes. I tried your code (with one change, time on feedback lines) and go

Re: Populating a dictionary, fast

2007-11-11 Thread Istvan Albert
On Nov 10, 4:56 pm, Michael Bacarella <[EMAIL PROTECTED]> wrote: > This would seem to implicate the line id2name[id] = name as being > excruciatingly slow. As others have pointed out there is no way that this takes 45 minutes.Must be something with your system or setup. A functionally equivalen

Re: Populating a dictionary, fast

2007-11-11 Thread Michael Bacarella
> Steven D'Aprano wrote: > > (2) More memory will help avoid paging. If you can't get more memory, try > > more virtual memory. It will still be slow, but at least the operating > > system doesn't have to try moving blocks around as much. > > Based on his previous post, it would seem he has 7GB

Re: Populating a dictionary, fast

2007-11-11 Thread DouhetSukd
Ah, well, just noticed Ben's suggested this already. Mind you, his code, while correct in intent, does look a bit fishy (missing those square brackets), so don't dismiss it just because you had trouble running it (or mine). Definitely worth a try and I'd be curious to know if it makes a differenc

Re: Populating a dictionary, fast

2007-11-11 Thread Alberto Berti
> "Steven" == Steven D'Aprano <[EMAIL PROTECTED]> writes: Steven> $ time ./slurp_dict.py Starting at Sun Nov 11 14:26:51 Steven> 2007 Line 0 Line 100 Line 200 Line 300 Line Steven> 400 Line 500 Line 600 Line 700 Line Steven> 800 Items in dict: 81

Re: Populating a dictionary, fast

2007-11-11 Thread DouhetSukd
On Nov 11, 7:35 am, Michael Bacarella <[EMAIL PROTECTED]> wrote: > Tried that already. No difference. :( Not sure if it would make a difference, and it would imply re- organizing your preceding lines, but what about doing the dictionary build in one go, rather than incrementally? Using the dict

Re: Populating a dictionary, fast

2007-11-11 Thread Michael Bacarella
> - Original Message > From: Paul Rubin <http://[EMAIL PROTECTED]> > To: python-list@python.org > Sent: Sunday, November 11, 2007 12:45:44 AM > Subject: Re: Populating a dictionary, fast > > Michael Bacarella <[EMAIL PROTECTED]> writes: > > If on

Re: Populating a dictionary, fast

2007-11-10 Thread Paul Rubin
Michael Bacarella <[EMAIL PROTECTED]> writes: > If only it were so easy. I think I know what's going on, the dictionary updates are sending the GC into quadratic behavior. Try turning off the GC: import gc gc.disable() -- http://mail.python.org/mailman/listinfo/python-list

Re: Populating a dictionary, fast

2007-11-10 Thread Yu-Xi Lim
Steven D'Aprano wrote: > (2) More memory will help avoid paging. If you can't get more memory, try > more virtual memory. It will still be slow, but at least the operating > system doesn't have to try moving blocks around as much. > Based on his previous post, it would seem he has 7GB of RAM (w

Re: Populating a dictionary, fast

2007-11-10 Thread Steven D'Aprano
On Sat, 10 Nov 2007 17:18:37 -0800, Michael Bacarella wrote: > So, you think the Python's dict implementation degrades towards O(N) > performance when it's fed millions of 64-bit pseudo-random longs? No. Here's my sample file: $ wc -l id2name.txt 8191180 id2name.txt $ ls -lh id2name.txt -rw-rw

Re: Populating a dictionary, fast

2007-11-10 Thread Michael Bacarella
> That's an awfully complicated way to iterate over a file. Try this > instead: > > id2name = {} > for line in open('id2name.txt'): >id,name = line.strip().split(':') >id = long(id) >id2name[id] = name > > > This takes about 45 *minutes* > > > On my system, it takes about a minute an

Re: Populating a dictionary, fast

2007-11-10 Thread Paul Rubin
Michael Bacarella <[EMAIL PROTECTED]> writes: > Is there a fast, functionally equivalent way of doing this? > > (Yes, I really do need this cached. No, an RDBMS or disk-based hash > is not fast enough.) As Steven says maybe you need to add more ram to your system. The memory overhead of diction

Re: Populating a dictionary, fast

2007-11-10 Thread Steven D'Aprano
On Sat, 10 Nov 2007 13:56:35 -0800, Michael Bacarella wrote: > The id2name.txt file is an index of primary keys to strings. They look > like this: > > 11293102971459182412:Descriptive unique name for this record\n > 950918240981208142:Another name for another record\n > > The file's properties

Re: Populating a dictionary, fast

2007-11-10 Thread Ben Finney
Michael Bacarella <[EMAIL PROTECTED]> writes: > id2name = {} > for line in iter(open('id2name.txt').readline,''): > id,name = line.strip().split(':') > id = long(id) > id2name[id] = name > > This takes about 45 *minutes* > > If I comment out the last line in the loop body it takes on

Populating a dictionary, fast

2007-11-10 Thread Michael Bacarella
The id2name.txt file is an index of primary keys to strings. They look like this: 11293102971459182412:Descriptive unique name for this record\n 950918240981208142:Another name for another record\n The file's properties are: # wc -l id2name.txt 8191180 id2name.txt # du -h id2name.txt 517M