Re: Populating huge data structures from disk

2007-11-07 Thread Rhamphoryncus
On Nov 6, 2:42 pm, "Michael Bacarella" <[EMAIL PROTECTED]> wrote:
> > Note that you're not doing the same thing at all. You're
> > pre-allocating the array in the C code, but not in Python (and I don't
> > think you can). Is there some reason you're growing an 8 gig array 8
> > bytes at a time?
> >

Re: Populating huge data structures from disk

2007-11-06 Thread Hrvoje Niksic
"Chris Mellon" <[EMAIL PROTECTED]> writes:
> It is a little annoying that there's no way to pre-allocate an
> array. It doesn't over-allocate, either, so building on a few bytes
> at a time is pretty much worst case behavior.

The fine source says:
array_resize(arrayobject *self, Py_ssize_t news
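Whether `array` over-allocates depends on the interpreter version, so it is easier to check empirically than to rely on either post. The Python 3 sketch below appends items one at a time and counts how many distinct values `sys.getsizeof` reports. The helper name `count_reallocations` is hypothetical. In CPython, an array's reported size includes its allocated buffer. If the count is far below the number of appends, the buffer is growing in chunks rather than one item at a time.

```python
import array
import sys

def count_reallocations(n_appends: int, typecode: str = "Q") -> int:
    """Append n_appends items one at a time and count how many distinct
    sizes sys.getsizeof() reports along the way (a rough proxy for how
    many times the underlying buffer was resized)."""
    a = array.array(typecode)
    sizes = set()
    for i in range(n_appends):
        a.append(i)
        # In CPython, an array's size includes its allocated buffer,
        # not just the slots currently in use.
        sizes.add(sys.getsizeof(a))
    return len(sizes)

if __name__ == "__main__":
    n = 100_000
    print(f"{n} appends -> {count_reallocations(n)} distinct sizes")
```

On an interpreter that does over-allocate, the second number will be a few hundred at most rather than 100,000.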

Re: Populating huge data structures from disk

2007-11-06 Thread Paul Rubin
"Michael Bacarella" <[EMAIL PROTECTED]> writes:
> > The way I do it is run a separate process that mmaps the file and
> > reads one byte from each page every half hour or so. You are right
> > that it makes a huge difference.
>
> Why not just disable swap?

The system is demand paged. If swap is
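The page-touching approach described above could look roughly like the following Python 3 sketch. It uses only the standard `mmap` module. The names `touch_pages` and `keep_warm` are hypothetical, and the half-hour interval comes from the post. Touching a page only makes it likely to stay resident. It does not pin the page in memory the way `mlock` would.

```python
import mmap
import os
import time

def touch_pages(path: str) -> int:
    """Read one byte from every page of `path` through a read-only mmap,
    so the kernel brings (or keeps) those pages in the page cache.
    Returns the sum of the bytes read, which is handy for testing."""
    total = 0
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0  # mmap refuses to map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, mmap.PAGESIZE):
                total += m[offset]  # indexing an mmap yields an int
    return total

def keep_warm(path: str, interval_s: float = 30 * 60) -> None:
    """Hypothetical background loop: re-touch the file periodically."""
    while True:
        touch_pages(path)
        time.sleep(interval_s)
```

Run `keep_warm` in a separate process, as the post describes, so the main application never blocks on it.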

Re: Populating huge data structures from disk

2007-11-06 Thread Chris Mellon
On Nov 6, 2007 3:42 PM, Michael Bacarella <[EMAIL PROTECTED]> wrote:
>
> > Note that you're not doing the same thing at all. You're
> > pre-allocating the array in the C code, but not in Python (and I don't
> > think you can). Is there some reason you're growing an 8 gig array 8
> > bytes at a time?
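For comparison, current Python 3 does let you size an `array` up front. Repeating a one-element array allocates the full buffer in one step, and items can then be assigned in place without further resizing. This is a minimal sketch with hypothetical helper names. It makes no claim about what the Python 2.5 of the thread could do.

```python
import array

def preallocated(n: int, typecode: str = "Q") -> array.array:
    """Return an array of n zeroed items. Sequence repetition sizes the
    buffer once instead of growing it append by append."""
    return array.array(typecode, [0]) * n

def fill_squares(n: int) -> array.array:
    """Example fill loop: in-place item assignment, no resizing."""
    a = preallocated(n)
    for i in range(n):
        a[i] = i * i
    return a
```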

RE: Populating huge data structures from disk

2007-11-06 Thread Michael Bacarella
> > Very sure. If we hit the disk at all performance drops
> > unacceptably. The application has low locality of reference so
> > on-demand caching isn't an option. We get the behavior we want when
> > we pre-cache; the issue is simply that it takes so long to build
> > this cache.
>
> The way I

Re: Populating huge data structures from disk

2007-11-06 Thread Paul Rubin
"Michael Bacarella" <[EMAIL PROTECTED]> writes:
> Very sure. If we hit the disk at all performance drops
> unacceptably. The application has low locality of reference so
> on-demand caching isn't an option. We get the behavior we want when
> we pre-cache; the issue is simply that it takes so lon
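If the cache is really a flat blob of bytes, one way to cut the build time is to allocate the final buffer once and let the OS fill it in large chunks with `readinto`. This avoids growing a structure item by item. The sketch assumes the whole file fits in memory. The name `slurp` and the 64 MiB default chunk size are illustrative choices, not anything from the thread.

```python
import os

def slurp(path: str, chunk_size: int = 64 * 1024 * 1024) -> bytearray:
    """Read a whole file into one preallocated bytearray, chunk by chunk."""
    size = os.path.getsize(path)
    buf = bytearray(size)            # one allocation of the final size
    view = memoryview(buf)           # lets readinto write into slices
    pos = 0
    with open(path, "rb", buffering=0) as f:
        while pos < size:
            n = f.readinto(view[pos:pos + chunk_size])
            if not n:                # file shrank while we were reading
                raise EOFError(f"expected {size} bytes, got {pos}")
            pos += n
    return buf
```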

Re: Populating huge data structures from disk

2007-11-06 Thread Hrvoje Niksic
"Michael Bacarella" <[EMAIL PROTECTED]> writes:
> cPickle with protocol 2 has some promise but is more complicated because
> arrays can't be pickled.

This is not true:

>>> import array
>>> a = array.array('L')
>>> a.extend(xrange(10))
>>> a
array('L', [0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L])
>>
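Here is the same point in Python 3 syntax. `cPickle` became `pickle`, `xrange` became `range`, and integers no longer print with an `L` suffix. This is a minimal check that an `array` survives a protocol 2 round trip, not a reconstruction of the rest of the truncated session:

```python
import array
import pickle

a = array.array("L", range(10))
# Protocol 2 is the one mentioned in the thread; other protocols work too.
data = pickle.dumps(a, protocol=2)
b = pickle.loads(data)
print(b)  # array('L', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
```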

RE: Populating huge data structures from disk

2007-11-06 Thread Michael Bacarella
> Note that you're not doing the same thing at all. You're
> pre-allocating the array in the C code, but not in Python (and I don't
> think you can). Is there some reason you're growing an 8 gig array 8
> bytes at a time?
> > They spend about the same amount of time in system, but Python spends 4.7
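The shell's `time` reports the user/system split for a whole process. To get the same split from inside Python for a single call, `os.times()` can bracket that call. This is a sketch. `cpu_split` and the sample workload are hypothetical, and the figures cover every thread in the process, not just the one making the call.

```python
import array
import os

def cpu_split(fn, *args):
    """Call fn(*args); return (result, user_seconds, system_seconds)
    measured with os.times() before and after the call."""
    before = os.times()
    result = fn(*args)
    after = os.times()
    return result, after.user - before.user, after.system - before.system

def append_one_at_a_time(n):
    # Hypothetical workload: grow an array one 8-byte item per iteration.
    a = array.array("Q")
    for i in range(n):
        a.append(i)
    return a

if __name__ == "__main__":
    _, user, system = cpu_split(append_one_at_a_time, 2_000_000)
    print(f"user {user:.2f}s  sys {system:.2f}s")
```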

Re: Populating huge data structures from disk

2007-11-06 Thread Neil Cerutti
On 2007-11-06, Michael Bacarella <[EMAIL PROTECTED]> wrote:
> And there's no solace in lists either:
>
> $ time python eat800.py
>
> real    4m2.796s
> user    3m57.865s
> sys     0m3.638s
>
> $ cat eat800.py
> #!/usr/bin/python
>
> import struct
>
> d = []
> f = open('/dev/zero')
> for i in xr
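The script above is cut off. It appears to read from `/dev/zero` in a loop and unpack each piece with `struct`. Below is a bulk alternative in Python 3. It assumes the input is a run of native-endian unsigned 64-bit integers, matching the thread's 8-byte items; the real record format of `eat800.py` is not visible. `array.fromfile` reads every item in one call, and `struct.iter_unpack` does the same when a list is wanted. Both helper names are hypothetical.

```python
import array
import struct

def load_u64_array(path, count):
    """Read `count` native unsigned 64-bit ints with a single fromfile()
    call. Raises EOFError if the file holds fewer than `count` items."""
    a = array.array("Q")
    with open(path, "rb") as f:
        a.fromfile(f, count)
    return a

def load_u64_list(path, count):
    """Same data as a list of Python ints, unpacked in one pass.
    "=Q" means native byte order with a standard 8-byte size; a short
    read that is not a multiple of 8 bytes makes iter_unpack fail."""
    with open(path, "rb") as f:
        data = f.read(8 * count)
    return [value for (value,) in struct.iter_unpack("=Q", data)]
```

On a Unix system, `load_u64_array("/dev/zero", 1_000_000)` mirrors the test input used in the post.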

Re: Populating huge data structures from disk

2007-11-06 Thread Chris Mellon
On Nov 6, 2007 2:40 PM, Michael Bacarella <[EMAIL PROTECTED]> wrote:
>
> > > For various reasons I need to cache about 8GB of data from disk into core on
> > > application startup.
> >
> > Are you sure? On PC hardware, at least, doing this doesn't make any
> > guarantee that accessing it actually

RE: Populating huge data structures from disk

2007-11-06 Thread Michael Bacarella
> > For various reasons I need to cache about 8GB of data from disk into core on
> > application startup.
>
> Are you sure? On PC hardware, at least, doing this doesn't make any
> guarantee that accessing it actually is going to be any faster. Is just
> mmap()ing the file a problem for some reason?
>

Re: Populating huge data structures from disk

2007-11-06 Thread Chris Mellon
On Nov 6, 2007 12:18 PM, Michael Bacarella <[EMAIL PROTECTED]> wrote:
>
> For various reasons I need to cache about 8GB of data from disk into core on
> application startup.
>
Are you sure? On PC hardware, at least, doing this doesn't make any guarantee that accessing it actually is going to be
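Chris's alternative, simply `mmap()`ing the file, can be sketched in Python 3. Map the file read-only and cast the mapping to 8-byte unsigned integers with `memoryview.cast`. Nothing is copied into Python objects, and the OS pages data in on first access. `map_u64` is a hypothetical helper. It assumes a non-empty file whose length is a multiple of 8 bytes.

```python
import mmap

def map_u64(path):
    """Map `path` read-only and view it as native unsigned 64-bit ints.

    Returns (mapping, view). Nothing is copied; pages are faulted in
    lazily. mmap raises ValueError for an empty file, and cast() fails
    if the size is not a multiple of the 8-byte item size.
    """
    with open(path, "rb") as f:
        m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping stays valid after the file object is closed.
    return m, memoryview(m).cast("Q")
```

Call `view.release()` before `mapping.close()`. Closing a mapping while a view of it is still exported raises `BufferError`.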