> > For various reasons I need to cache about 8GB of data from disk into
> > core on application startup.
>
> Are you sure? On PC hardware, at least, doing this doesn't make any
> guarantee that accessing it is actually going to be any faster. Is just
> mmap()ing the file a problem for some reason?
>
> I assume you're on a 64 bit machine.
Very sure. If we hit the disk at all, performance drops unacceptably. The
application has low locality of reference, so on-demand caching isn't an
option. We get the behavior we want when we pre-cache; the issue is simply
that it takes so long to build this cache.

> > Building this cache takes nearly 2 hours on modern hardware. I am
> > surprised to discover that the bottleneck here is CPU.
> >
> > The reason this is surprising is that I expect something like this to
> > be very fast:
> >
> > #!python
> > import array
> >
> > a = array.array('L')
> >
> > f = open('/dev/zero','r')
> >
> > while True:
> >     a.fromstring(f.read(8))
>
> This just creates the same array over and over, forever. Is this
> really the code you meant to write? I don't know why you'd expect an
> infinite loop to be "fast"...

Not exactly. fromstring() appends to the array, so it's growing the array
towards infinity. Since infinity never finishes, it's hard to get an idea
of how slow this looks. Let's do 800MB instead.

Here's an example of loading 800MB in C:

$ time ./eat800

real    0m44.939s
user    0m10.620s
sys     0m34.303s

$ cat eat800.c
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int f = open("/dev/zero", O_RDONLY);
    int vlen = 8;
    long *v = malloc((sizeof (long)) * vlen);
    int i;

    for (i = 0; i < 100000000; i++) {
        if (i >= vlen) {
            /* double the buffer whenever it fills */
            vlen *= 2;
            v = (long *)realloc(v, (sizeof (long)) * vlen);
        }
        read(f, v + i, sizeof (long));
    }
    return 0;
}

Here's the equivalent operation in Python:

$ time python eat800.py

real    3m8.407s
user    2m40.189s
sys     0m27.934s

$ cat eat800.py
#!/usr/bin/python
import array

a = array.array('L')
f = open('/dev/zero')

for i in xrange(100000000):
    a.fromstring(f.read(8))

They spend about the same amount of time in system, but Python burns
roughly 15x as much CPU in userland as C does (2m40s against 11s), and
over 4x as much CPU overall.

And there's no solace in lists either:

$ time python eat800.py

real    4m2.796s
user    3m57.865s
sys     0m3.638s

$ cat eat800.py
#!/usr/bin/python
import struct

d = []
f = open('/dev/zero')

for i in xrange(100000000):
    d.append(struct.unpack('L', f.read(8))[0])

cPickle with protocol 2 has some promise but is more complicated because
arrays can't be pickled.

In a perfect world I could do something like this somewhere in the
backroom:

    x = lengthy_number_crunching()
    magic.save_mmap("/important-data")

and in the application do...

    x = magic.mmap("/important-data")
    magic.mlock("/important-data")

and once the mlock finishes bringing important-data into RAM, at the speed
of your disk I/O subsystem, all accesses to x will be hits against RAM.

Any thoughts?
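
For concreteness, a few untested sketches of directions this could go.
First, staying in stock Python but moving the read loop out of bytecode:
array.fromfile() does the whole read in C, so it should shed most of that
userland overhead (the count matches the 800MB examples above):

#!/usr/bin/python
import array

a = array.array('L')
f = open('/dev/zero', 'rb')

# one call reads all 100M longs; the loop runs inside the array
# module's C code instead of the bytecode interpreter
a.fromfile(f, 100000000)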
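
Second, on the cPickle front: the arrays themselves can't be pickled, but
their raw bytes can, so a pair of helpers along these lines might do.
save_array/load_array are just names for illustration, and note that
tostring() makes a full in-memory copy, so peak memory roughly doubles:

#!/usr/bin/python
import array
import cPickle

def save_array(a, path):
    # pickle the typecode plus the raw bytes instead of the array
    cPickle.dump((a.typecode, a.tostring()), open(path, 'wb'), 2)

def load_array(path):
    typecode, raw = cPickle.load(open(path, 'rb'))
    a = array.array(typecode)
    a.fromstring(raw)        # rebuild the array from the raw bytes
    return a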
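
And the nearest stdlib approximation I can see to the magic.mmap /
magic.mlock fantasy: tofile() the array in the backroom, then mmap the
file and touch every page up front in the application. There's no mlock
in the stdlib, so this only pre-faults the pages rather than pinning
them; ctypes could call mlock(2) if pinning turns out to matter. Another
untested sketch, reusing the path from the example:

#!/usr/bin/python
import mmap
import os

# backroom side would be something like:
#   x = lengthy_number_crunching()      # an array.array
#   x.tofile(open('/important-data', 'wb'))

# application side: map the file read-only...
fd = os.open('/important-data', os.O_RDONLY)
size = os.fstat(fd).st_size
m = mmap.mmap(fd, size, mmap.MAP_SHARED, mmap.PROT_READ)

# ...and touch one byte per page to fault it all into RAM now, at
# streaming disk speed, instead of page by page later
for off in xrange(0, size, mmap.PAGESIZE):
    m[off]

Until memory pressure evicts those pages, reads against m should be RAM
hits; pinning them for real is where an actual mlock() would come in.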