On Nov 6, 2007 12:18 PM, Michael Bacarella <[EMAIL PROTECTED]> wrote:
> For various reasons I need to cache about 8GB of data from disk into core on
> application startup.
Are you sure? On PC hardware, at least, doing this doesn't guarantee that
accessing it is actually going to be any faster. Is just mmap()ing the file a
problem for some reason? I assume you're on a 64-bit machine.

> Building this cache takes nearly 2 hours on modern hardware. I am surprised
> to discover that the bottleneck here is CPU.
>
> The reason this is surprising is because I expect something like this to be
> very fast:
>
> #!python
>
> import array
> a = array.array('L')
> f = open('/dev/zero','r')
> while True:
>     a.fromstring(f.read(8))

This just appends to the same array, 8 bytes at a time, forever. Is this
really the code you meant to write? I don't know why you'd expect an infinite
loop to be "fast"...

> Profiling this application shows all of the time is spent inside
> a.fromstring.

Obviously, because that's all that's inside your while True loop. There's
nothing else it could spend time on.

> Little difference if I use list instead of array.
>
> Is there anything I could tell the Python runtime to help it run this
> pathologically slanted case faster?

This code executes in a couple of seconds for me (size reduced to fit in my
32-bit memory space):

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import array
>>> s = '\x00' * ((1024 ** 3) / 2)
>>> len(s)
536870912
>>> a = array.array('L')
>>> a.fromstring(s)
>>>

You might also want to look at array.fromfile()

--
http://mail.python.org/mailman/listinfo/python-list
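For what it's worth, here is a minimal sketch of the two suggestions above --
array.fromfile() for a single bulk read, and mmap() to let the OS page the data
in on demand -- written in modern Python 3 syntax (tobytes/frombytes instead of
the old tostring/fromstring). The temporary file and its size are made up for
illustration; they stand in for the poster's 8GB cache file.

```python
import array
import mmap
import os
import tempfile

# Build a small demo file of unsigned longs (stand-in for the real cache file).
with tempfile.NamedTemporaryFile(delete=False) as f:
    path = f.name
    f.write(array.array('L', range(1000)).tobytes())

# Option 1: array.fromfile() reads the raw bytes straight into the array's
# buffer in one call, avoiding the per-iteration overhead of read()+fromstring.
a = array.array('L')
with open(path, 'rb') as f:
    a.fromfile(f, 1000)

# Option 2: mmap the file; nothing is copied up front, and pages are faulted
# in as they are touched (an 8GB mapping needs a 64-bit process).
b = array.array('L')
with open(path, 'rb') as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    b.frombytes(m[:])  # or index m directly without building an array at all
    m.close()

os.unlink(path)
print(a == b, a[42])
```

With mmap you could even skip the array entirely and slice the mapping
directly, so startup cost drops to roughly zero.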