> Note that you're not doing the same thing at all. You're
> pre-allocating the array in the C code, but not in Python (and I don't
> think you can). Is there some reason you're growing a 8 gig array 8
> bytes at a time?
>
> They spend about the same amount of time in system, but Python spends
> 4.7x as much CPU in userland as C does.
>
> Python has to grow the array. It's possible that this is tripping a
> degenerate case in the gc behavior also (I don't know if array uses
> PyObjects for its internal buffer), and if it is you'll see an
> improvement by disabling GC.

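The GC suggestion quoted above is cheap to try. Here is a minimal sketch, assuming a current Python 3 and the standard gc and array modules; fill_by_append is a name made up for illustration, and n is kept tiny compared to the 8 gig case in the thread. Whether it makes any difference depends on whether the collector is actually involved, which nobody here has established.

```python
import gc
import time
from array import array


def fill_by_append(n, disable_gc=False):
    """Grow an array of doubles one element at a time.

    Hypothetical helper for illustration only. If disable_gc is true,
    the cyclic collector is switched off for the duration of the loop
    and restored afterwards if it was on to begin with.
    """
    was_enabled = gc.isenabled()
    if disable_gc:
        gc.disable()
    try:
        data = array("d")  # 8-byte floats, grown by append as in the thread
        for i in range(n):
            data.append(float(i))
        return data
    finally:
        if disable_gc and was_enabled:
            gc.enable()


if __name__ == "__main__":
    n = 1_000_000  # assumption: small enough to run quickly
    for flag in (False, True):
        start = time.perf_counter()
        fill_by_append(n, disable_gc=flag)
        elapsed = time.perf_counter() - start
        print(f"disable_gc={flag}: {elapsed:.3f}s")
```

Comparing the two timings on the real workload would show whether the collector matters at all.
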
That does explain why it's consuming 4.7x as much CPU.

> > x = lengthy_number_crunching()
> > magic.save_mmap("/important-data")
> >
> > and in the application do...
> >
> > x = magic.mmap("/important-data")
> > magic.mlock("/important-data")
> >
> > and once the mlock finishes bringing important-data into RAM, at
> > the speed of your disk I/O subsystem, all accesses to x will be
> > hits against RAM.
>
> You've basically described what mmap does, as far as I can tell. Have
> you tried just mmapping the file?

Yes, that would be why my fantasy functions have 'mmap' in their names.
However, in C you can mmap arbitrarily complex data structures whereas
in Python all you can mmap without transformations is an array or a
string. I didn't say this earlier, but I do need to pull more than
arrays and strings into RAM.

Not being able to pre-allocate storage is a big loser for this approach.

-- 
http://mail.python.org/mailman/listinfo/python-list
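
To make the "without transformations" point concrete, here is a rough sketch of what mapping a file looks like with the standard mmap and struct modules, assuming a current Python 3. The record layout and the names RECORD, save_records and read_record are invented for illustration; the magic.* calls in the quoted text remain fantasy functions. The mapping hands back raw bytes, so anything more structured than a string or an array has to be decoded on access, and that decoding is the transformation step.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical fixed-size record for illustration:
# little-endian int32 id followed by a float64 value (12 bytes, no padding).
RECORD = struct.Struct("<id")


def save_records(path, records):
    """Write (id, value) pairs to path as packed fixed-size records."""
    with open(path, "wb") as f:
        for rec_id, value in records:
            f.write(RECORD.pack(rec_id, value))


def read_record(mapped, index):
    """Decode record number index directly from the mapped bytes."""
    return RECORD.unpack_from(mapped, index * RECORD.size)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "important-data")
    save_records(path, [(1, 2.5), (7, -1.0), (42, 3.25)])

    with open(path, "rb") as f:
        # Length 0 maps the whole file; pages are faulted in on first access.
        mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            print(read_record(mapped, 2))
        finally:
            mapped.close()
```

Every access goes through unpack_from and builds new Python objects, which is quite different from casting a pointer to a struct in C. The sketch also has no counterpart to mlock.
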