On Feb 19, 9:34 am, Lionel <lionel.ke...@gmail.com> wrote:
> On Feb 18, 12:35 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> > On Feb 18, 10:48 am, Lionel <lionel.ke...@gmail.com> wrote:
> > > Thanks Carl, I like your solution. Am I correct in my understanding
> > > that memory is allocated at the slicing step in your example, i.e. when
> > > "reshaped_data" is sliced using "interesting_data = reshaped_data[:,
> > > 50:100]"? In other words, given a huge (say 1 GB) file, a memmap object
> > > is constructed that memmaps the entire file. Some relatively small
> > > amount of memory is allocated for the memmap operation, but the bulk
> > > memory allocation occurs when I generate my final numpy sub-array by
> > > slicing, and this accounts for the memory efficiency of using memmap?
> >
> > No, what accounts for the memory efficiency is that there is no bulk
> > allocation at all. The ndarray you have points to the memory that's
> > in the mmap. There is no copying of data or separate array allocation.
>
> Does this mean that every time I iterate through an ndarray that is
> sourced from a memmap, the data is read from the disc? The sliced
> array is at no time wholly resident in memory? What are the
> performance implications of this?
Ok, sorry for the confusion. What I should have said is that there is no bulk allocation *by numpy* at all. The call to mmap does allocate a chunk of address space to reflect the file's contents (the OS reads pages in from disk on demand), but the numpy arrays don't allocate any memory of their own: they use the same memory that was allocated by the mmap call.

Carl Banks
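A small sketch using only the standard library's mmap module illustrates the same principle that numpy.memmap relies on: mapping a file allocates address space backed by the file, not a copy of its contents, and a view over the map touches only the pages you access. (The file path and sizes here are made up for the example.)

```python
import mmap
import os
import tempfile

# Create a file bigger than anything we want to copy into a buffer.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(bytes(range(256)) * 4096)  # 1 MiB of repeating bytes 0..255

with open(path, "rb") as f:
    # mmap maps the whole file into our address space; no 1 MiB copy is made.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Slicing the map itself returns a bytes object -- that slice IS a copy,
    # but only of the 50 bytes requested, read from the OS page cache.
    chunk = mm[50:100]

    # A memoryview avoids even that copy, much like a numpy array that
    # points directly at mmap'd memory.
    view = memoryview(mm)[50:100]
    first = view[0]

    view.release()  # must release exported buffers before closing the map
    mm.close()
```

The analogous numpy pattern is `np.memmap(path, dtype=..., mode='r')` followed by reshaping and slicing; the resulting ndarray is a view whose backing pages are faulted in from disk as they are touched.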