On May 22, 3:20 pm, [EMAIL PROTECTED] wrote:
> On May 22, 8:51 am, "A.T.Hofkamp" <[EMAIL PROTECTED]> wrote:
> > On 2008-05-22, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > Hi, I wanted to know how cautious it is to do something like:
> > >
> > > f = file("filename", "rb")
> > > f.read()
> > >
> > > for a possibly huge file. When calling f.read(), and not doing
> > > anything with the return value, what is Python doing internally? Is it
> > > loading the content of the file into memory (regardless of whether it
> > > is discarding it immediately)?
> >
> > I am not a Python interpreter developer, but as user, yes I'd expect that to
> > happen. The method doesn't know you are not doing anything with its return
> > value.
> >
> > > In my case, what I'm doing is sending the return value through a
> > > socket:
> > >
> > > sock.send(f.read())
> > >
> > > Is that gonna make a difference (memory-wise)? I guess I'm just
> > > concerned with whether I can do a file.read() for any file in the
> > > system in an efficient and memory-kind way, and with low overhead in
> > > general. (For one thing, I'm not loading the contents into a
> > > variable.)
> >
> > Doesn't matter. You allocate a string in which the contents is loaded (the
> > return value of 'f.read()', and you hand over (a reference to) that string to
> > the 'send()' method.
> >
> > Note that memory is allocated by data *values*, not by *variables* in Python
> > (they are merely references to values).
> >
> > > Not that I'm saying that loading a huge file into memory will horribly
> > > crash the system, but it's good to try to program in the safest way
> > > possibly. For example, if you try something like this in the
> >
> > Depends on your system, and your biggest file.
> >
> > At a 32 bit platform, anything bigger than about 4GB (usually already at
> > around 3GB) will crash the program for the simple reason that you are running
> > out of address space to store bytes in.
> >
> > To fix, read and write blocks by specifying a block-size in the 'read()'
> > call.
>
> I see... Thanks for the reply.
>
> So what would be a good approach to solve that problem? The best I can
> think of is something like:
>
> MAX_BUF_SIZE = 100000000 # about 100 MBs
>
> f = file("filename", "rb")
> f.seek(0, 2) # relative to EOF
> length = f.tell()
> bPos = 0
>
> while bPos < length:
>     f.seek(bPos)
>     bPos += sock.send(f.read(MAX_BUF_SIZE))

I would go with:

f = file("filename", "rb")
while True:
    data = f.read(MAX_BUF_SIZE)
    if not data:
        break
    sock.sendall(data)
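
For anyone reading this on a current Python 3, where file() no longer exists, the same loop can be sketched as a small function. This is only a minimal sketch: the name send_file, the CHUNK_SIZE constant and its 64 KiB value are my own assumptions for illustration and do not come from the thread. The point is that only one block is held in memory at a time, and that sendall() takes care of partial sends, so the seek()/tell() bookkeeping is not needed.

```python
CHUNK_SIZE = 64 * 1024  # assumed block size (64 KiB); pick what suits your case


def send_file(sock, path, chunk_size=CHUNK_SIZE):
    """Send the file at `path` over `sock` in blocks of `chunk_size` bytes.

    `sock` can be any object with a sendall() method, such as a
    connected socket.socket. Returns the total number of bytes sent.
    """
    total = 0
    with open(path, "rb") as f:        # the file is closed even on error
        while True:
            data = f.read(chunk_size)  # holds at most chunk_size bytes
            if not data:               # empty bytes object means end of file
                break
            sock.sendall(data)         # keeps sending until all of data is out
            total += len(data)
    return total
```

With a connected socket this would be called as send_file(sock, "filename"). Python 3.5 and later also provide socket.sendfile(), which may be worth a look if the data does not need to pass through your own code at all.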