Kent Johnson wrote:
2009/10/10 Xbox Muncher <xboxmunc...@gmail.com>:
What does flush do technically?
"Flush the internal buffer, like stdio's fflush(). This may be a no-op on some 
file-like objects."

The reason I thought closing the file after writing about 500MB of data to it 
was smart is that I thought Python stores that data in memory, or keeps info 
about it somehow, and only frees this memory when I close the file.
When I write to a file in 'wb' mode 500 bytes at a time, I see that the file 
size changes as I continue to add more data. It may not grow in exact 500-byte 
steps as my code logic would suggest, but it does get bigger as I make more 
iterations.

Seeing this, I know that the data is definitely being written to the file pretty 
immediately and not being held in memory for very long. Or is it...? Does it still 
keep it in this "internal buffer" if I don't close the file? If it does, then 
flush() is exactly what I need to free the internal buffer, which is what I was 
trying to do when I closed the file anyway...

However, from your replies I take it that Python doesn't store this data in an 
internal buffer and DOES immediately dispose of the data into the file itself 
(of course it still exists in the variables I put it in). So, closing the file 
doesn't free up any more memory.

Python file I/O is buffered. That means that there is a memory buffer
that is used to hold a small amount of the file as it is read or
written.

Your original example writes 5 bytes at a time. With unbuffered I/O,
this would write to the disk on every call to write(). (The OS also
has some buffering, I'm ignoring that.)

With buffered writes, there is a memory buffer allocated to hold the
data. The write() call just puts data into the buffer; when it is
full, the buffer is written to the disk. This is a flush. Calling
flush() forces the buffer to be written.
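
A minimal sketch of the behaviour described above, written for Python 3
(the thread predates it, so treat the exact calls as an assumption about
your version). The helper name, the 5-byte chunk, and the 4096-byte buffer
are made up for illustration. It compares the file size the OS reports
before and after flush():

```python
import os
import tempfile


def sizes_before_and_after_flush(chunk=b"hello", buffer_size=4096):
    """Return the file size reported by the OS before and after flush()."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb", buffering=buffer_size) as f:
            f.write(chunk)                  # goes into Python's memory buffer
            before = os.path.getsize(path)  # buffer not full, nothing written yet
            f.flush()                       # force the buffer out to the OS
            after = os.path.getsize(path)
        return before, after
    finally:
        os.remove(path)


before, after = sizes_before_and_after_flush()
print(before, after)
```

Because the chunk is much smaller than the buffer, the first size should
be 0 and the second 5. With many writes the buffer fills and is written
out on its own, which is why the file appears to grow in steps.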

So, a few points about your questions:
- calling flush() after each write() will cause a disk write. This is
probably not what you want; it will slow down the output considerably.
- calling flush() does not de-allocate the buffer; it just writes its
contents. So calling flush() should not change the amount of memory
used.
- the buffer is pretty small, maybe 8K or 32K. You can specify the
buffer size as an argument to open() but really you probably want the
system default.
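
As a rough illustration of the last point, here is a sketch in Python 3
that passes an explicit buffer size to open(). The 32K value and the
500-byte chunks are only examples; io.DEFAULT_BUFFER_SIZE shows the
library's fallback default, which may differ from what open() actually
picks for a given file:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

print("default buffer size:", io.DEFAULT_BUFFER_SIZE)

# Ask for a 32K buffer; leave the argument out to get the system default.
with open(path, "wb", buffering=32 * 1024) as f:
    for _ in range(1000):
        f.write(b"x" * 500)  # written to disk only when the buffer fills

# Closing the file (end of the with block) flushes whatever is left.
total = os.path.getsize(path)
os.remove(path)
print("bytes written:", total)
```

No explicit flush() is needed here: the buffer is emptied each time it
fills and once more when the file is closed.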

Kent

What Kent said.

I brought up flush(), not because you should do it on every write, but because you might want to do it on a file that's open a long time, either because it's very large, or because you're doing other things while keeping the file open. A flush() pretty much assures that this portion of the file is recoverable, in case of a subsequent crash.

The operating system itself is also doing some buffering. After all, the disk drive writes sectors in multiples of at least 512 bytes, so if you write 12 bytes and flush, it needs at least to know about the other 500 bytes in the one sector. The strategy of this buffering varies depending on lots of things outside of the control of your Python program. For example, a removable drive can be mounted either for "fast access" or for "most likely to be recoverable if removed unexpectedly." These parameters (OS specific, and even version specific) will do much more buffering than Python or the C runtime library will ever do.

Incidentally, just because you can see that the file has grown, that doesn't mean the disk drive itself has been updated. It just means that the in-memory version of the directory entries has been updated. Those are buffered as well, naturally. If they weren't, then writing performance would be truly horrendous.
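
If you do need to push data past the OS buffers as well, one option (not
mentioned earlier in this thread, so take it as a suggestion) is to follow
flush() with os.fsync(). The helper name write_and_sync below is made up
for this sketch. Even then the drive's own cache may still hold the data,
so this is a request rather than a guarantee:

```python
import os
import tempfile


def write_and_sync(path, data):
    """Write data, then ask both Python and the OS to push it to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # Python's buffer -> operating system
        os.fsync(f.fileno())  # ask the OS to write its buffers to the device


fd, path = tempfile.mkstemp()
os.close(fd)
write_and_sync(path, b"important record\n")
with open(path, "rb") as f:
    contents = f.read()
os.remove(path)
```

This is slower than flush() alone, so the same tradeoff applies: use it
only at points where losing the data would really hurt.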

Anyway, don't bother closing and re-opening, unless it's to let some other process get access to the file. And use flush() judiciously, if at all, considering the tradeoffs.

Did you follow my comment about using the modulo operator to do something every nth time through a loop?
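
In case it helps, here is a small sketch of that idea in Python 3. The
function name write_chunks and the flush_every value are made up for the
example; the point is only the `i % flush_every == 0` test:

```python
import io


def write_chunks(f, chunks, flush_every=1000):
    """Write each chunk to f, flushing once every flush_every writes."""
    flushes = 0
    for i, chunk in enumerate(chunks, start=1):
        f.write(chunk)
        if i % flush_every == 0:  # true on the 1000th, 2000th, ... write
            f.flush()
            flushes += 1
    return flushes


# An in-memory file stands in for a real one to keep the demo self-contained.
buf = io.BytesIO()
n = write_chunks(buf, (b"x" * 500 for _ in range(2500)), flush_every=1000)
print(n, len(buf.getvalue()))
```

With 2500 writes and flush_every=1000, the flush happens twice, after
writes 1000 and 2000. The remaining 500 chunks are written out when the
file is closed.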

DaveA



_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
