On 22 Oct, 19:54, Mike Kent <[EMAIL PROTECTED]> wrote:
> Before I file a bug report against Python 2.5.2, I want to run this by
> the newsgroup to make sure I'm not being stupid.
>
> I have a text file of fixed-length records I want to read in random
> order.  That file is being changed in real-time by another process,
> and my process wants to see the changes to the file.  What I'm seeing
> is that, once I've opened the file and read a record, all subsequent
> seeks to and reads of that same record will return the same data as
> the first read of the record, so long as I don't close and reopen the
> file.  This indicates some sort of buffering and caching is going on.
>
> Consider the following:
>
> $ echo "hi" >foo.txt   # Create my test file
> $ python2.5            # Run Python
> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
>
> >>> f = open('foo.txt')  # Open my test file
> >>> f.seek(0)            # Seek to the beginning of the file
> >>> f.readline()         # Read the line, I get the data I expected
> 'hi\n'
> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.
> >>> # 'foo.txt' now has been changed on the disk, and now contains
> >>> # 'bye\n'.
> >>> f.seek(0)            # Seek to the beginning of the still-open file
> >>> f.readline()         # Read the line, I don't get 'bye\n', I get
> >>>                      # the original data, which is no longer there.
> 'hi\n'
> >>> f.close()            # Now I close the file...
> >>> f = open('foo.txt')  # ... and reopen it
> >>> f.seek(0)            # Seek to the beginning of the file
> >>> f.readline()         # Read the line, I get the expected 'bye\n'
> 'bye\n'
>
> It seems pretty clear to me that this is wrong.  If there is any
> caching going on, it should clearly be discarded if I do a seek.  Note
> that it's not just readline() that's returning me the wrong, cached
> data, as I've also tried this with read(), and I get the same
> results.  It's not acceptable that I have to close and reopen the file
> before every read when I'm doing random record access.
>
> So, is this a bug, or am I being stupid?

Hello Mike,

I'm guessing that this is not a bug. I'm no expert, but I don't think
open(file, mode) loads the whole file into memory. In Python 2 the file
object sits on top of the C library's buffered stdio stream, so the
first read fills an in-memory buffer, and a later seek to a position
that is still inside that buffer can be served from the buffer rather
than from the disk. That would explain why you only see the change
after a fresh open. The exact behaviour probably depends on the C
library on the machine that you are using.

If you want to be able to pick up data changes like this then you're
better off using a database package that has support for concurrent
access, locking and transactions.

Cheers,
Kev

--
http://mail.python.org/mailman/listinfo/python-list
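
For what it's worth, if the buffering is indeed the culprit, one way to sidestep it is to skip the file object and use the os-level calls, which go to the operating system on every read. A minimal sketch, assuming a hypothetical record length of 4 bytes (RECORD_SIZE) and a helper I've called read_record; neither name comes from the original post:

```python
import os

# Hypothetical fixed record length in bytes, newline included.
# Adjust this to match the real file layout.
RECORD_SIZE = 4


def read_record(fd, index, record_size=RECORD_SIZE):
    """Return record number `index` (counting from 0) from descriptor `fd`.

    os.lseek and os.read are thin wrappers around the system calls, so no
    stdio buffer sits between this process and the file.  The result may
    be shorter than record_size if the file ends early.
    """
    os.lseek(fd, index * record_size, os.SEEK_SET)
    return os.read(fd, record_size)
```

You would open the file once with fd = os.open('foo.txt', os.O_RDONLY), call read_record(fd, n) as often as needed, and finish with os.close(fd). This only helps when the other process changes the file in place; if it replaces the file with a new one, the descriptor still refers to the old file and a reopen is needed anyway.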