On 2008-10-22 23:00, kdwyer wrote:
> On 22 Oct, 19:54, Mike Kent <[EMAIL PROTECTED]> wrote:
>> Before I file a bug report against Python 2.5.2, I want to run this by
>> the newsgroup to make sure I'm not being stupid.
>>
>> I have a text file of fixed-length records I want to read in random
>> order. That file is being changed in real-time by another process,
>> and my process want to see the changes to the file. What I'm seeing
>> is that, once I've opened the file and read a record, all subsequent
>> seeks to and reads of that same record will return the same data as
>> the first read of the record, so long as I don't close and reopen the
>> file. This indicates some sort of buffering and caching is going on.

The C library buffers file reads, and you are seeing the effects of
that buffering. Try opening the file unbuffered:

    f = open('foo.txt', 'r', 0)

http://www.python.org/doc/2.5.2/lib/built-in-funcs.html#l2h-54

>> Consider the following:
>>
>> $ echo "hi" >foo.txt   # Create my test file
>> $ python2.5            # Run Python
>> Python 2.5.2 (r252:60911, Sep 22 2008, 16:13:07)
>> [GCC 3.4.6 20060404 (Red Hat 3.4.6-9)] on linux2
>> Type "help", "copyright", "credits" or "license" for more information.
>>
>> >>> f = open('foo.txt')  # Open my test file
>> >>> f.seek(0)            # Seek to the beginning of the file
>> >>> f.readline()         # Read the line, I get the data I expected
>> 'hi\n'
>> >>> # At this point, in another shell I execute 'echo "bye" >foo.txt'.
>> >>> # 'foo.txt' now has been changed on the disk, and now contains 'bye\n'.
>> >>> f.seek(0)            # Seek to the beginning of the still-open file
>> >>> f.readline()         # Read the line, I don't get 'bye\n', I get the
>> >>>                      # original data, which is no longer there.
>> 'hi\n'
>> >>> f.close()            # Now I close the file...
>> >>> f = open('foo.txt')  # ... and reopen it
>> >>> f.seek(0)            # Seek to the beginning of the file
>> >>> f.readline()         # Read the line, I get the expected 'bye\n'
>> 'bye\n'
>>
>> It seems pretty clear to me that this is wrong. If there is any
>> caching going on, it should clearly be discarded if I do a seek. Note
>> that it's not just readline() that's returning me the wrong, cached
>> data, as I've also tried this with read(), and I get the same
>> results. It's not acceptable that I have to close and reopen the file
>> before every read when I'm doing random record access.
>>
>> So, is this a bug, or am I being stupid?
>
> Hello Mike,
>
> I'm guessing that this is not a bug. I'm no expert, but I'd guess
> that the open(file, mode) function simply loads the file into memory,
> and that further operations (such as seek or read) are performed on
> the in-memory data rather than the data on disk. Thus changes to the
> file are only observed after a fresh open operation.
>
> This behaviour is probably enforced by the C library on the machine
> that you are using. If you want to be able to pick up data changes
> like this then you're better off using a database package that has
> support for concurrent access, locking and transactions.
>
> Cheers,
>
> Kev
> --
> http://mail.python.org/mailman/listinfo/python-list

-- 
Marc-Andre Lemburg
eGenix.com
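
For readers who want to try the unbuffered approach suggested in the reply above, here is a minimal sketch. It is written for Python 3, which is an assumption on my part (the thread itself uses Python 2.5.2). In Python 3, buffering=0 is only accepted in binary mode, so the file is opened with 'rb' rather than 'r'. The names read_record and demo, the temporary file, and the 4-byte record size are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def read_record(f, index, record_size):
    """Seek to record number `index` and read it from an open file.

    `f` is assumed to be opened in binary mode with buffering=0, so each
    read goes to the operating system instead of a user-space buffer.
    A raw read may return fewer bytes than requested (for example at
    end of file).
    """
    f.seek(index * record_size)
    return f.read(record_size)


def demo():
    """Read a record, let a "second process" rewrite the file, read again."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Equivalent of: echo "hi" >foo.txt
        with open(path, "wb") as writer:
            writer.write(b"hi\n")

        f = open(path, "rb", buffering=0)  # unbuffered, binary mode
        try:
            first = read_record(f, 0, 4)

            # Stand-in for the other process: echo "bye" >foo.txt
            with open(path, "wb") as writer:
                writer.write(b"bye\n")

            # Same still-open file object, same record position.
            second = read_record(f, 0, 4)
        finally:
            f.close()
    finally:
        os.remove(path)
    return first, second


if __name__ == "__main__":
    print(demo())
```

Unbuffered reads cost one system call per read, so this trades throughput for freshness. It also does nothing to protect against reading a record while the other process is halfway through writing it.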