Bugs item #1729170, was opened at 2007-05-31 23:24 Message generated for change (Comment added) made by loewis You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1729170&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Joe (joesalmeri) Assigned to: Martin v. Löwis (loewis) Summary: os.stat producing incorrect / invalid results Initial Comment: Python 2.51 Windows XP SP2 + all updates *** NOTE all code has worked on previous python versions I just upgraded from Python 2.4.2 to Python 2.5.1 and have found some unexpected behavior that appears to be a bug in the os.stat module. My OS is Windows XP SP2 + all updates. I have several programs that have worked flawlessly on all previous Python versions for years and they are now producing incorrect results in the code that uses os.stat. Searching through the 2.5.1 release notes I found the following: Use Win32 API to implement os.stat/fstat. As a result, subsecond timestamps are reported, the limit on path name lengths is removed, and stat reports WindowsError now (instead of OSError). ********************* * Overview of the problem: ********************* Reviewing my code it seems that it should still work with the 2.5.1 os.stat changes however that does not appear to be the case. Timestamps reported by os.stat() are no longer correct and the results are not even consistent. In my first test case ALL 3 timestamps reported by Python are 1 hour less than the actual timestamp on the file. In my second test case the created and last accessed timestamps reported by Python are correct but the last write timestamp is 1 hour less than the actual timestamp on the file. As a third test I looked at the last write timestamps on approximately 21,000 files. Python determined wrong last write timestamp on approximately 1581 files. Assuming there is no error in the following code that prints out the timestamps using the new return value from os.stat() then it would appear that the 2.5.1 os.stat changes have a bug. print 'Creation Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_CTIME])) print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_ATIME])) print 'Last Write Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_MTIME])) ********************* * Detailed test results ********************* To demonstrate the problem I have created the following test. Here are the files that will be used in my test and their associated timestamps as reported the the dir command. 01/02/2003 12:34 PM 0 broke_test 03/06/2007 05:24 PM 3,497,177 broke_test2 05/31/2007 04:35 PM 254 runtest.cmd 05/31/2007 04:31 PM 513 broke.py The file named broke_test has a timestamp of 01/02/2003 12:34:56 (dir does not show seconds). The runtest.cmd script shows the created, last accessed, and last write timestamps as reported by the dir command (and also verified via right clicking on the file and selecting properties in Explorer. ------ >>> START runtest.cmd script <<<< @Echo Off echo Create TimeStamp reported by dir command dir /tc %1 echo Last Access TimeStamp reported by dir command dir /ta %1 echo Last Write TimeStamp reported by dir command dir /tw %1 echo Python 2.5.1 timestamp info broke.py %1 ------ >>> END runtest.cmd script <<<< The broke.py script prints out the created, last accessed, last write timestamps as Python sees them. ------ >>> START broke.py script <<<< import sys import os import stat import time file_name = sys.argv[1] file_stats = os.stat(file_name) print print 'File Name : %s' % (file_name) print print 'Creation Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_CTIME])) print 'Last Access Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_ATIME])) print 'Last Write Time: %s' % time.strftime('%m/%d/%Y %H:%M:%S', time.localtime(file_stats[stat.ST_MTIME])) print ------ >>> END broke.py script <<<< # # Test 1 on file broke_test # runtest broke_test Create TimeStamp reported by dir command 01/02/2003 12:34 PM 0 broke_test Last Access TimeStamp reported by dir command 01/02/2003 12:34 PM 0 broke_test Last Write TimeStamp reported by dir command 01/02/2003 12:34 PM 0 broke_test Python 2.5.1 timestamp info File Name : broke_test Creation Time: 01/02/2003 11:34:56 -- Python results are WRONG hours reported are 1 hour short Last Access Time: 01/02/2003 11:34:56 -- Python results are WRONG hours reported are 1 hour short Last Write Time: 01/02/2003 11:34:56 -- Python results are WRONG hours reported are 1 hour short # # Test 2 on file broke_test2 # runtest broke_test2 Create TimeStamp reported by dir command 05/31/2007 04:26 PM 3,497,177 broke_test2 Last Access TimeStamp reported by dir command 05/31/2007 04:26 PM 3,497,177 broke_test2 Last Write TimeStamp reported by dir command 03/06/2007 05:24 PM 3,497,177 broke_test2 Python 2.5.1 timestamp info File Name : broke_test2 Creation Time: 05/31/2007 16:26:36 -- Python results are correct Last Access Time: 05/31/2007 16:26:38 -- Python results are correct Last Write Time: 03/06/2007 16:24:21 -- Python results are WRONG hours reported are 1 hour short ---------------------------------------------------------------------- >Comment By: Martin v. Löwis (loewis) Date: 2007-06-02 11:04 Message: Logged In: YES user_id=21627 Originator: NO I think there are several things going on here, and I believe that Python is behaving correctly. On Win32, there are two ways to find out the time stamps of a file: FindFirstFile reads it from the directory, and GetFileAttributesEx gets it directly from the file (i.e. on NTFS, from the MFT record). When a file is modified, NTFS normally modifies the file only, and updates the directory only occasionally (due to hard links, there might be multiple directories to update also). In particular, the last access time is rarely written back to the directory. You can easily see this for yourself with the following procedure: 1. Inspect the last-access time with dir /ta, or in Explorer. 2. Open the file in an editor, but don't edit it; close the editor. 3. Inspect the last-access time with dir /ta, and with broke.py 4. Wait 10 minutes. 5. Repeat steps 2 and 3. You should see that broke.py will report that the file was accessed, whereas Explorer and dir/ta will report an "old" last-acecssed time. I created a program that displays both the directory-stored and the file-stored last-acecss time (attached). This program may produce outputs such as C:\Pyinst\25\python>..\..\unstop\pa\Debug\pa.exe .svn\entries Last Access according to: FindFirstFile: 128252459235000000 2007-6-2 8:18:43 GetFileAttributesEx: 128252466734062500 2007-6-2 8:31:13 Please refer to http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/fsutil_behavior.mspx?mfr=true There, they explain that the access time (if it is stored at all, which can be turned off) gets updated on disk only if the actual last-access time and the stored last-access time differ by 1 hour. So if you perform steps 2 and 3 above after 1 hour again, you will indeed see that the last access time gets updated in explorer, too. Python 2.4 did use the same (incorrect) source that dir/ta and the explorer uses; Python 2.5 has fixed that bug. So I think we should remove the last-access time from our discussion. File Added: pa.zip ---------------------------------------------------------------------- Comment By: Neal Norwitz (nnorwitz) Date: 2007-06-01 07:39 Message: Logged In: YES user_id=33168 Originator: NO Martin, was there a similar problem to this one? I looked in the log for posixmodule.c and didn't see this fixed. ---------------------------------------------------------------------- Comment By: Joe (joesalmeri) Date: 2007-06-01 03:19 Message: Logged In: YES user_id=1235002 Originator: YES More details and a program that demonstrates the problem I created a file and specifically set the created date, last accessed date and last write date to 01/02/2003 12:34:56 After setting the date and time on the file I verified the following: The dir /tc command shows the correct Create date of 01/02/2003 12:34 The dir /ta command shows the correct Access date of 01/02/2003 12:34 The dir /tw command shows the correct Write date of 01/02/2003 12:34 Looking at the properties of the file in Windows Explorer also shows that all 3 dates match. When Python 2.4.2 os.stat is used to get those timestamps all 3 are CORRECT When Python 2.5.1 os.stat is used to get those timestamps all 3 are WRONG and they are off by exact 1 hour 01/02/2003 11:34:56 instead of 01/02/2003 12:34:56 In the case of my above test I know exactly what the timestamp on the file is because I manually set it so that all 3 timestamps are the same. Since Python 2.5.1 does not return the known values for that files timestamps how can it not be a Python 2.5.1 bug? Further testing shows that the results are somewhat inconsistent, many times the create and access date are correct but the Last Write timestamp is wrong. It is generally off by one hour but I have seen situations where it was +1 hour and other situations where it was -1 hour. I even found situations where the python timestamp was 1 minute later. (I know about the 2 second timestamps on FAT, all my filesystems are NTFS). I just found a situation where the python timestamp was 02:51 PM and the windows timestamp was 02:12 PM. DST or timezone changes are not going to make the results be off by 39 minutes? (My timezone is GMT - 5:00). As an additional test I wrote the following Python program which takes a directory name as a paramater. It retrieves all 3 dates for all files in all directories and subdirectories for the specified directory name. First it gets the dates using Python and os.stat, then it runs the dir /tc, dir /ta, and dir /tw commands on the file. Finally it compares the dates and times returned by Python against the dates and times that the Windows dir command is reporting. If you run the program using Python 2.4.2 then ALL dates and times returned by Python 2.4.2 correctly match the dates and times that Windows reports. If you run the program using Python 2.5.1 against the same directory then you get intermittient results. As a quick test I ran it against a directory on my home directory. Python 2.4.2 all dates matched between what the Windows dir command reported and what Python 2.4.2 reported. Python 2.5.1 had the following differences (more than 50% of the time it was wrong) Total Files Processed : 149 Matched All 3 Dates : 70 Did NOT Match All 3 Dates: 79 Matched Create Date : 69 Did NOT Match Create Date: 10 Matched Access Date : 59 Did NOT Match Access Date: 20 Matched Write Date : 11 Did NOT Match Write Date : 68 Here's the source for the check_dates.py program. Run it for youself and see what your results are. import os import stat import sys import time if len(sys.argv) == 1: print print '%s path_to_check' % (sys.argv[0]) print raise SystemExit else: path_to_check = sys.argv[1] file_count = 0 file_match_count = 0 file_no_match_count = 0 create_match_count = 0 create_no_match_count = 0 access_match_count = 0 access_no_match_count = 0 write_match_count = 0 write_no_match_count = 0 for root, dirs, files in os.walk(r'%s' % path_to_check): for file in files: file_count = file_count + 1 file_name = os.path.join(root, file) create_ts_match = False access_ts_match = False write_ts_match = False file_stats = os.stat(file_name) python_create_ts = time.strftime('%m/%d/%Y %I:%M %p', time.localtime(file_stats[stat.ST_CTIME])) python_access_ts = time.strftime('%m/%d/%Y %I:%M %p', time.localtime(file_stats[stat.ST_ATIME])) python_write_ts = time.strftime('%m/%d/%Y %I:%M %p', time.localtime(file_stats[stat.ST_MTIME])) win_create_ts = os.popen('dir /a /tc "%s"' % (file_name)).readlines()[5][0:20] win_access_ts = os.popen('dir /a /ta "%s"' % (file_name)).readlines()[5][0:20] win_write_ts = os.popen('dir /a /tw "%s"' % (file_name)).readlines()[5][0:20] if python_create_ts == win_create_ts: create_ts_match = True if python_access_ts == win_access_ts: access_ts_match = True if python_write_ts == win_write_ts: write_ts_match = True if create_ts_match and access_ts_match and write_ts_match: # All timestamps match file_match_count = file_match_count + 1 else: file_no_match_count = file_no_match_count + 1 print print 'File Name : %s' % (file_name) print if create_ts_match: create_match_count = create_match_count + 1 else: create_no_match_count = create_no_match_count + 1 print 'python_create_ts: %s' % (python_create_ts) print 'win_create_ts : %s' % (win_create_ts) print if access_ts_match: access_match_count = access_match_count + 1 else: access_no_match_count = access_no_match_count + 1 print 'python_access_ts: %s' % (python_access_ts) print 'win_access_ts : %s' % (win_access_ts) print if write_ts_match: write_match_count = write_match_count + 1 else: write_no_match_count = write_no_match_count + 1 print 'python_write_ts : %s' % (python_write_ts) print 'win_write_ts : %s' % (win_write_ts) print # # Show Count Results # print 'Total Files Processed : %s' % (file_count) print print 'Matched All 3 Dates : %s' % (file_match_count) print 'Did NOT Match All 3 Dates: %s' % (file_no_match_count) print print 'Matched Create Date : %s' % (create_match_count) print 'Did NOT Match Create Date: %s' % (create_no_match_count) print print 'Matched Access Date : %s' % (access_match_count) print 'Did NOT Match Access Date: %s' % (access_no_match_count) print print 'Matched Write Date : %s' % (write_match_count) print 'Did NOT Match Write Date : %s' % (write_no_match_count) print ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1729170&group_id=5470 _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com