Eric Blake wrote:
> [adding bug-gnulib]
>
> According to Eric Blake on 12/15/2009 7:48 PM:
>> According to John Stanley on 12/15/2009 4:42 PM:
>>> Basically, what's happening is that 'touch -a ..' updated ctime in
>>> coreutils-7.6, but does not update ctime in coreutils-8.2 (hence
>>> misc/ls-time fails).
>>
>> Ouch. That's a bug in the kernel; I can reproduce it:
>>
>> $ uname -a
>> Linux fencepost 2.6.26-2-xen-amd64 #1 SMP Thu Nov 5 04:27:12 UTC 2009
>> x86_64 GNU/Linux
>> $ touch q
>> $ stat -c '%x %z' q
>> 2009-12-15 21:46:33.186677568 -0500 2009-12-15 21:46:33.186677568 -0500
>> $ touch -a q
>> $ stat -c '%x %z' q
>> 2009-12-15 21:47:15.157175384 -0500 2009-12-15 21:46:33.186677568 -0500
>> $
>
> According to strace, coreutils 6.10 used syscall_280 (which I'm assuming
> is utimensat, and that strace is just behind the times compared to the
> kernel); ltrace says it was via:
>
> futimesat(0, 0, 0x7fff0568c900, 0, 3) = 0
>
> The newer coreutils likewise uses syscall_280, but via:
>
> futimens(0, 0x7fff5b31a450, 0x60ebd0, 0x7fff5b31a450, 3) = 0
>
> By comparing the results of 'touch f' and 'touch -a f', it appears that
> the kernel ctime bug is only triggered when UTIME_OMIT is passed as one of
> the two timestamps (which is only possible via futimens/utimensat, not
> futimesat). And that is consistent with the fact that coreutils didn't
> use UTIME_OMIT until coreutils 8.1.
>
> Also, it means that I can probably devise a way to work around the bug in
> gnulib while we wait for the kernel folks to fix their bug. However,
> there's a question of the minimal number of syscalls needed to fix the
> problem. It may be that UTIME_NOW also has an impact. My current idea:
>
> Keep a cache variable that shows whether UTIME_OMIT works (0=unknown,
> 1=yes, -1=no). If the variable is -1, then treat UTIME_OMIT the same way
> as we do for futimesat (that is, call stat()/gettime() to populate the
> struct timespec prior to making the syscall). If the variable is 1, then
> the kernel has been fixed.
>
> If the variable is 0, then perform [f]stat both before and after the
> utimensat call; if the times differ, set the cache variable to 1 and we're
> done. Otherwise, ctime didn't change, so also call gettime(). If gettime
> is within 10 ms of the second stat, the results are inconclusive (given
> that we have proven that some filesystems have a quantization boundary of
> 10 ms where multiple actions within that window all end up with the
> timestamp), so leave the cache at 0, but re-call utimensat anyways with
> the times learned by stat/gettime(). Otherwise, the current time and the
> second ctime differ by more than 10 ms, so utimensat UTIME_OMIT is broken;
> set cache to -1, and fix the problem by re-calling utimensat with the
> times learned by stat/gettime().
>
> Sounds quite hairy. Any ideas for improvements?
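
To make the three-way outcome of the probe concrete, here is a rough sketch of just the decision step taken when the cache is still 0. It is not gnulib code: the names omit_state, QUANTUM_NS, ts_to_ns and classify_probe are made up for illustration, and treating "within 10 ms" as "less than or equal to 10 ms" is my assumption.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical cache values, mirroring the 0 / 1 / -1 scheme above. */
enum omit_state { OMIT_BROKEN = -1, OMIT_UNKNOWN = 0, OMIT_WORKS = 1 };

/* 10 ms quantization window, in nanoseconds (value taken from the proposal). */
#define QUANTUM_NS 10000000LL

/* Flatten a timespec into nanoseconds so two stamps can be compared. */
long long ts_to_ns(struct timespec t)
{
    assert(t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);
    return (long long) t.tv_sec * 1000000000LL + t.tv_nsec;
}

/* Classify one probe: ctime from stat before and after the utimensat
   call that used UTIME_OMIT, plus the current time from gettime.  */
enum omit_state classify_probe(struct timespec ctime_before,
                               struct timespec ctime_after,
                               struct timespec now)
{
    /* ctime moved, so the kernel handled UTIME_OMIT correctly. */
    if (ts_to_ns(ctime_before) != ts_to_ns(ctime_after))
        return OMIT_WORKS;

    /* ctime did not move; see how far it is from the current time. */
    long long gap = ts_to_ns(now) - ts_to_ns(ctime_after);
    if (gap < 0)
        gap = -gap;

    /* Inside the quantization window: the probe proves nothing. */
    if (gap <= QUANTUM_NS)
        return OMIT_UNKNOWN;

    /* Clearly stale ctime: UTIME_OMIT is broken on this kernel. */
    return OMIT_BROKEN;
}
```

In both the OMIT_UNKNOWN and OMIT_BROKEN cases the caller would still re-call utimensat with the times learned from stat/gettime(); only the value stored in the cache differs.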

Thanks for investigating and scoping out the solution.
I agree that it sounds hairy, but it also sounds like the required approach.

> And how best to report this bug to the kernel folks?

Posting a minimal demo to lkml should do it.
It'd be good to identify the affected kernel versions so we can document
it and have a chance at someday removing the work-around code when those
kernels are no longer relevant.
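
As a starting point for such a demo, something along these lines might do. It is only a sketch: the function name ctime_bumped_by_utime_omit is made up, the 1.1 second pause is an arbitrary choice meant to exceed coarse timestamp granularity, and the atime is set explicitly rather than with UTIME_NOW so that only UTIME_OMIT is under test.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: set only atime on PATH (mtime is UTIME_OMIT),
   then report whether ctime changed.
   Returns 1 if ctime advanced, 0 if it did not, -1 on error.  */
int ctime_bumped_by_utime_omit(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    int result = -1;
    struct stat before, after;

    if (fstat(fd, &before) == 0) {
        /* Wait long enough that a fresh ctime must differ from the old one. */
        struct timespec pause = { 1, 100000000L };
        nanosleep(&pause, NULL);

        struct timespec times[2];
        times[0].tv_sec = time(NULL);   /* new atime, given explicitly */
        times[0].tv_nsec = 0;
        times[1].tv_sec = 0;            /* mtime left alone */
        times[1].tv_nsec = UTIME_OMIT;

        if (futimens(fd, times) == 0 && fstat(fd, &after) == 0)
            result = (before.st_ctim.tv_sec != after.st_ctim.tv_sec
                      || before.st_ctim.tv_nsec != after.st_ctim.tv_nsec);
    }

    close(fd);
    return result;
}
```

Called from a small main() on a scratch file, a return value of 0 would correspond to the behavior in the transcript above, where atime moved but ctime stayed put.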