Launchpad has imported 66 comments from the remote bug at https://bugzilla.redhat.com/show_bug.cgi?id=501848.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2009-05-21T02:29:39+00:00 Issue wrote:

Escalated to Bugzilla from IssueTracker

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/0

------------------------------------------------------------------------
On 2009-05-21T02:29:41+00:00 Issue wrote:

Description of problem:
If you run du -h on a directory with .snapshot subdirectories with coreutils-6.10+ (could be lower, but >5.97-20) you will get an fts_read error:

du: fts_read failed: No such file or directory

How reproducible:
Every time.

Steps to Reproduce:
1. Use F10 or anything with the higher versions of coreutils, and a machine with the .snapshot directories created by NetApp.
2. du -h
3. Wait.

Actual results:
du: fts_read failed: No such file or directory

Expected results:
The size listing of all files and/or directories

Additional info:

This event sent from IssueTracker by cwyse [Pixar Animation Studios - Fedora Queue]
 issue 298936

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/1

------------------------------------------------------------------------
On 2009-05-21T03:18:21+00:00 Issue wrote:

I guess I spoke too soon. With coreutils-6.9-2 the problem is less noticeable. On smaller directories it doesn't show up at all. But with larger directories (directories with many subdirectories) it is still there, so some users will notice it and some will not.

Problem is now between: coreutils-5.97-19 > and < coreutils-6.9-2.

I tried compiling the 6.7.* coreutils package but it keeps failing on build and isn't saying what or why. I will look more into this and update with what I find.
This event sent from IssueTracker by cwyse issue 298936 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/2 ------------------------------------------------------------------------ On 2009-05-21T07:36:22+00:00 Kamil wrote: It can be caused by the on-the-fly changes within the directory. It just want to traverse a directory (or file?) which no more exists. I am pretty sure you can't see the errors if you mount the file system read- only. But there is no doubt the error message might be more verbose, it is listed as FIXME in du.c. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/3 ------------------------------------------------------------------------ On 2009-05-21T09:44:01+00:00 Ondrej wrote: Additionally - I guess Fedora version should be changed to something not EOL (as F-8 is EOL and F-9 will be EOL in ~2 months). From the comments I think version should be changed to F-10 - correct? Or some RHEL version? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/4 ------------------------------------------------------------------------ On 2009-05-21T09:47:01+00:00 Ondrej wrote: Additionally strace from the failure could be useful to better analyze what's the culprit... Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/5 ------------------------------------------------------------------------ On 2009-05-21T17:14:53+00:00 Charlie wrote: Created attachment 344994 strace of failure I agree, changing version to F10, I originally just set it for the first version I noticed this problem in. Also, here is an strace of the failure on a F10 machine. I'm gonna try some of the 6.7 packages again and see if I can narrow down the window in which this fails. 
Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/6 ------------------------------------------------------------------------ On 2009-05-21T17:49:54+00:00 Issue wrote: Finally got 6.7-1 compiled. It shows the same fts_read issue. 5.97-22 >< 6.7-1 This is about as narrowed as I can get it, I'm gonna try diff'ing up a patch between du.c and... see what happens. This event sent from IssueTracker by cwyse issue 298936 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/7 ------------------------------------------------------------------------ On 2009-05-21T18:46:02+00:00 Issue wrote: patching fts.c was a fail on compile. There looks like a fts.c.du file in 6.7 and a fts.c.inaccessibledirs in 5.7. Since the files do not exist in both trees I'm kinda not sure how to test patching that. This event sent from IssueTracker by cwyse issue 298936 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/8 ------------------------------------------------------------------------ On 2009-05-21T19:55:42+00:00 Kamil wrote: Could you please try following on the same directory? $ find -printf %b\\n Does it give the same errors? Different errors? No errors? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/9 ------------------------------------------------------------------------ On 2009-05-21T20:54:59+00:00 Issue wrote: Ran "find -printf %b\n" I didn't get any errors, it took over an hour and ran my cpu at 121.6%, but no errors, running it again to verify. This event sent from IssueTracker by cwyse issue 298936 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/10 ------------------------------------------------------------------------ On 2009-05-21T21:23:49+00:00 Kamil wrote: Does 'du' print just one error message and then die? Is the output obviously incomplete? Or the problem is only about the error message and return code? 
Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/11

------------------------------------------------------------------------
On 2009-06-10T12:03:46+00:00 Ondrej wrote:

Something interesting to read (about the same issue and how to reduce impact):
http://www.unixtutorial.org/2009/02/troubleshooting-du-fts_read-no-such-file-or-directory-error/

From what I have quickly checked, if find's fts_read() returns NULL, it just closes the FTS structure and goes to the next argument. If du's fts_read() returns NULL, it checks errno and prints the corresponding diagnostic. The difference is in the checking function: find has a somewhat more complex checking function, consider_visiting(); maybe some parts of it should be used or adapted in du's process_file() function.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/12

------------------------------------------------------------------------
On 2009-07-20T19:08:25+00:00 Ondrej wrote:

Played a bit with that bz again - the fts_read error is being set at lib/fts.c:2000, which hardcodes ENOENT into errno. The error occurs when the ".." entry is not cached yet, so with repeated runs after mount it seems to be possible to get rid of those errors and get the correct result. It seems that the check at fts.c:1997/1998 has to be extended to properly handle this situation with the NetApp .snapshot dir. Using du -Lsh also helps in some cases.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/13

------------------------------------------------------------------------
On 2009-07-21T11:27:05+00:00 Ondrej wrote:

Created attachment 354463
Workaround for ".." directories and ?race conditions?

Played a bit more with that fts_read failure; the attached patch works around the issue. It seems that, due to a "maybe caching race condition", after fstat on the ".." fts entry it sometimes has the device number of the parent directory (first run after mount). e.g.
(variable: fts_value : fstat_value):
devicenum: 25 : 33
inode: 8217100 : 8217100

The next run on the same place correctly has the same values for fts_value and fstat_value, and it looks like:
devicenum: 33 : 33
inode: 8217100 : 8217100

I'm quite sure that the patch is NOT the correct way to solve the issue; that race condition should be eliminated, but I'm not really sure where. Filesystem? Kamil - any idea?

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/14

------------------------------------------------------------------------
On 2009-07-21T11:37:27+00:00 Ondrej wrote:

Created attachment 354464
Better one ;) workaround for ".." directories and ?race conditions?

Damned, previous one was obviously not correct ... that one should be better...

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/15

------------------------------------------------------------------------
On 2009-07-21T19:31:50+00:00 Charlie wrote:

I added the patch to the latest coreutils package and I haven't seen the error yet. I ran a du -h over my lunch break. I'm letting my customer try it out and give it his stamp of approval. But so far it looks like it resolves the issue. I'll let you know if anything changes.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/16

------------------------------------------------------------------------
On 2009-07-22T14:01:18+00:00 Kamil wrote:

I've narrowed down the strange behavior to a sort of minimal example (/mnt/archive is a NetApp mount point):

umount /mnt/archive && mount /mnt/archive \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot/hourly.0 \
    && stat --printf "%d\t%i\t%n\n" /mnt/archive/.snapshot

The output is the following:

20 67 /mnt/archive/.snapshot
26 222 /mnt/archive/.snapshot/hourly.0
26 67 /mnt/archive/.snapshot

The device number is being changed on the fly while the inode number stays unchanged.
It sounds like a file system bug to me. It's 100% reproducible on my box. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/17 ------------------------------------------------------------------------ On 2009-07-22T18:36:17+00:00 Charlie wrote: I ran the patched coreutils package on my .snapshot directory 3 times and didn't see a single error. It takes about 30 minutes to go through the .snapshot directory. Before Ovasik's patch it would run for about 30 seconds then fail. Kdudka, are you using the patch and still noticing this? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/18 ------------------------------------------------------------------------ On 2009-07-22T18:51:16+00:00 Issue wrote: Event posted on 07-22-2009 02:51pm EDT by cwyse Customer just got back to me with some comments. This new package creates .snapshot directories on his desktop. This was a problem in F10 which went away with F11. So it looks like a slight regression? This event sent from IssueTracker by cwyse issue 298936 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/19 ------------------------------------------------------------------------ On 2009-07-22T18:53:17+00:00 Charlie wrote: Here are the previous bugs that were related to the .snapshots showing up on the desktop. Just posting them here in case they help. As noted in https://bugzilla.redhat.com/show_bug.cgi?id=472778 and https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb44598, NetApp filers use different FSIDs for the hidden snapshot directories they provide. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/20 ------------------------------------------------------------------------ On 2009-07-23T10:52:33+00:00 Kamil wrote: (In reply to comment #21) > Kdudka, are you using the patch and still noticing this? The patch is only workaround for 'du' utility. It works for me, too. 
But it does not fix the file system bug. The minimal example uses 'stat', so it has nothing to do with that patch. The comment #22 is missing some context here. Which package does create the .snapshot directories on customer's desktop? I am quite sure that coreutils does not. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/21 ------------------------------------------------------------------------ On 2009-08-05T14:53:25+00:00 Kamil wrote: The problem persists with latest rawhide kernel: Linux 2.6.31-0.122.rc5.git2.fc12.x86_64 #1 SMP Mon Aug 3 12:58:47 EDT 2009 x86_64 /etc/fstab: filer-eng.brq.redhat.com:/vol/engineering/share /mnt/archive nfs ro 0 0 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/22 ------------------------------------------------------------------------ On 2009-08-05T17:34:32+00:00 Jeff wrote: Looking at the capture, it doesn't appear that the server is returning inconsistent inode info. However Kamil's reproducer seems to indicate that the client is changing the device number after it traverses into the directory. I suspect that this means that the client isn't doing the shrinkable mount before returning the info on the first stat call. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/23 ------------------------------------------------------------------------ On 2009-09-02T11:54:16+00:00 Jeff wrote: Confirmed...same behavior in rawhide too. I can also reproduce this with a non-netapp server simply by exporting a filesystem and then exporting another filesystem mounted onto a subdir of the first fs. Nothing netapp-specific here. There's also a somewhat related problem...if a submount is done and then gets automatically unmounted, then the device numbers can change and even be reused for a completely different submount. This is a bit tricky. On the one hand, the device number seems to change and that's probably bad for some apps. 
On the other hand, do we really want to trigger a mount just because someone did a stat() on the directory where we would eventually do a submount? If I have a ton of exports that are subdirs of another exported filesystem I don't think I really want to do submounts of all of those filesystems just because someone did a "ls -l" in that directory. Unfortunately, the device numbers for NFS are allocated on the fly during mount. So we can't easily "fake up" the device numbers and expect them to remain consistent without actually triggering a mount. The device number may be different once the submount gets done. I suspect that the best we can probably do is to just make sure the device number is different from that of the parent filesystem, but we probably won't be able to make it consistent. That is, it'll change as soon as you walk into the dir... I'll plan to do a writeup of this problem in the near future and post it to the upstream mailing list. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/24 ------------------------------------------------------------------------ On 2009-09-02T16:47:45+00:00 Jeff wrote: This problem is really no different than how autofs works. When you run stat on an autofs mountpoint, you'll just get the directory until you walk into that directory. That's actually correct behavior since you're adding a new mount when that occurs. This is almost completely the same thing, it's just that the kernel does a new mount w/o needing autofs. I'm not sure this is actually bug, rather you're just seeing expected results when the kernel adds a new mount on the fly. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/25 ------------------------------------------------------------------------ On 2009-09-02T17:07:37+00:00 Kamil wrote: Jeff, thanks for the analysis. I'll look at the fts code again and possibly reassign back to coreutils. 
Good to know it's reproducible independently on the NetApp mount point. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/26 ------------------------------------------------------------------------ On 2009-09-03T17:55:10+00:00 Jeff wrote: Sounds good. I'll reassign this back to you for now. Let me know if you need further clarification. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/27 ------------------------------------------------------------------------ On 2009-10-18T15:51:21+00:00 Kamil wrote: making the bug public... Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/28 ------------------------------------------------------------------------ On 2009-10-21T18:27:51+00:00 Kamil wrote: Reported upstream: http://lists.gnu.org/archive/html/bug-gnulib/2009-10/msg00207.html Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/29 ------------------------------------------------------------------------ On 2009-10-31T12:11:36+00:00 Jim wrote: Hello, Is this happening because the device number is assigned first to one value initially, and later to another value -- all during a single hierarchy traversal? If so, I'll have to push this back into the kernel/file-system court. I think we'll have to make the file system present a consistent device and inode number for any file it serves. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/30 ------------------------------------------------------------------------ On 2009-10-31T12:34:27+00:00 Kamil wrote: (In reply to comment #40) > Is this happening because the device number is assigned first to one value > initially, and later to another value -- all during a single hierarchy > traversal? It looks like a sort of expected behavior to me. If the file system is not mounted, the device number describes the directory which belongs to the surrounding file system. 
Once you trigger the mount, the same path (directory) belongs to the newly mounted file system, thus gets a new device number. In fact I was rather surprised that the inode number could stay consistent across the mounts.

> If so, I'll have to push this back into the kernel/file-system court.
> I think we'll have to make the file system present a consistent device and
> inode number for any file it serves.

Well, I will try to prepare a complete client/server reproducer first, since the one from comment #20 uses our internal server, which is not available to others for testing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/31

------------------------------------------------------------------------
On 2009-10-31T12:58:53+00:00 Jim wrote:

What event triggers the mount?

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/32

------------------------------------------------------------------------
On 2009-10-31T13:13:44+00:00 Kamil wrote:

(In reply to comment #42)
> What event triggers the mount?

From my observation with gdb:

1. calling fstatat() with AT_SYMLINK_NOFOLLOW does NOT trigger the mount.

2. calling fstatat() without AT_SYMLINK_NOFOLLOW triggers the mount, as does opening a directory.

If you are asking which events are guaranteed to trigger the mount and/or which events are guaranteed to NOT trigger the mount, kernel guys might give you a reliable answer. Jeff, any idea?

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/33

------------------------------------------------------------------------
On 2009-11-02T19:01:06+00:00 Jeff wrote:

Submounts are triggered via the follow_link inode operation, so in some ways these are treated like symlinks...

The short answer is that the mount will be triggered whenever you walk a path in such a way that, if this component were a symlink it would be resolved to its target.
Longer answer:

If the place where you transition into a new filesystem is in the middle of a path, then generally the path will be resolved. If it's the last component of the path, then it depends on whether the LOOKUP_FOLLOW link flag is set in nameidata in the kernel. That varies with the type of operation -- for instance, lstat() won't have that set, but a "normal" stat() generally will.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/34

------------------------------------------------------------------------
On 2009-11-03T12:41:30+00:00 Kamil wrote:

Minimal example which works reliably on my Fedora 11 installation:

# mount | grep ^/
/dev/sda1 on / type ext3 (rw)
/dev/sda3 on /home type ext4 (rw)

# ls -d /home/test
/home/test

# printf "/ *(fsid=0,crossmnt)\n/home *(crossmnt)\n" \
    > /etc/exports

# service nfs restart

# mkdir /tmp/mnt

# mount -t nfs4 localhost:/ /tmp/mnt \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home/test \
    && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home

29 2 /tmp/mnt/home
30 12 /tmp/mnt/home/test
30 2 /tmp/mnt/home

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/35

------------------------------------------------------------------------
On 2009-11-03T20:22:57+00:00 Kamil wrote:

A patch for gnulib proposed upstream:

http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/36

------------------------------------------------------------------------
On 2009-11-04T16:37:03+00:00 Kamil wrote:

(In reply to comment #46)
> A patch for gnulib proposed upstream:
>
> http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html

The patch has been rejected by upstream because of performance impact in some obscure situations (namely traversing a directory which consists of 200000 directories nested in each other):
http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00032.html

As a solution it was proposed to find (or perhaps implement?) a low-cost way of recognizing a mount point during the traversal. "Low cost" here means cheaper than a stat call.

Since there seems to be nothing I can do with this bug at the moment, I am reassigning it back to kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/37

------------------------------------------------------------------------
On 2009-11-04T18:32:30+00:00 Jim wrote:

Hi Kamil,

Using your reproducer (above, thanks!) let's print one more dev/ino pair (this is on F12):

  $ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt
  24 2 /tmp/mnt/home
  24 2 /tmp/mnt

That shows a big problem: two distinct directories have the same dev/ino pair, and fts rightly objects, returning FTS_DC to indicate the directory cycle. When fts encounters the same dev/ino pair twice in a traversal, and is not traversing symlinks, that represents a hard-linked directory cycle, which is usually a big problem.
[Note that currently du does not diagnose this problem, but I'll fix that shortly.]

Even if the above kernel/nfs bug is fixed, I am becoming more and more convinced that this varying-device-number problem is something that must be addressed in the kernel, and not in every single application that must perform dev/ino checks for security.

Thanks for reassigning to the kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/38

------------------------------------------------------------------------
On 2009-11-04T18:51:58+00:00 Kamil wrote:

(In reply to comment #48)
> $ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt
> 24 2 /tmp/mnt/home
> 24 2 /tmp/mnt

Good catch! Though I don't think you hit the cause of the original bug report, this looks indeed broken. The dev/ino pair should be unique per whole VFS, or am I wrong? Jeff, what do you think about the example?
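
The dev/ino check Jim describes can be sketched as follows. This is a minimal illustration in Python, not the C code in gnulib's fts.c; the function name `find_directory_cycles` and its `lstat`/`listdir` parameters are hypothetical and exist only so the check can be exercised without a real NFS mount. It walks a tree without following symlinks and reports any directory whose (st_dev, st_ino) pair matches one of its ancestors, which is the situation shown by the two identical "24 2" lines.

```python
import os
import stat


def find_directory_cycles(root, lstat=os.lstat, listdir=os.listdir):
    """Return directories whose (st_dev, st_ino) equals that of an ancestor.

    Sketch only: recursion depth limits it to modestly deep trees.
    """
    cycles = []

    def walk(path, ancestors):
        st = lstat(path)  # lstat: never follow a final symlink
        if not stat.S_ISDIR(st.st_mode):
            return
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            # Same identity as a directory we are already inside of.
            cycles.append(path)
            return
        ancestors.add(key)
        try:
            for name in sorted(listdir(path)):
                walk(os.path.join(path, name), ancestors)
        finally:
            ancestors.discard(key)

    walk(root, set())
    return cycles
```

On a healthy local tree the function returns an empty list. With stat results like those in the F12 output (parent and child both reporting device 24, inode 2) it flags the child even though no real cycle exists, which is why a filesystem that reports the same pair for two distinct directories breaks such checks.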
Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/39 ------------------------------------------------------------------------ On 2009-11-04T19:18:08+00:00 Jeff wrote: I'd have to look at the example more closely, but it's likely that the kernel code is picking up the inode number of the root inode of the underlying filesystem. I think what's happening is that the server sends the inode number of /tmp/mnt/home and a new fsid, but the client doesn't actually spawn a new submount there. So the device ID ends up the same. In fact, all of my ext3/4 filesystems seem to give the root inode st_ino == 2, so that's probably what's happening. The trivial workaround here is to probably use stat() instead of lstat() here (-L option to the stat program), but I imagine that won't be suitable? How to fix this? I don't think there is a way to do so without triggering a submount even when we don't want to follow symlinks. That's going to be very costly for performance in many cases (if it's even reasonably doable). Imagine cd'ing into a directory that has a 1000 exported filesystems under it. Simply doing a readdir() in there is going to make the client spawn 1000 new mounts. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/40 ------------------------------------------------------------------------ On 2009-11-04T19:34:32+00:00 Kamil wrote: (In reply to comment #50) > The trivial workaround here is to probably use stat() instead of lstat() here > (-L option to the stat program), but I imagine that won't be suitable? Yep, this suppresses the bug as well as du -L in the original bug report. But we get a different result, so it's really not suitable. > How to fix this? I don't think there is a way to do so without triggering a > submount even when we don't want to follow symlinks. I think this *should* be fixed since it breaks one of the basic axioms about VFS. 
> That's going to be very costly for performance in many cases (if it's even > reasonably doable). Imagine cd'ing into a directory that has a 1000 exported > filesystems under it. Simply doing a readdir() in there is going to make the > client spawn 1000 new mounts. No chance to get unique dev/ino pairs without triggering the mount first? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/41 ------------------------------------------------------------------------ On 2009-11-04T19:50:18+00:00 Peter wrote: No, sorry, no way to determine what the ino is for the new file system without talking to the server. Doing an ls in a directory full of many autofs mounted file systems should not trigger mounts for all of those file systems. This will cause a bigger performance problem than the original perceived problem ever did. Perhaps the right way to address this is to flag the returned directory entries to the user level with something which indicates that the metadata information for that entry will change if the file system which would be mounted there was actually mounted there. This would eliminate most of the extra stat calls that Jim Meyering is worried about. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/42 ------------------------------------------------------------------------ On 2009-11-04T20:07:51+00:00 Jim wrote: FYI, I've (re)raised the issue on LKML: http://lkml.org/lkml/2009/11/4/451 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/43 ------------------------------------------------------------------------ On 2009-11-04T20:13:13+00:00 Jeff wrote: Minor nit...we get the correct st_ino for the directory. The problem is that we don't have accurate st_dev info at that point since the mount hasn't occurred yet. That said...it would be nice to be able to flag the entries in the way that Peter suggests. 
The question is how to do that in a way that's compatible with POSIX here. Maybe we could declare a new S_IF* value for st_mode: S_IFXDEV 020000 That should allow us to leave the S_IFDIR bit set and it employs a bit that's outside of __S_IFMT. The kernel could set this bit in the statbuf when it detects that the fsid on the inode is not the same as that of the parent directory. The big question is whether and if someone wants to implement this and then sell it upstream :) Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/44 ------------------------------------------------------------------------ On 2009-11-05T16:18:53+00:00 Kamil wrote: Another question is how coreutils will detect that running kernel has the ability to indicate mount points, thus decide whether to use the optimization or not. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/45 ------------------------------------------------------------------------ On 2009-11-05T16:35:56+00:00 Peter wrote: If an approach similar to what Jeff has suggested, then it won't matter. If the kernel sets S_IFXDEV,then coreutils can use the optimization. If it doesn't, then it won't? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/46 ------------------------------------------------------------------------ On 2009-11-05T16:50:03+00:00 Kamil wrote: Nope, if I understand it correctly, the semantic of S_IFXDEV bit is exactly opposite. If the bit is set, we need to call stat again after opening a directory. But if it's not set and we don't know if the kernel provides this feature, we can't use the optimization and need to call stat anyway. Or am I wrong? Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/47 ------------------------------------------------------------------------ On 2009-11-05T17:06:58+00:00 Peter wrote: Yes, sorry, was looking at the other way around. 
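
The decision Kamil spells out above can be written down as a small sketch. Everything here is an assumption for illustration: S_IFXDEV is only a proposal in this thread and no kernel defines it, and `must_restat_after_open` and its `kernel_sets_xdev_flag` argument are hypothetical names. The point is that a clear bit is only informative when the traversal code knows the kernel would have set it.

```python
import stat

# Hypothetical flag value copied from the proposal in this thread; no kernel
# defines S_IFXDEV. Note that on Linux 0o020000 is the value of S_IFCHR, so a
# real implementation would presumably need a different bit.
S_IFXDEV = 0o020000


def must_restat_after_open(st_mode, kernel_sets_xdev_flag):
    """Decide whether a traversal must call stat again after opening a directory."""
    if not kernel_sets_xdev_flag:
        # A clear bit proves nothing on a kernel that never sets it,
        # so the extra stat call cannot be skipped.
        return True
    # Flag present: the entry's metadata will change once the mount happens.
    return bool(st_mode & S_IFXDEV)
```

This is why a separate capability indication (or an inverted bit) comes up in the discussion: without it, the only safe answer on an unknown kernel is to stat again.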
Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/48 ------------------------------------------------------------------------ On 2009-11-05T17:21:09+00:00 Kamil wrote: I think we need either a bit with exactly inverse value, or another equipment indicating that kernel is able to set the S_IFXDEV bit reliably. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/49 ------------------------------------------------------------------------ On 2009-11-07T11:54:03+00:00 Jim wrote: (In reply to comment #48) > Using your reproducer (above, thanks!) let's print one more dev/ino pair > (this is on F12): > > $ stat --printf "%d %i %n\n" /tmp/mnt/home /tmp/mnt > 24 2 /tmp/mnt/home > 24 2 /tmp/mnt > > That shows a big problem: two distinct directories have the same dev/ino pair, FYI, I've opened a new BZ to track this separate problem: https://bugzilla.redhat.com/show_bug.cgi?id=533569 Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/51 ------------------------------------------------------------------------ On 2010-04-27T14:26:07+00:00 Bug wrote: This message is a reminder that Fedora 11 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 11. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '11'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 11's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 11 is end of life. 
If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/79 ------------------------------------------------------------------------ On 2010-06-28T12:38:09+00:00 Bug wrote: Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/80 ------------------------------------------------------------------------ On 2010-06-28T13:59:17+00:00 Jim wrote: I wish it could be closed... Still afflicts rawhide. Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/81 ------------------------------------------------------------------------ On 2010-07-30T10:39:47+00:00 Bug wrote: This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle. Changing version to '14'. 
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/82

------------------------------------------------------------------------
On 2010-11-24T16:57:34+00:00 Jim wrote:

still affects rawhide.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/85

------------------------------------------------------------------------
On 2011-01-11T13:47:33+00:00 Kamil wrote:

(In reply to comment #45)
> # mount -t nfs4 localhost:/ /tmp/mnt \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home/test \
> && stat --printf "%d\t%i\t%n\n" /tmp/mnt/home
>
> 29 2 /tmp/mnt/home
> 30 12 /tmp/mnt/home/test
> 30 2 /tmp/mnt/home

FYI I tried the same example on my RHEL-5 machine and, surprisingly, there seems to be no such optimization. The first lstat() syscall on /tmp/mnt/home triggers the mount of /tmp/mnt/home and picks the final dev/ino pair.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/86

------------------------------------------------------------------------
On 2011-01-11T14:01:37+00:00 Kamil wrote:

... but it is still reproducible with autofs mount points even on RHEL-5.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/87

------------------------------------------------------------------------
On 2011-01-11T18:55:31+00:00 Jeff wrote:

I concur. I can't reproduce this any more either on nfsv4:

# mount /mnt/dantu && stat --printf "%d\t%i\t%n\n" /mnt/dantu && stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3 && stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3/testfile && stat --printf "%d\t%i\t%n\n" /mnt/dantu/ext3
24 2 /mnt/dantu
25 2 /mnt/dantu/ext3
25 49153 /mnt/dantu/ext3/testfile
25 2 /mnt/dantu/ext3

...in my setup the host exports a filesystem and "ext3" is a mounted and exported filesystem under that.
It seems like something has changed and now lstat() calls are triggering the mount. I'm going back through the changelogs now to see why it's different now.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/88

------------------------------------------------------------------------
On 2011-01-11T18:59:18+00:00 Jeff wrote:

I should point out that those last results were with my latest RHEL5 test kernels.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/89

------------------------------------------------------------------------
On 2011-01-11T19:13:36+00:00 Kamil wrote:

Jeff, sorry if my comment was confusing, but I think we both have exactly the same results. This bug (501848) is against Fedora. RHEL-5 didn't reproduce the bug with nfsv4 for me, but I am still able to reproduce it on RHEL-5 with autofs. I wrote the comment here only as an auxiliary observation while investigating bug 537463, which is against RHEL-5.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/90

------------------------------------------------------------------------
On 2011-01-11T19:44:42+00:00 Jeff wrote:

No problem. It wasn't confusing. Steve asked me to have a look at this and I was just surprised that I was unable to reproduce this on recent RHEL5 kernels with NFSv4. Not sure why that is so far...

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/91

------------------------------------------------------------------------
On 2013-04-03T19:57:42+00:00 Fedora wrote:

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)
More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/94

------------------------------------------------------------------------
On 2013-04-05T15:52:38+00:00 Justin wrote:

Is this still a problem with 3.9 based F19 kernels?

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/95

------------------------------------------------------------------------
On 2013-04-23T17:26:31+00:00 Justin wrote:

This bug is being closed with INSUFFICIENT_DATA as there has not been a response in 2 weeks. If you are still experiencing this issue, please reopen and attach the relevant data from the latest kernel you are running and any data that might have been requested previously.

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/96

------------------------------------------------------------------------
On 2013-04-23T22:03:13+00:00 Kamil wrote:

The problem still exists in kernel-3.9.0-0.rc7.git3.1.fc20.x86_64. The reproducer from comment #45 works for me:

[root@f20 ~]# mount -t nfs4 localhost:/ /tmp/mnt && stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot && stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot/grub2 && stat --printf "%d\t%i\t%n\n" /tmp/mnt/boot
36 2 /tmp/mnt/boot
37 65025 /tmp/mnt/boot/grub2
37 2 /tmp/mnt/boot

Reply at: https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/97


** Changed in: coreutils (Fedora)
   Status: Unknown => In Progress

** Changed in: coreutils (Fedora)
   Importance: Unknown => Medium

** Bug watch added: Red Hat Bugzilla #472778
   https://bugzilla.redhat.com/show_bug.cgi?id=472778

--
You received this bug notification because you are a member of Ubuntu Touch seeded packages, which is subscribed to findutils in Ubuntu.
https://bugs.launchpad.net/bugs/506798

Title:
  du crashes when traversing nfs mounted .snapshot directories

Status in coreutils package in Ubuntu: Triaged
Status in findutils package in Ubuntu: Triaged
Status in linux package in Ubuntu: Confirmed
Status in coreutils package in Fedora: In Progress
Status in linux package in Fedora: Won't Fix

Bug description:
  Binary package hint: coreutils

  I'm getting a problem where du errors (and exits) with "du: fts_read failed: no such file or directory" when traversing a directory with a NetApp ".snapshot" directory.

  My understanding (clarified by the discussions linked below) is that:
  1) The device ID/inode of a directory is recorded before the submount is made.
  2) The device ID of the directory changes after the directory has been read (via readdir, which causes the submount).
  3) After examining the contents of the directory, du goes back up the tree (via '..'), finds that the device ID doesn't match what it has recorded, assumes things have been moved around under it, and bails for safety reasons.

  I've researched online and this is an upstream bug. We're using Ubuntu 9.10, so I feel there should be a bug filed in the Ubuntu system.

  The best information I've found is within Red Hat's Bugzilla:
  https://bugzilla.redhat.com/show_bug.cgi?id=501848
  https://bugzilla.redhat.com/show_bug.cgi?id=533569

  This bug has also been discussed on the coreutils mailing list:
  http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html
  http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00032.html

  and LKML:
  http://lkml.org/lkml/2009/11/4/451

  Unfortunately none of these discussions has resulted in a widely accepted solution.

  We use NetApp .snapshots very extensively and can't afford for du to be unreliable. At the moment we will either have to patch du or downgrade all of coreutils to an older version. For comparison, we are upgrading from Ubuntu 7.04, which works perfectly.
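  The three steps in the description above can be illustrated with a minimal Python sketch. This is not the gnulib fts code used by du; it only mimics the before/after comparison of the dev/ino pair. The function names dir_identity and identity_stable_across_read are hypothetical, and the assumption is that reading the directory is what triggers the submount on an affected NFS client.

```python
import os
import tempfile


def dir_identity(path):
    """Return the (device ID, inode) pair that identifies a directory."""
    st = os.lstat(path)
    return (st.st_dev, st.st_ino)


def identity_stable_across_read(path):
    """Record dev/ino, read the directory, then compare with a fresh lstat.

    Assumption: on an affected mount, reading the directory triggers the
    submount, so the pair seen afterwards differs from the recorded one.
    """
    before = dir_identity(path)  # step 1: recorded before any submount
    os.listdir(path)             # step 2: reading may trigger the submount
    after = dir_identity(path)   # step 3: what is seen when coming back
    return before == after, before, after


if __name__ == "__main__":
    # An ordinary local directory keeps its identity across the read.
    with tempfile.TemporaryDirectory() as tmp:
        stable, before, after = identity_stable_across_read(tmp)
        print(stable, before, after)
```

  Pointing identity_stable_across_read at a directory such as the /tmp/mnt/home mount point from the reproducer in the comments would be one way to check whether a given kernel still changes the device ID after the first read; a False result corresponds to the mismatch that makes the traversal give up.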
  There is a similar problem with find, but it has a --without-fts build option which 'fixes' it.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/+subscriptions

--
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp