Summary: interaction between autounmountd and cpdup's mount-point-traversal detection truncates tree copies early without error.
I'm running 14-stable and am seeing this on both:

- 14.0-STABLE built from sources of 27 Mar 2024
- 14.2-STABLE built from sources of 3 Apr 2025

There doesn't seem to be anything specific to 14-stable, so I'll bet this issue also manifests on earlier versions of FreeBSD.

I think I understand what's happening (details below), but I'm not sure about the right way to fix it.

Scenario

- A large file tree (in my case, the FreeBSD source tree) is published on an NFS server.
- A FreeBSD NFS client automounts a volume containing this large file tree.
- cpdup attempts to copy the file tree to another location (in my case, that happens to be another NFS filesystem, but I don't think it matters).
- cpdup completes without error; however, the destination directory is incomplete, with many empty directories.

Analysis

cpdup examines the device ID (st_dev) returned by stat(2) as it traverses the source and destination trees copying directories and files. When it finds an st_dev value different from the initial value at the top of the respective tree, it concludes that it has crossed a mount point and prunes the copy at that point.

I instrumented cpdup with some additional logging to examine its notion of the src and dst st_dev values and found that, in my test case, in the middle of its tree copy, cpdup started getting unexpected new values of st_dev for the src tree and skipping all directories after that.

--- src/cpdup.c.orig 2025-04-04 15:04:44.623646000 -0700
+++ src/cpdup.c 2025-04-05 15:10:52.779426000 -0700
@@ -947,10 +947,15 @@
      * When copying a directory, stop if the source crosses a mount
      * point.
      */
-    if (sdevNo != (dev_t)-1 && stat1->st_dev != sdevNo)
+    if (VerboseOpt >= 2)
+        logstd("sdevNo: %ld, stat1->st_dev: %ld\n", sdevNo, stat1->st_dev);
+    if (sdevNo != (dev_t)-1 && stat1->st_dev != sdevNo) {
+        if (VerboseOpt >= 2)
+            logstd("setting skipdir due to sdevNo != stat1->st_dev\n");
         skipdir = 1;
-    else
+    } else {
         sdevNo = stat1->st_dev;
+    }

I eventually looked at the automounter and added some logging via devd.conf:

notify 10 {
    match "system" "VFS";
    match "subsystem" "FS";
    action "logger VFS FS msg=$*";
};

And saw the following in /var/log/messages:

Apr 6 10:39:31 f14s-240327-portbuilder me[58694]: VFS FS msg=!system=VFS subsystem=FS type=MOUNT mount-point="/s/public" mount-dev="hairball:/v2/Source/public" mount-type="nfs" fsid=0x94ff003a3a000000 owner=0 flags="automounted;"
Apr 6 10:49:54 f14s-240327-portbuilder me[58761]: VFS FS msg=!system=VFS subsystem=FS type=UNMOUNT mount-point="/s/public" mount-dev="hairball:/v2/Source/public" mount-type="nfs" fsid=0x94ff003a3a000000 owner=0 flags="automounted;"
Apr 6 10:49:54 f14s-240327-portbuilder me[58770]: VFS FS msg=!system=VFS subsystem=FS type=MOUNT mount-point="/s/public" mount-dev="hairball:/v2/Source/public" mount-type="nfs" fsid=0x95ff003a3a000000 owner=0 flags="automounted;"

(By the way, st_dev reported by my new cpdup log messages was a rearranged version of "fsid" in the devd messages.)

Note that after ten minutes, the NFS filesystem is unmounted and then immediately remounted.

The source code of /usr/sbin/autounmountd indicates that it attempts to unmount automounted filesystems ten minutes after they have been mounted (modulo some sleep-related jitter).

The immediately following mount (presumably triggered by the next filesystem access by cpdup) results in a new value of fsid, thus changing what cpdup sees as st_dev, causing it to treat all following directory descents as mount-point crossings.

Possible Mitigations

1. It might be possible to prevent unmounting by causing cpdup to chdir to the top of the source directory. However, it seems to perform similar st_dev checks on the destination directory and therefore a similar issue would arise with the dst tree.

2. Reusing the old fsid in the new mount? I'm guessing there were good reasons for assigning a new fsid, so it's probably a bad idea.

3. cpdup could call stat() on the top of the tree each time it made a comparison. There might still be a race and the comparison might fail if the automatic unmount occurred between the two stat() calls, although THAT could be worked around by retrying the two stats + comparison once after each failure.

Other ideas?

-- 
G. Paul Ziemba
FreeBSD unix:
11:51AM up 18 days, 2:22, 43 users, load averages: 0.42, 0.31, 0.26
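
For concreteness, here is a rough, untested-in-cpdup sketch of what mitigation 3 might look like. The helper name same_device() and its return convention are made up for illustration and are not part of cpdup. It assumes the caller still has the path of the top of the tree available, and it only uses stat(2).

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of cpdup): compare the device of `path`
 * against a freshly stat'ed `top` instead of a cached st_dev value.
 *
 * Returns 1 if both are on the same device, 0 if they still differ after
 * one retry (treated as a real mount-point crossing), -1 if stat(2) fails.
 */
static int
same_device(const char *top, const char *path)
{
    struct stat st_top, st_path;
    int attempt;

    for (attempt = 0; attempt < 2; attempt++) {
        if (stat(top, &st_top) != 0 || stat(path, &st_path) != 0)
            return (-1);
        if (st_top.st_dev == st_path.st_dev)
            return (1);
        /*
         * Mismatch: an automatic unmount/remount may have happened
         * between the two stat() calls, so go around once more.
         */
    }
    return (0);
}
```

A caller would set skipdir only when this returns 0, once for the src tree and once for the dst tree. The cost is one extra stat() of the top directory per comparison, which may be noticeable over NFS, and a remount during both attempts would still be misread as a crossing.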