https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=249871
Bug ID: 249871
Summary: NFSv4 faulty directory listings under heavy load
Product: Base System
Version: 12.1-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: b...@freebsd.org
Reporter: j...@freebsd.org

I think I've discovered a peculiar bug in NFSv4. When the server is under heavy load, directory listings sometimes show duplicate filenames and other times omit filenames.

This was discovered when running parallel jobs on a small HPC cluster. Each job runs xzcat on an NFS-served file, dumps the uncompressed output to a local disk on the client, performs some brief heavy computation, and writes several small output files to the NFS server.

As shown below, there are 11,031 files processed. Parallel jobs were capped at between 50 and 150 at a time, with the problem occurring at any cap.

All files list-*.txt shown below were produced by

    ls | grep 'combined.*-ad\.vcf\.xz'

or

    find . -maxdepth 1 -name 'combined.*-ad.vcf.xz'

The file list-1.txt contains the correct directory listing. list-100.txt, however, contains duplicate filenames, and list-1000.txt has both duplicate and missing filenames.

    # sort list-1.txt | uniq -d
    # sort list-100.txt | uniq -d
    combined.NWD297242-ad.vcf.xz
    combined.NWD745320-ad.vcf.xz
    combined.NWD787696-ad.vcf.xz

    # wc -l list-1.txt list-100.txt list-1000.txt
       11031 list-1.txt
       11034 list-100.txt
       11027 list-1000.txt
       33092 total

    # diff list-1.txt list-100.txt
    2404a2405
    > combined.NWD297242-ad.vcf.xz
    7856a7858
    > combined.NWD745320-ad.vcf.xz
    8391a8394
    > combined.NWD787696-ad.vcf.xz

    # diff list-1.txt list-1000.txt
    153a154
    > combined.NWD111306-ad.vcf.xz
    170d170
    < combined.NWD113182-ad.vcf.xz
    512d511
    [snip]

If I revert the mounts to NFSv3, the problem goes away (but performance suffers).

There are no apparent problems delivering file content, just directory listings.
Using this fact, I can work around the problem by writing the directory listing to a file beforehand, when the server is not under load:

    ls | grep 'combined.*-ad\.vcf\.xz' > VCF-list.txt

Reading this file under heavy load does not pose any problems. The problem appears only when I generate a new directory listing with "ls" or "find".

The problem is consistently reproducible under heavy load and does not occur under light load.

/etc/exports:

    V4: /

/etc/zfs/exports:

    # !!! DO NOT EDIT THIS FILE MANUALLY !!!

    /pxeserver/images -alldirs -ro -network 192.168.0.0 -mask 255.255.128.0
    /raid-00 -maproot=root -network 192.168.0.0 -mask 255.255.128.0
    /sharedapps -maproot=root -network 192.168.0.0 -mask 255.255.128.0
    /usr/home -maproot=root -network 192.168.0.0 -mask 255.255.128.0
    /var/cache/pkg -maproot=root -network 192.168.0.0 -mask 255.255.128.0

/etc/fstab on the clients:

    login:/usr/home /usr/home nfs rw,bg,intr,noatime 0 0
    login:/raid-00 /raid-00 nfs rw,bg,intr,noatime 0 0
    login:/sharedapps /sharedapps nfs rw,bg,intr,noatime 0 0
    login:/var/cache/pkg /var/cache/pkg nfs rw,bg,intr,noatime 0 0
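
For anyone trying to reproduce this, the following is a minimal sketch (not part of the original report) of how a fresh listing taken under load could be checked against a reference listing saved while the server was idle. It reuses the same sort, uniq -d, and comparison approach as the commands in the report. The function name compare_listings and the file name fresh.txt are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# compare_listings REFERENCE FRESH
# Hypothetical helper: report names that are duplicated in FRESH, missing
# from FRESH, or present in FRESH but absent from REFERENCE.
# Returns 0 if the listings agree, 1 if any discrepancy is found.
compare_listings() {
    ref=$1
    fresh=$2
    tmp_ref=$(mktemp) || return 2
    tmp_fresh=$(mktemp) || return 2

    # comm needs sorted input; force the C locale so sort and comm agree.
    LC_ALL=C sort -u "$ref" > "$tmp_ref"
    LC_ALL=C sort -u "$fresh" > "$tmp_fresh"

    dups=$(LC_ALL=C sort "$fresh" | uniq -d)
    missing=$(LC_ALL=C comm -23 "$tmp_ref" "$tmp_fresh")
    extra=$(LC_ALL=C comm -13 "$tmp_ref" "$tmp_fresh")
    rm -f "$tmp_ref" "$tmp_fresh"

    status=0
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups" | sed 's/^/duplicate: /'
        status=1
    fi
    if [ -n "$missing" ]; then
        printf '%s\n' "$missing" | sed 's/^/missing: /'
        status=1
    fi
    if [ -n "$extra" ]; then
        printf '%s\n' "$extra" | sed 's/^/unexpected: /'
        status=1
    fi
    return $status
}
```

Assuming VCF-list.txt was saved under light load as described above, a check under heavy load might look like this:

    ls | grep 'combined.*-ad\.vcf\.xz' > fresh.txt
    compare_listings VCF-list.txt fresh.txt

Running the check in a loop while the parallel jobs are active should show whether the duplicates and omissions vary from one listing to the next.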