The issue also occurs with the lookupcache=none mount option on the 5.10.X kernel. I had hoped this option would succeed so that I could then investigate its performance impact, but it is not a viable workaround. I believe that I am out of options to try with the 5.10.X kernel. Please let me know where we stand.
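For anyone repeating this test, here is a minimal sketch of how the active lookupcache value could be confirmed from /proc/mounts before running it. The helper name nfs_mount_opt and the mount point /mnt/dir are illustrative assumptions, not part of nfs-common or of my actual setup.

```shell
#!/bin/sh
# Hypothetical helper (illustrative only): print the value of one mount
# option for a given mount point, read from a file in /proc/mounts format.
# Usage: nfs_mount_opt MOUNTS_FILE MOUNT_POINT OPTION
# Exits non-zero if the mount point or the option is not found.
nfs_mount_opt() {
    awk -v mp="$2" -v opt="$3" '
        $2 == mp {
            # Field 4 holds the comma-separated option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Match "option=" at the start of an entry.
                if (index(opts[i], opt "=") == 1) {
                    print substr(opts[i], length(opt) + 2)
                    found = 1
                    exit
                }
            }
        }
        END { exit !found }
    ' "$1"
}

# Example (assumed mount point):
#   nfs_mount_opt /proc/mounts /mnt/dir lookupcache
```

Reading /proc/mounts rather than /etc/fstab shows what the kernel actually applied after the remount or reboot, which is the value that matters for this test.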
> -----Original Message-----
> From: Jason Breitman
> Sent: Wednesday, September 21, 2022 1:01 PM
> To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> Subject: RE: Bug#1017720: nfs-common: No such file or directory
>
> I now know that this behavior does exist in Debian Buster 10.8 and more
> specifically in the 4.19.X kernel after running stricter testing on more
> servers.
> The 4.19.X kernel resolves itself immediately following the No such file or
> directory error which is different than the 5.X kernel requiring me to clear
> the inode and dentry cache by running echo 2 > /proc/sys/vm/drop_caches.
> What further information is required to resolve this issue?
>
> > -----Original Message-----
> > From: Jason Breitman
> > Sent: Tuesday, September 13, 2022 4:41 PM
> > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> >
> > I downgraded the nfs-common package which required the downgrade of
> > the libevent packages and am using the 4.19.X kernel.
> > I see the issue running the initial test, but then the issue is gone when
> > running the test a subsequent time.
> >
> > libevent-2.1-6:amd64           2.1.8-stable-4       amd64  Asynchronous event notification library
> > libevent-core-2.1-6:amd64      2.1.8-stable-4       amd64  Asynchronous event notification library (core)
> > libevent-pthreads-2.1-6:amd64  2.1.8-stable-4       amd64  Asynchronous event notification library (pthreads)
> > linux-image-4.19.0-21-amd64    4.19.249-2           amd64  Linux 4.19 for 64-bit PCs (signed)
> > nfs-common                     1:1.3.4-2.5+deb10u1  amd64  NFS support files common to client and server
> >
> > What other packages do I need to downgrade in order to get Debian 11.4 to
> > behave like Debian 10.8?
> > What additional questions can I answer so that we can move forward?
> >
> > > -----Original Message-----
> > > From: Jason Breitman
> > > Sent: Tuesday, September 6, 2022 5:18 PM
> > > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > >
> > > I also see the failure with the kernels below, but the 4.19.X kernel resolves
> > > the issue without dropping caches.
> > > linux-image-4.19.0-14-amd64  4.19.171-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > > linux-image-4.19.0-21-amd64  4.19.249-2  amd64  Linux 4.19 for 64-bit PCs (signed)
> > >
> > > I see the issue running the initial test, but then the issue is gone when
> > > running the test a subsequent time.
> > > I ran several tests to verify the behavior differences between the 4.19.X and
> > > 5.X kernels.
> > >
> > > -- Test
> > > ls -l /mnt/dir/someOtherDir/* | grep '?'
> > >
> > > -- Error message - the error message is showing files that have been erased
> > > via rsync --delete
> > > ls: cannot access 'filename': No such file or directory
> > > -????????? ? ? ? ? ? filename
> > >
> > > > -----Original Message-----
> > > > From: Jason Breitman
> > > > Sent: Friday, September 2, 2022 5:17 PM
> > > > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > >
> > > > I have tested with the following kernels and see this issue in each case.
> > > >
> > > > linux-image-5.10.0-16-amd64         5.10.127-1         amd64  Linux 5.10 for 64-bit PCs (signed)
> > > > linux-image-5.15.0-0.bpo.3-amd64    5.15.15-2~bpo11+1  amd64  Linux 5.15 for 64-bit PCs (signed)
> > > > linux-image-5.18.0-0.deb11.3-amd64  5.18.14-1~bpo11+1  amd64  Linux 5.18 for 64-bit PCs (signed)
> > > >
> > > > An interesting note is that when using the 5.18 kernel, I had to run
> > > > echo 3 > /proc/sys/vm/drop_caches to resolve the issue.
> > > > echo 2 > /proc/sys/vm/drop_caches did not work as it did on the 5.10 and
> > > > 5.15 kernels.
> > > >
> > > > > -----Original Message-----
> > > > > From: Jason Breitman
> > > > > Sent: Friday, August 26, 2022 3:36 PM
> > > > > To: 'Ben Hutchings' <b...@decadent.org.uk>; '1017...@bugs.debian.org'
> > > > > <1017...@bugs.debian.org>
> > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > >
> > > > > I was able to identify another workaround today which may help you to
> > > > > identify the issue.
> > > > > The workaround is to touch the directory where the troubled files live on the
> > > > > file server.
> > > > > I believe this tells us that updating the modify time attribute is used by the
> > > > > cache.
> > > > > It should be noted that access time updates are disabled on the file server.
> > > > >
> > > > > I also wanted to restate that we use rsync to push out these application
> > > > > updates and also use rsync to sync data files.
> > > > > Our rsync options preserve timestamps, so it is possible that the new files
> > > > > have an older timestamp than "now".
> > > > > It is not the case that the new files have an older timestamp than the prior
> > > > > version that is stuck in the cache.
> > > > >
> > > > > The rsync process that I describe has not changed and has been in use for
> > > > > many years.
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Jason Breitman
> > > > > > Sent: Thursday, August 25, 2022 11:54 AM
> > > > > > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > >
> > > > > > I have the same issue after adding actimeo=30 to /etc/fstab, rebooting and
> > > > > > testing.
> > > > > > I also confirmed that those settings applied via /proc/mounts which shows
> > > > > > the below snippet for each mountpoint.
> > > > > > nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,acregmin=30,acregmax=30,acdirmax=30,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=krb5,clientaddr=X.X.X.X,lookupcache=pos,local_lock=none,addr=Y.Y.Y.Y 0 0
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Jason Breitman
> > > > > > > Sent: Tuesday, August 23, 2022 2:42 PM
> > > > > > > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > > >
> > > > > > > What additional information can I provide for us to move forward with this
> > > > > > > process?
> > > > > > >
> > > > > > > To summarize and include further details, rsync is used to sync applications to
> > > > > > > a file server which behaves like a repository.
> > > > > > > We do preserve timestamps from the build server and also use --delete. We
> > > > > > > do not run the applications from the file server. All servers use NTP.
> > > > > > >
> > > > > > > The application has a sub-directory that contain files with version numbers.
> > > > > > > These are libraries.
> > > > > > > When a new build is complete, a developer pushes their updates via rsync to
> > > > > > > the file server / repository.
> > > > > > >
> > > > > > > I believe that the dentry cache thinks the "old" files exist and generates a No
> > > > > > > such file or directory error showing question marks for that files attributes.
> > > > > > > Dropping the dentry cache via echo 2 > /proc/sys/vm/drop_caches resolves
> > > > > > > the issue.
> > > > > > >
> > > > > > > This behavior is not observed in Debian 10.8 with that distributions associated
> > > > > > > kernel and packages.
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Jason Breitman
> > > > > > > > Sent: Friday, August 19, 2022 9:52 PM
> > > > > > > > To: Ben Hutchings <b...@decadent.org.uk>; 1017...@bugs.debian.org
> > > > > > > > Subject: RE: Bug#1017720: nfs-common: No such file or directory
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Ben Hutchings <b...@decadent.org.uk>
> > > > > > > > > Sent: Friday, August 19, 2022 7:27 PM
> > > > > > > > > To: Jason Breitman <jbreit...@tildenparkcapital.com>;
> > > > > > > > > 1017...@bugs.debian.org
> > > > > > > > > Subject: Re: Bug#1017720: nfs-common: No such file or directory
> > > > > > > > >
> > > > > > > > > Control: tag -1 moreinfo
> > > > > > > > >
> > > > > > > > > On Fri, 2022-08-19 at 13:16 +0000, Jason Breitman wrote:
> > > > > > > > > > Package: nfs-common
> > > > > > > > > > Version: 1:1.3.4-6
> > > > > > > > > > Severity: important
> > > > > > > > > >
> > > > > > > > > > Kernel: 5.10.0-16-amd64 #1 SMP Debian 5.10.127-1 (2022-06-30) x86_64
> > > > > > > > > > GNU/Linux
> > > > > > > > > >
> > > > > > > > > > -- Description
> > > > > > > > > > After updating and or creating new files on our file server via
> > > > > > > > > > rsync, we see many files report the error message below from NFSv4
> > > > > > > > > > clients since upgrading from Debian 10.8 to Debian 11.4.
> > > > > > > > > > Clearing the dentry cache resolves the issue right away.
> > > > > > > > > > I am not sure that nfs-common is the package to blame, but listed
> > > > > > > > > > it based on the bug submission recommendations.
> > > > > > > > >
> > > > > > > > > The NFS implementation is mostly in the kernel, so probably this issue
> > > > > > > > > belongs there. But the kernel team is responsible for both packages.
> > > > > > > > >
> > > > > > > > > [...]
> > > > > > > > > > -- Error message
> > > > > > > > > > ls: cannot access 'filename': No such file or directory
> > > > > > > > > > -????????? ? ? ? ? ? filename
> > > > > > > > > [...]
> > > > > > > > >
> > > > > > > > > So we know the file's there but can't stat it. I think this means the
> > > > > > > > > client has cached the handle of the old file of that name, which has
> > > > > > > > > been deleted.
> > > > > > > > >
> > > > > > > > > - Are client and server clocks closely synchronised? If not, that
> > > > > > > > > needs to be fixed.
> > > > > > > > >
> > > > > > > > The clocks are synchronized using NTP.
> > > > > > > >
> > > > > > > > > - Are clients likely to read this directory while rsync is running, or
> > > > > > > > > shortly before? If so, it may help to reduce the attribute caching
> > > > > > > > > timeout on the client. See the "Directory entry caching" section in
> > > > > > > > > the nfs(5) manual page.
> > > > > > > > >
> > > > > > > > Clients are not likely to read this directory while rsync is running for the
> > > > > > > > observed cases. That can happen in our environment, but not in this case.
> > > > > > > > I am using the lookupcache=pos option. I tried noac, but the performance
> > > > > > > > penalty was too much. Which option are you referring to and what setting
> > > > > > > > do you recommend testing?
> > > > > > > >
> > > > > > > > > I don't know why you're only seeing this after an upgrade of the
> > > > > > > > > clients, though.
> > > > > > > > > I'm not aware that there has been any big change to
> > > > > > > > > attribute caching.
> > > > > > > > >
> > > > > > > > I appreciate you responding to my report and am happy to answer any
> > > > > > > > questions.
> > > > > > > > We have multiple monitors and log scrapers to detect "file not found"
> > > > > > > > exceptions that would let us know if this was happening before.
> > > > > > > > To share more, I have 2 environments mounting from the same file server.
> > > > > > > > Each environment has several servers. The issue is only seen in the
> > > > > > > > environment running Debian 11.4.
> > > > > > > > I also should have mentioned that the files in question have a version
> > > > > > > > number appended. filename-1111. When the file is updated via rsync, it is
> > > > > > > > called filename-1112 and the prior file is removed. The error is about
> > > > > > > > filename-1111.
> > > > > > > > I am not sure if this is the proper terminology, but the issue appears to be
> > > > > > > > the negative dentry cache.
> > > > > > > >
> > > > > > > > > Ben.
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > Ben Hutchings
> > > > > > > > > Beware of bugs in the above code;
> > > > > > > > > I have only proved it correct, not tried it. - Donald Knuth
> > > > > > > >
> > > > > > > > Jason Breitman
> > > > > > > Jason Breitman
> > > > > > Jason Breitman
> > > > > Jason Breitman
> > > > Jason Breitman
> > > Jason Breitman
> > Jason Breitman
> Jason Breitman
Jason Breitman