Here are various outputs:

# grep nfs /etc/mtab:
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
192.168.0.160:/var/log/dms /mnt/dmslogs nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
192.168.0.160:/mnt/storage /mnt/storage nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0

# grep nfs /proc/mounts:
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
192.168.0.160:/var/log/dms /mnt/dmslogs nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
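As an aside, the negotiated protocol version is easiest to read out of /proc/mounts, since mtab only records what was requested at mount time. A minimal sketch (the awk field positions assume the standard six-field /proc/mounts layout; it is fed one of the sample lines above so the snippet is self-contained, but normally you would run it against /proc/mounts itself):

```shell
# Print mount point and negotiated NFS version for each nfs/nfs4 mount.
# Normally:  nfs_vers < /proc/mounts
nfs_vers() {
    awk '$3 ~ /^nfs[0-9]*$/ {
        n = split($4, opt, ",")
        for (i = 1; i <= n; i++)
            if (opt[i] ~ /^vers=/) print $2, opt[i]
    }'
}

# Fed one of the /proc/mounts lines above:
nfs_vers <<'EOF'
192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,vers=4.0,addr=192.168.0.160 0 0
EOF
# prints: /mnt/storage vers=4.0
```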
Also, the output of df -hT | grep nfs:

192.168.0.160:/var/log/dms nfs 273G 5.6G 253G 3% /mnt/dmslogs
192.168.0.160:/mnt/storage nfs 2.8T 1.8T 986G 65% /mnt/storage

From the looks of it, it appears to be NFS version 4 (though I thought I was running version 3, hrm...).

With regards to the ls -lid, one of the directories that wasn't altered, but for whatever reason was not accessible due to the stale handle, is this:

# ls -lid /mnt/storage/reports/5306
185862043 drwxrwxrwx 4 1095 users 45056 Jul 15 21:37 /mnt/storage/reports/5306

In the directory where we create new documents, which creates a folder for each document (a legacy decision), it looks something like this:

# ls -lid /mnt/storage/dms/documents/819/* | head -n 10
290518712 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191174
290518714 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191175
290518716 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191176
290518718 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191177
290518720 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191178
290518722 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191179
290518724 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191180
290518726 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:47 /mnt/storage/dms/documents/819/8191181
290518728 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:50 /mnt/storage/dms/documents/819/8191182
290518730 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:52 /mnt/storage/dms/documents/819/8191183

The stale handles seem to appear more when there's load on the system, but the correlation isn't strong.
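For what it's worth, here is a quick scripted version of Pat's inode-size check (a sketch; fits_in_32_bits is a hypothetical helper name, and the live-mount usage in the trailing comment assumes GNU stat's -c %i):

```shell
# Does an inode number fit in 32 bits?  Per Pat's theory, inodes wider
# than 32 bits are what overflow the NFS file handle when subtree_check
# consumes part of it.
fits_in_32_bits() {
    if [ "$1" -le 4294967295 ]; then echo yes; else echo no; fi
}

fits_in_32_bits 185862043    # /mnt/storage/reports/5306  -> prints: yes
fits_in_32_bits 290518712    # a documents/819 directory  -> prints: yes
fits_in_32_bits 8589934592   # a 33-bit inode             -> prints: no

# Against a live mount (GNU stat):
#   fits_in_32_bits "$(stat -c %i /mnt/storage/reports/5306)"
```

Every inode in the listings above is comfortably below 2^32, so if I've understood the check correctly, that particular overflow doesn't seem to be what's happening here.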
I received notice of two failures (both from the same server) tonight, as seen here:

Jul 16 19:27:40 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191226/ 2>&1:
Jul 16 19:27:40 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191226/: Stale NFS file handle
Jul 16 19:44:15 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191228/ 2>&1:
Jul 16 19:44:15 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191228/: Stale NFS file handle

The above is logged by my e-mail collecting daemon, which is written in PHP. When it can't access a directory that was just created, it uses syslog() to write out the information above.

From the same server, ls -lid gives these for those two directories:

290518819 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:44 /mnt/storage/dms/documents/819/8191228
290518816 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:27 /mnt/storage/dms/documents/819/8191226

Stat'ing the directories showed that the modified times correspond to the logs above:

Modify: 2013-07-16 19:27:40.786142391 -0700
Modify: 2013-07-16 19:44:15.458250738 -0700

Between the time it happened and the time I got back to look, the stale handle had cleared itself.

If it's at all relevant, this is the fstab:

192.168.0.160:/var/log/dms /mnt/dmslogs nfs defaults,nodev,nosuid,noexec,noatime 0 0
192.168.0.160:/mnt/storage /mnt/storage nfs defaults,nodev,nosuid,noexec,noatime 0 0

Lastly, in a fit of grasping at straws, I unmounted the ocfs2 partition on the secondary server and stopped the ocfs2 service, thinking that maybe having it in master/master mode could cause what I was seeing. Alas, that's not the case, as the errors above came after I did that.

Is there anything else that I can provide that might be of help?

Adam.

On Tue, Jul 16, 2013 at 5:15 PM, Patrick J. LoPresti <lopre...@gmail.com> wrote:

> What version is the NFS mount? ("cat /proc/mounts" on the NFS client)
>
> NFSv2 only allowed 64 bits in the file handle.
> With the "subtree_check" option on the NFS server, 32 of those bits are used
> for the subtree check, leaving only 32 for the inode. (This is from
> memory; I may have the exact numbers wrong. But the principle applies.)
>
> See <https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS>
>
> If you run "ls -lid <directory>" for directories that work and those
> that fail, and you find that the failing directories all have huge
> inode numbers, that will help confirm that this is the problem.
>
> Also, if you are using NFSv2 and switch to v3, or set the
> "no_subtree_check" option, and it fixes the problem, that will also
> help confirm that this is the problem. :-)
>
> - Pat
>
> On Tue, Jul 16, 2013 at 5:07 PM, Adam Randall <randa...@gmail.com> wrote:
> > Please forgive my lack of experience, but I've just recently started deeply
> > working with ocfs2 and am not familiar with all its caveats.
> >
> > We've just deployed two servers that have SAN arrays attached to them. These
> > arrays are synchronized with DRBD in master/master mode, with ocfs2
> > configured on top of that. In all my testing everything worked well, except
> > for an issue with symbolic links throwing an exception in the kernel (this
> > was fixed by applying a patch I found here:
> > comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008). Of these
> > machines, one is designated the master and the other is its backup.
> >
> > The host is Gentoo Linux running kernel 3.8.13.
> >
> > I have four other machines that connect to the master ocfs2 partition
> > using NFS. The problem I'm having is that on these machines, I'm randomly
> > getting read errors while trying to enter directories over NFS. In all of
> > these cases, except one, the directories are immediately unavailable after
> > they are created.
> > The error that comes back is always something like this:
> >
> > ls: cannot access /mnt/storage/documents/818/8189794/: Stale NFS file handle
> >
> > The mount point is /mnt/storage. Other directories on the mount are
> > available, and on other servers the same directory can be accessed
> > perfectly fine.
> >
> > I haven't been able to reproduce this issue in isolated testing.
> >
> > The four machines that connect via NFS are doing one of two things:
> >
> > 1) processing e-mail through a PHP-driven daemon (read and write,
> > creating directories)
> > 2) serving report files in PDF format over the web via a PHP web
> > application (read only)
> >
> > I believe that the ocfs2 version is 1.5. I found this in the kernel
> > source itself, but haven't figured out how to determine this in the
> > shell. ocfs2-tools is version 1.8.2, which is what ocfs2 wanted (maybe
> > this is ocfs2 1.8 then?).
> >
> > The only other path I can think to take is to abandon OCFS2 and use
> > DRBD in master/slave mode with ext4 on top of that. This would still
> > provide me with the redundancy I want, but at the cost of not being
> > able to use both machines simultaneously.
> >
> > If anyone has any advice, I'd love to hear it.
> >
> > Thanks in advance,
> >
> > Adam.
> >
> > --
> > Adam Randall
> > http://www.xaren.net
> > AIM: blitz574
> > Twitter: @randalla0622
> >
> > "To err is human... to really foul up requires the root password."
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users@oss.oracle.com
> > https://oss.oracle.com/mailman/listinfo/ocfs2-users

--
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622

"To err is human... to really foul up requires the root password."
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users