The problem I have with NFSv3 is that it's difficult to make it work with iptables. I'll give it a go, however, and see how it affects things.
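(For what it's worth, my understanding is that the iptables pain is only
the dynamically assigned ports: rpcbind and nfsd sit on 111 and 2049, but
mountd, statd and lockd float unless pinned. On Gentoo I believe the
pinning goes in /etc/conf.d/nfs -- variable names from memory, so worth
double-checking, and the 327xx ports are just ones I'd pick:

OPTS_RPC_MOUNTD="-p 32767"
OPTS_RPC_STATD="-p 32765 -o 32766"

# lockd lives in the kernel, so it gets pinned via sysctl instead
sysctl -w fs.nfs.nlm_tcpport=32768
sysctl -w fs.nfs.nlm_udpport=32768

After that, a pair of rules like these should cover the whole stack for
the trusted subnet (assuming it's 192.168.0.0/24 here):

iptables -A INPUT -s 192.168.0.0/24 -p tcp -m multiport --dports 111,2049,32765:32768 -j ACCEPT
iptables -A INPUT -s 192.168.0.0/24 -p udp -m multiport --dports 111,2049,32765:32768 -j ACCEPT

None of that is battle-tested on this setup yet.)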
Also, should I be considering iSCSI instead of NFS?

Adam.

On Wed, Jul 17, 2013 at 7:51 AM, Patrick J. LoPresti <p...@patl.com> wrote:
> I would seriously try "nfsvers=3" in those mount options.
>
> In my experience, Linux NFS features take around 10 years before the
> bugs are shaken out. And NFSv4 is much, much more complicated than
> most. (They added a "generation number" to the file handle, but if the
> underlying file system does not implement generation numbers, I have
> no idea what will happen...)
>
>  - Pat
>
> On Wed, Jul 17, 2013 at 7:47 AM, Adam Randall <randa...@gmail.com> wrote:
> > My changes to exports had no effect, it seems. I awoke to four errors
> > from my processing engine. All of them came from the same server,
> > which makes me curious. I've turned that one off and will see what
> > happens.
> >
> > On Tue, Jul 16, 2013 at 11:22 PM, Adam Randall <randa...@gmail.com> wrote:
> >> I've been doing more digging, and I've changed some of the
> >> configuration:
> >>
> >> 1) I've changed my NFS mount options to this:
> >>
> >> 192.168.0.160:/mnt/storage /mnt/i2xstorage nfs
> >> defaults,nosuid,noexec,noatime,nodiratime 0 0
> >>
> >> 2) I've changed the /etc/exports entry for /mnt/storage to this:
> >>
> >> /mnt/storage -rw,sync,subtree_check,no_root_squash @trusted
> >>
> >> In #1, I've removed nodev, which I think I accidentally copied over
> >> from a tmpfs mount point above it when I originally set up the NFS
> >> mount point so long ago. Additionally, I added nodiratime. In #2, it
> >> used to be -rw,async,no_subtree_check,no_root_squash. I suspect the
> >> async may be causing what I'm seeing, and subtree_check should be
> >> okay for testing.
> >>
> >> Hopefully, this will have an effect.
> >>
> >> Adam.
> >>
> >> On Tue, Jul 16, 2013 at 9:44 PM, Adam Randall <randa...@gmail.com> wrote:
> >>> Here are the various outputs:
> >>>
> >>> # grep nfs /etc/mtab
> >>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw 0 0
> >>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs rw,noexec,nosuid,nodev,noatime,vers=4,addr=192.168.0.160,clientaddr=192.168.0.150 0 0
> >>>
> >>> # grep nfs /proc/mounts
> >>> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
> >>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs4 rw,nosuid,nodev,noexec,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.0.150,local_lock=none,addr=192.168.0.160 0 0
> >>>
> >>> Also, the output of df -hT | grep nfs:
> >>> 192.168.0.160:/var/log/dms nfs 273G 5.6G 253G  3% /mnt/dmslogs
> >>> 192.168.0.160:/mnt/storage nfs 2.8T 1.8T 986G 65% /mnt/storage
> >>>
> >>> From the looks of it, it appears to be NFS version 4 (though I
> >>> thought that I was running version 3, hrm...).
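> >>>
> >>> (If I do end up forcing version 3, I assume it's just a matter of
> >>> adding nfsvers=3 to the fstab entries -- something like the line
> >>> below, untested on my end -- followed by a full umount/mount, since
> >>> I don't believe a plain remount will renegotiate the protocol
> >>> version:
> >>>
> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs
> >>> defaults,nfsvers=3,nodev,nosuid,noexec,noatime 0 0
> >>> )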
> >>>
> >>> With regards to the ls -lid: one of the directories that wasn't
> >>> altered, but for whatever reason was inaccessible due to a stale
> >>> handle, is this:
> >>>
> >>> # ls -lid /mnt/storage/reports/5306
> >>> 185862043 drwxrwxrwx 4 1095 users 45056 Jul 15 21:37 /mnt/storage/reports/5306
> >>>
> >>> In the directory where we create new documents, which gets a folder
> >>> for each document (legacy decision), it looks something like this:
> >>>
> >>> # ls -lid /mnt/storage/dms/documents/819/* | head -n 10
> >>> 290518712 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191174
> >>> 290518714 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191175
> >>> 290518716 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191176
> >>> 290518718 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191177
> >>> 290518720 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:39 /mnt/storage/dms/documents/819/8191178
> >>> 290518722 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191179
> >>> 290518724 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:40 /mnt/storage/dms/documents/819/8191180
> >>> 290518726 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:47 /mnt/storage/dms/documents/819/8191181
> >>> 290518728 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:50 /mnt/storage/dms/documents/819/8191182
> >>> 290518730 drwxrwxrwx 2 nobody nobody 3896 Jul 16 18:52 /mnt/storage/dms/documents/819/8191183
> >>>
> >>> The stale handles seem to show up more when there's load on the
> >>> system, but that correlation isn't strong. I received notice of two
> >>> failures (both from the same server) tonight, as seen here:
> >>>
> >>> Jul 16 19:27:40 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191226/ 2>&1:
> >>> Jul 16 19:27:40 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191226/: Stale NFS file handle
> >>> Jul 16 19:44:15 imaging4 php: Output of: ls -l /mnt/storage/dms/documents/819/8191228/ 2>&1:
> >>> Jul 16 19:44:15 imaging4 php: ls: cannot access /mnt/storage/dms/documents/819/8191228/: Stale NFS file handle
> >>>
> >>> The above is logged by my e-mail collecting daemon, which is written
> >>> in PHP. When it can't access a directory that was just created, it
> >>> uses syslog() to write the above information out.
> >>>
> >>> From the same server, doing ls -lid I get these for those two
> >>> directories:
> >>>
> >>> 290518819 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:44 /mnt/storage/dms/documents/819/8191228
> >>> 290518816 drwxrwxrwx 2 nobody nobody 3896 Jul 16 19:27 /mnt/storage/dms/documents/819/8191226
> >>>
> >>> Stat'ing the directories showed that the modified times correspond
> >>> to the logs above:
> >>>
> >>> Modify: 2013-07-16 19:27:40.786142391 -0700
> >>> Modify: 2013-07-16 19:44:15.458250738 -0700
> >>>
> >>> Between the time it happened and the time I got back to it, the
> >>> stale handle had cleared itself.
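> >>>
> >>> (Since they do clear on their own, I may put a crude retry in front
> >>> of the failure path as a stopgap. A hypothetical sketch of what the
> >>> daemon would shell out to -- not what's deployed today:
> >>>
> >>> # retry the listing a few times before declaring the directory dead;
> >>> # $dir is whatever document directory was just created
> >>> for i in 1 2 3 4 5; do
> >>>     ls -l "$dir" >/dev/null 2>&1 && break
> >>>     sleep 1
> >>> done
> >>>
> >>> That would only paper over the problem rather than fix it, though.)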
> >>>
> >>> If it's at all relevant, this is the fstab:
> >>>
> >>> 192.168.0.160:/var/log/dms /mnt/dmslogs nfs defaults,nodev,nosuid,noexec,noatime 0 0
> >>> 192.168.0.160:/mnt/storage /mnt/storage nfs defaults,nodev,nosuid,noexec,noatime 0 0
> >>>
> >>> Lastly, in a fit of grasping at straws, I unmounted the ocfs2
> >>> partition on the secondary server and stopped the ocfs2 service. I
> >>> was thinking that maybe having it in master/master mode could cause
> >>> what I was seeing. Alas, that's not the case, as the above errors
> >>> came after I did that.
> >>>
> >>> Is there anything else that I can provide that might be of help?
> >>>
> >>> Adam.
> >>>
> >>> On Tue, Jul 16, 2013 at 5:15 PM, Patrick J. LoPresti <lopre...@gmail.com> wrote:
> >>>> What version is the NFS mount? ("cat /proc/mounts" on the NFS client)
> >>>>
> >>>> NFSv2 only allowed 64 bits in the file handle. With the
> >>>> "subtree_check" option on the NFS server, 32 of those bits are used
> >>>> for the subtree check, leaving only 32 for the inode. (This is from
> >>>> memory; I may have the exact numbers wrong. But the principle
> >>>> applies.)
> >>>>
> >>>> See <https://oss.oracle.com/projects/ocfs2/dist/documentation/v1.2/ocfs2_faq.html#NFS>
> >>>>
> >>>> If you run "ls -lid <directory>" for directories that work and
> >>>> those that fail, and you find that the failing directories all have
> >>>> huge inode numbers, that will help confirm that this is the problem.
> >>>>
> >>>> Also, if you are using NFSv2 and switch to v3, or set the
> >>>> "no_subtree_check" option, and it fixes the problem, that will also
> >>>> help confirm that this is the problem. :-)
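> >>>>
> >>>> For a broader sweep than spot-checking with ls -lid, something
> >>>> along these lines (untested; adjust the path) would flag any
> >>>> directory whose inode number does not fit in 32 bits:
> >>>>
> >>>> # print directories with inode numbers above 2^32 - 1
> >>>> find /mnt/storage -type d -printf '%i %p\n' | awk '$1 > 4294967295'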
> >>>>
> >>>>  - Pat
> >>>>
> >>>> On Tue, Jul 16, 2013 at 5:07 PM, Adam Randall <randa...@gmail.com> wrote:
> >>>> > Please forgive my lack of experience, but I've just recently
> >>>> > started working deeply with ocfs2 and am not familiar with all
> >>>> > its caveats.
> >>>> >
> >>>> > We've just deployed two servers that have SAN arrays attached to
> >>>> > them. These arrays are synchronized with DRBD in master/master
> >>>> > mode, with ocfs2 configured on top of that. In all my testing
> >>>> > everything worked well, except for an issue with symbolic links
> >>>> > throwing an exception in the kernel (this was fixed by applying a
> >>>> > patch I found here:
> >>>> > comments.gmane.org/gmane.comp.file-systems.ocfs2.devel/8008). Of
> >>>> > these machines, one of them is designated the master and the
> >>>> > other is its backup.
> >>>> >
> >>>> > The hosts are Gentoo Linux running kernel 3.8.13.
> >>>> >
> >>>> > I have four other machines that connect to the master ocfs2
> >>>> > partition using NFS. The problem I'm having is that on these
> >>>> > machines, I randomly get read errors while trying to enter
> >>>> > directories over NFS. In all of these cases except one, the
> >>>> > directories are unavailable immediately after they are created.
> >>>> > The error that comes back is always something like this:
> >>>> >
> >>>> > ls: cannot access /mnt/storage/documents/818/8189794/: Stale NFS file handle
> >>>> >
> >>>> > The mount point is /mnt/storage. Other directories on the mount
> >>>> > are available, and on other servers the same directory can be
> >>>> > accessed perfectly fine.
> >>>> >
> >>>> > I haven't been able to reproduce this issue in isolated testing.
> >>>> >
> >>>> > The four machines that connect via NFS are doing one of two things:
> >>>> >
> >>>> > 1) processing e-mail through a PHP-driven daemon (read and write,
> >>>> > creating directories)
> >>>> > 2) serving report files in PDF format over the web via a PHP web
> >>>> > application (read only)
> >>>> >
> >>>> > I believe that the ocfs2 version is 1.5. I found this in the
> >>>> > kernel source itself, but haven't figured out how to determine
> >>>> > this in the shell. ocfs2-tools is version 1.8.2, which is what
> >>>> > ocfs2 wanted (maybe this is ocfs2 1.8 then?).
> >>>> >
> >>>> > The only other path I can think to take is to abandon OCFS2 and
> >>>> > use DRBD in master/slave mode with ext4 on top of that. This
> >>>> > would still provide me with the redundancy I want, but at the
> >>>> > cost of not being able to use both machines simultaneously.
> >>>> >
> >>>> > If anyone has any advice, I'd love to hear it.
> >>>> >
> >>>> > Thanks in advance,
> >>>> >
> >>>> > Adam.

--
Adam Randall
http://www.xaren.net
AIM: blitz574
Twitter: @randalla0622

"To err is human... to really foul up requires the root password."
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-users