> On 21 Dec 2019, at 19:32, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> Daniel Braniss wrote:
>>> On 20 Dec 2019, at 19:19, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>>
>>> Adam McDougall wrote:
>>>> Try changing bool_t do_tcp = FALSE; to TRUE in
>>>> /usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I
>>>> think this makes it match Linux client behavior. I suspect I ran into
>>>> the same issue as you. I do think I used nolockd as a workaround
>>>> temporarily. I can provide some more details if it works.
>>> If this fixes the problem, please let me know.
>>>
>>> I'm not sure I'd want to change the default, since it might break things for
>>> others, but I can definitely make it a tunable, so that people don't need to
>>> recompile a kernel to deal with it.
>>>
>> great! I was just about to see how it can be done (tunable) but need to check
>> if it can be done at any time, or just at boot time.
> I haven't looked at the code, but I suspect changing it on the fly could
> cause problems, so I am inclined to make it a tunable (boot time only).

my feelings too.

>> thanks.
>> btw, currently, from several hours of analysing the traffic, it seems that
>> nlm is UDP.
> I assume that means you haven't tried flipping it to TCP yet.

I will soon, but I have my doubts. The problem is caused by multiple events,
i.e., it happened once while I was doing an svn checkout, but I have done it
several times since with no issues. So it must be an aggregation of factors.
Other hosts are reporting lock timeouts too.
danny

> Please let us know how it goes, rick
>
> danny
>
> rick
>
> On 12/19/19 9:21 AM, Daniel Braniss wrote:
>
> On 19 Dec 2019, at 16:09, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> Daniel Braniss wrote:
> [stuff snipped]
> all mounts are nfsv3/tcp
> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know
> when the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at
> times.
> can the replay cache have any influence here? I tend to remember way back
> issues with it,
>
> To me, it looks like a network configuration issue.
> that was/is my gut feelings too, but, as far as we can tell, nothing has
> changed in the network infrastructure,
> the problems appeared after the NetAPP’s software was updated, it was working
> fine till then.
>
> the problems are also happening on freebsd 12.1
>
> You could capture packets (maybe when a client first starts rpc.statd and
> rpc.lockd) and then look at them in wireshark. I'd disable startup of
> rpc.lockd and rpc.statd at boot for a test client and then run something like:
> # tcpdump -s 0 -w out.pcap host <netapp-host>
> - and then start rpc.statd and rpc.lockd
> Then I'd look at out.pcap in wireshark (much better at decoding this stuff
> than tcpdump). I'd look for things like different reply IP addresses from the
> Netapp, which might confuse this tired old NLM protocol Sun devised in the
> mid-1980s.
>
> it’s going to be an interesting week end :-(
>
> the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s
> also happening on 12.1
> btw, the NetApp version is 9.3P17
> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
> try to implement it, because I knew the protocol was badly broken) and I avoid
> fiddling with it. As such, it won't have changed much since around FreeBSD7.
> and we haven’t had any issues with it for years, so you must have done
> something good
>
> cheers,
> danny
>
> rick
>
> cheers,
> danny
>
> rick
>
> Cheers
>
> Richard
> (NetApp admin)
>
> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>
> On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
>
> Daniel Braniss wrote:
>
> Hi,
> The server with the problems is running FreeBSD 11.1 stable, it was working
> fine for several months, but after a software upgrade of our NetAPP server
> it’s reporting many lockd errors and becomes catatonic,
> ...
> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
> Dec 18 13:11:45 moo-09 last message repeated 7 times
> Dec 18 13:12:55 moo-09 last message repeated 8 times
> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
> Dec 18 13:13:10 moo-09 last message repeated 8 times
> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xfffff8004cc051d0: Listen queue overflow: 193 already in queue awaiting acceptance …
> Seems like their software upgrade didn't improve handling of NLM RPCs?
> Appears to be handling RPCs slowly and/or intermittently. Note that no one
> tests it with IPv6, so at least make sure you are still using IPv4 for the
> mounts and try and make sure IP broadcast works between client and Netapp.
> I think the NLM and NSM (rpc.statd) still use IP broadcast sometimes.
>
> we are ipv4 - we have our own class c :-)
> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
> the NLM is a fundamentally broken protocol which was never published by Sun,
> so I suggest you avoid using it if at all possible.
> well, at the moment the ball is on NetAPP court, and switching to NFSv4 at
> the moment is out of the question, it’s a production server used by several
> thousand students.
>
> - If the locks don't need to be seen by other clients, you can just use the
>   "nolockd" mount option.
> or
> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>   should support NFSv4.1, which is a much better protocol than NFSv4.0.
>
> Good luck with it, rick
> thanks
> danny
>
> …
> any ideas?
>
> thanks,
> danny
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
_______________________________________________
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
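
The thread leaves open whether the NLM runs over UDP or TCP. One way to see which transports a server advertises for the lock manager is to filter `rpcinfo -p` output for the standard RPC program numbers 100021 (nlockmgr, the NLM) and 100024 (status, the NSM). The following is only a minimal sketch: the function names, the `netapp-host` placeholder and the sample listing are illustrative assumptions, not data from the hosts discussed above. It shows what the server offers, not which transport the FreeBSD client actually picks (that is what the do_tcp setting governs).

```shell
#!/bin/sh
# Sketch: list the transports advertised for the lock manager (NLM) and the
# status monitor (NSM), given `rpcinfo -p` output on stdin.
# 100021 = nlockmgr (NLM), 100024 = status (NSM): standard RPC program numbers.

# Hypothetical helper: prints "program version proto port" for NLM/NSM rows.
nlm_transports() {
    awk '$1 == 100021 || $1 == 100024 { print $1, $2, $3, $4 }'
}

# Illustrative sample only; NOT captured from the machines in this thread.
sample_rpcinfo() {
    cat <<'EOF'
   program vers proto   port  service
    100000    4   tcp    111  rpcbind
    100003    3   tcp   2049  nfs
    100021    4   udp   4045  nlockmgr
    100021    4   tcp   4045  nlockmgr
    100024    1   udp   4046  status
EOF
}

# Real use (the hostname is a placeholder):
#   rpcinfo -p netapp-host | nlm_transports
sample_rpcinfo | nlm_transports
```

If both udp and tcp rows show up for 100021, the server is offering the NLM on both transports, and a packet capture of the client's traffic is still needed to see which one is really in use.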