My guess is that attempting to retrieve SRV and then AFSDB DNS records for an "htaccess" top level domain is very slow to fail on the problematic system for some reason.
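To check that guess without a packet capture, something like the following could be run on both the slow and the fast machine. It is only a rough sketch: the function names are made up for illustration, the `_afs3-vlserver._udp` SRV label is what RFC 5864 specifies to my knowledge, and the SRV-then-AFSDB order is simply the one assumed above, not something verified against the 1.6.22.2 client.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenAFS.

# Print the DNS queries that a client started with -afsdb is ASSUMED to try
# for a cell name: the SRV record first, then the AFSDB record.
afs_dns_queries() {
    cell=$1
    printf 'SRV _afs3-vlserver._udp.%s\n' "$cell"
    printf 'AFSDB %s\n' "$cell"
}

# Run each query with host(1) and report exit status and elapsed seconds.
# A lookup that is slow to fail shows up as a large elapsed value.
time_afs_dns() {
    afs_dns_queries "$1" | while read -r rrtype name; do
        start=$(date +%s)
        host -t "$rrtype" "$name" >/dev/null 2>&1 </dev/null
        status=$?
        end=$(date +%s)
        printf '%-5s %-34s exit=%d elapsed=%ds\n' \
            "$rrtype" "$name" "$status" "$((end - start))"
    done
}
```

Calling `time_afs_dns htaccess` and, for comparison, `time_afs_dns cs.unc.edu` on each host should show whether the nonexistent name takes noticeably longer to fail on the problematic system.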
I think it's kind of a known issue which has cropped up in the past for things like ".trash" as well. You could probably find out where things get stuck by comparing tcpdump outputs.

- Stephan

> On 08 Nov 2018, at 20:41, John Sopko <[email protected]> wrote:
>
> Wow! Removing -afsdb and adding our db servers in the CellServDB seems
> to have fixed the problem. Does not make any sense, this machine and
> others running many years with -afsdb. And fs listcells works when
> -afsdb is used:
>
> % fs listcells
> Cell dynroot on hosts.
> Cell cs.unc.edu on hosts toucan.cs.unc.edu quail.cs.unc.edu kiwi.cs.unc.edu.
>
> % host -t AFSDB cs.unc.edu
> cs.unc.edu has AFSDB record 1 kiwi.cs.unc.edu.
> cs.unc.edu has AFSDB record 1 quail.cs.unc.edu.
> cs.unc.edu has AFSDB record 1 toucan.cs.unc.edu.
>
> Thanks for the help. Is this a known issue?
>
>
> On Thu, Nov 8, 2018 at 1:59 PM Stephan Wiesand <[email protected]>
> wrote:
>>
>> Have you tried w/o -afsdb?
>>
>>> On 08 Nov 2018, at 19:48, John Sopko <[email protected]> wrote:
>>>
>>> nsswitch and DNS the same, the AFSDB records resolve fine, the
>>> /afs/cs.unc.edu cell works fine, just not /afs.
>>>
>>>
>>> On Thu, Nov 8, 2018 at 12:52 PM Stephan Wiesand <[email protected]>
>>> wrote:
>>>>
>>>>
>>>>> On 8. Nov 2018, at 18:22, John Sopko <[email protected]> wrote:
>>>>>
>>>>> I have been running two legacy Redhat 6.x web servers for several
>>>>> years. The apache httpd processes started to go into device wait state
>>>>> the last few days on one of the servers, the other server is fine,
>>>>> both are configured pretty much the same. I tracked this down to the
>>>>> web server trying to stat /afs/.htaccess. If I try to do an ls in /afs
>>>>> or cat /afs/.htaccess which does not exist, the commands take a long
>>>>> time to complete and first go into device wait state, it can take
>>>>> several minutes or they may hang indefinitely. The afs file system
>>>>> seems to be working fine, just accessing under /afs is the problem. On
>>>>> other Redhat 6.x systems accessing /afs is fast and have no problems.
>>>>
>>>> Are the nsswitch and DNS resolver configurations the same on all systems?
>>>> Any differences in network restrictions?
>>>> Does it help to run afsd without -afsdb?
>>>>
>>>> Just a wild guess,
>>>> Stephan
>>>>
>>>>>
>>>>> I am running afsd with:
>>>>>
>>>>> /usr/vice/etc/afsd -dynroot -fakestat-all -afsdb
>>>>>
>>>>> Note I tried fakestat-all to see if that would help, I have been
>>>>> running just -fakestat, our db servers have afsdb records.
>>>>>
>>>>> I removed all cells except for our cell in CellServDB so only have this:
>>>>>
>>>>> % pwd
>>>>> /afs
>>>>>
>>>>> % ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 cs -> cs.unc.edu/
>>>>> drwxr-xr-x 8 root root 2048 Mar 6 2015 cs.unc.edu/
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 unc -> cs.unc.edu/
>>>>>
>>>>> I re-formatted the /usr/vice/cache partition and that did not help.
>>>>>
>>>>> I cannot find any hardware problems, no clues in the syslog or on the
>>>>> console, the system disk including the cache is on a raid1/mirror
>>>>> disk. This is a Dell server and I run Dell OpenManage which is really
>>>>> good at reporting system and especially disk errors.
>>>>>
>>>>> I am running the same afsd version on our remaining rhel 6.x servers:
>>>>>
>>>>> % fs version
>>>>> openafs 1.6.22.2
>>>>>
>>>>> Distributor ID: RedHatEnterpriseWorkstation
>>>>> Release: 6.10
>>>>>
>>>>> The problem is intermittent but goes into device wait most of the
>>>>> time, for example the first time ran fine, the second time it took
>>>>> 14.96 seconds.
>>>>>
>>>>> % time ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 cs -> cs.unc.edu
>>>>> drwxr-xr-x 8 root root 2048 Mar 6 2015 cs.unc.edu
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 unc -> cs.unc.edu
>>>>> 0.000u 0.000s 0:00.00 0.0% 0+0k 0+0io 0pf+0w
>>>>>
>>>>> % time ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 cs -> cs.unc.edu
>>>>> drwxr-xr-x 8 root root 2048 Mar 6 2015 cs.unc.edu
>>>>> lrwxr-xr-x 1 root root 10 Dec 31 1969 unc -> cs.unc.edu
>>>>> 0.000u 0.000s 0:14.96 0.0% 0+0k 0+0io 0pf+0w
>>>>>
>>>>> Thanks for any help or ideas to try.

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info
