My guess is that attempting to retrieve SRV and then AFSDB DNS
records for an "htaccess" top level domain is very slow to fail
on the problematic system for some reason.

I think it's a known issue that has cropped up in the past for
things like ".trash" as well.

You could probably find out where things get stuck by comparing
tcpdump outputs.
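
As a rough sketch of how to narrow this down, the helper below prints the two DNS queries I'd expect an -afsdb client to issue for a name looked up directly under /afs, and times each one with host(1). The function names are made up for illustration. The SRV owner name follows RFC 5864 (`_afs3-vlserver._udp.<cell>`), and stripping the leading dot is my assumption about how ".htaccess" ends up as the "htaccess" lookup described above. It is not taken from the afsd source.

```shell
#!/bin/sh
# Hypothetical helper: list the DNS queries assumed to be made for a name
# looked up under /afs. A leading dot (as in ".htaccess") is stripped.
afs_dns_queries() {
    cell=${1#.}
    printf 'SRV _afs3-vlserver._udp.%s\n' "$cell"
    printf 'AFSDB %s\n' "$cell"
}

# Hypothetical helper: run each query through host(1) and report how long it
# took to answer or fail (whole seconds only, to stay portable).
time_afs_dns_queries() {
    afs_dns_queries "$1" | while read -r rtype qname; do
        start=$(date +%s)
        host -t "$rtype" "$qname" >/dev/null 2>&1
        status=$?
        end=$(date +%s)
        printf '%s %s: exit %s after %ss\n' \
            "$rtype" "$qname" "$status" "$((end - start))"
    done
}
```

Running `time_afs_dns_queries .htaccess` on a good host and on the problematic one, with something like `tcpdump -n -i any port 53` capturing as root in a second terminal, should show whether one of the two queries is the slow one and which resolver it is waiting on.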

- Stephan

> On 08 Nov 2018, at 20:41, John Sopko <[email protected]> wrote:
> 
> Wow! Removing -afsdb and adding our db servers to the CellServDB seems
> to have fixed the problem. This does not make any sense: this machine and
> others have been running for many years with -afsdb. And fs listcells
> works when -afsdb is used:
> 
> % fs listcells
> Cell dynroot on hosts.
> Cell cs.unc.edu on hosts toucan.cs.unc.edu quail.cs.unc.edu kiwi.cs.unc.edu.
> 
> % host -t AFSDB cs.unc.edu
> cs.unc.edu has AFSDB record 1 kiwi.cs.unc.edu.
> cs.unc.edu has AFSDB record 1 quail.cs.unc.edu.
> cs.unc.edu has AFSDB record 1 toucan.cs.unc.edu.
> 
> Thanks for the help. Is this a known issue?
> 
> 
> On Thu, Nov 8, 2018 at 1:59 PM Stephan Wiesand <[email protected]> 
> wrote:
>> 
>> Have you tried w/o -afsdb?
>> 
>>> On 08 Nov 2018, at 19:48, John Sopko <[email protected]> wrote:
>>> 
>>> nsswitch and DNS the same, the AFSDB records resolve fine, the
>>> /afs/cs.unc.edu cell works fine, just not /afs.
>>> 
>>> 
>>> On Thu, Nov 8, 2018 at 12:52 PM Stephan Wiesand <[email protected]> 
>>> wrote:
>>>> 
>>>> 
>>>>> On 8. Nov 2018, at 18:22, John Sopko <[email protected]> wrote:
>>>>> 
>>>>> I have been running two legacy Red Hat 6.x web servers for several
>>>>> years. In the last few days the Apache httpd processes on one of the
>>>>> servers started going into device wait state. The other server is fine,
>>>>> and both are configured pretty much the same. I tracked this down to the
>>>>> web server trying to stat /afs/.htaccess. If I run ls in /afs or
>>>>> cat /afs/.htaccess (which does not exist), the commands first go into
>>>>> device wait state and take a long time to complete: several minutes, or
>>>>> they may hang indefinitely. The AFS file system itself seems to be
>>>>> working fine; only access directly under /afs is the problem. On other
>>>>> Red Hat 6.x systems, accessing /afs is fast and has no problems.
>>>> 
>>>> Are the nsswitch and DNS resolver configurations the same on all systems?
>>>> Any differences in network restrictions?
>>>> Does it help to run afsd without -afsdb?
>>>> 
>>>> Just a wild guess,
>>>>       Stephan
>>>> 
>>>>> 
>>>>> I am running afsd with:
>>>>> 
>>>>> /usr/vice/etc/afsd -dynroot -fakestat-all -afsdb
>>>>> 
>>>>> Note I tried -fakestat-all to see if that would help; I have been
>>>>> running with just -fakestat. Our db servers have AFSDB records.
>>>>> 
>>>>> I removed all cells except for our cell in CellServDB so only have this:
>>>>> 
>>>>> % pwd
>>>>> /afs
>>>>> 
>>>>> % ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu/
>>>>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu/
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu/
>>>>> 
>>>>> I re-formatted the /usr/vice/cache partition and that did not help.
>>>>> 
>>>>> I cannot find any hardware problems and there are no clues in the syslog
>>>>> or on the console. The system disk, including the cache, is on a
>>>>> RAID1/mirror disk. This is a Dell server and I run Dell OpenManage, which
>>>>> is really good at reporting system and especially disk errors.
>>>>> 
>>>>> I am running the same afsd version on our remaining RHEL 6.x servers:
>>>>> 
>>>>> % fs version
>>>>> openafs 1.6.22.2
>>>>> 
>>>>> Distributor ID: RedHatEnterpriseWorkstation
>>>>> Release:        6.10
>>>>> 
>>>>> The problem is intermittent, but the command goes into device wait most
>>>>> of the time. For example, the first run below was fine and the second
>>>>> took 14.96 seconds.
>>>>> 
>>>>> % time ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu
>>>>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu
>>>>> 0.000u 0.000s 0:00.00 0.0%      0+0k 0+0io 0pf+0w
>>>>> 
>>>>> % time ls -l
>>>>> total 4
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 cs -> cs.unc.edu
>>>>> drwxr-xr-x 8 root root 2048 Mar  6  2015 cs.unc.edu
>>>>> lrwxr-xr-x 1 root root   10 Dec 31  1969 unc -> cs.unc.edu
>>>>> 0.000u 0.000s 0:14.96 0.0%      0+0k 0+0io 0pf+0w
>>>>> 
>>>>> Thanks for any help or ideas to try.

_______________________________________________
OpenAFS-info mailing list
[email protected]
https://lists.openafs.org/mailman/listinfo/openafs-info