On Thu, 4 Sep 2014, Kuklin István wrote:

> I've found another clue:
> The shutdown problem initializes itself only if I cd to the afs share
> (after kinit and aklog). Without that, shutdown is quick.
> Here are some links to some pictures I took:
> I see this on shutdown if I mount the share from a tty with a local
> account (using kinit <a central account>, aklog, then cd):
> http://pbrd.co/1o246kL
> When it hangs, it looks like this(sorry for the quality):
> http://pasteboard.co/2Nv06Lqd.jpg
> Once it looked like this:
> http://pasteboard.co/2Nv6A7q0.jpg
> http://pasteboard.co/2Nv7OMtt.jpg
> Here is a video: http://youtu.be/sAc44PtsJds
Thanks for putting in the time to capture all this data; I really appreciate the effort. I don't see an obvious "smoking gun", but there are at least a couple of hints.

I have removed 'quiet' from my kernel command line (/etc/default/grub) and set LogLevel=debug in /etc/systemd/system.conf to try to get more and better diagnostics. Also, using shutdown -H should leave the last messages visible without powering off. (I'm not sure that the very last messages are going to be helpful, though.)

> In the video I'm logged in with a central profile (I use PAM modules for
> AFS home directories), which can sudo on that machine. When I'm shutting
> it down, you see what happens. It's not best quality and a couple of
> lines are missing from the picture at the ending so if you wish, I can
> record it again, for example the whole screen without moving the
> camcorder.
> Note that I'm using the same machine in the video as before, I've just
> replaced the machine name earlier to client1 for better understanding.
>
> Please try to reproduce the problem by cd-ing to the afs share.

I had cd-ed into /afs in my previous attempts, though maybe I did not still have an active shell there during the reboot attempts. Even now, when I halt the system with root's shell in /afs/..., I do not see a noticeably longer shutdown time than when AFS has not been used.

I can, however, reproduce some of the "hints" I mentioned above. Well, sometimes; it doesn't seem fully deterministic. In particular, there is a diagnostic about unmounting /afs failing, and later on a note that a "cold shutdown" is being performed (these two are related). You had a message "AFS isn't unmounted yet! Call aborted", which is another indicator of this, since that is what happens when a shutdown syscall is issued while a shutdown is already in progress but not yet complete.
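For anyone who wants to replicate the extra-diagnostics setup mentioned earlier, the two changes look roughly like this. It is only a sketch, assuming a Debian-style system where GRUB_CMDLINE_LINUX_DEFAULT originally contained nothing but "quiet"; if other options were present, keep them and drop only "quiet".

```ini
# /etc/default/grub
# Assumption: the original value was just "quiet". Removing it lets the
# kernel and init print their messages to the console during boot/shutdown.
GRUB_CMDLINE_LINUX_DEFAULT=""

# /etc/systemd/system.conf
# Ask systemd itself to log at debug level.
[Manager]
LogLevel=debug
```

On Debian the grub change only takes effect after running update-grub, and the system.conf change is picked up at the next boot. After that, "shutdown -H now" halts without powering off, so the final console messages stay on screen.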
At this point, I feel like the best step forward is going to be to use a proper systemd unit file for the client, instead of relying on the compatibility shims for sysvinit scripts, since there doesn't seem to be an obvious way to debug further exactly what's happening at the moment. I don't think I have an ETA for when that might happen, though.

-Ben

