On 10/11/2016 03:22 AM, Ferenc Wágner wrote:
> "S. Banerian" <baner...@u.washington.edu> writes:
>
>> On 10/09/2016 05:25 PM, Ferenc Wágner wrote:
>>
>>> "S. Banerian" <baner...@u.washington.edu> writes:
>>>
>>>> On 10/07/2016 02:04 PM, Ferenc Wágner wrote:
>>>>
>>>>> Could you please make sure shibd isn't running
>>>>> then show me the output of
>>>>>
>>>>> # sudo -u _shibd strace shibd -f -F
>> [...]
>> after some 12 hours of trying to start, failing, it finally started,
>> created shibd.sock, and under a test, worked.
>
> Was this the doing of a single invocation of the above, or do you refer
> to systemd continuously trying to restart it and succeeding eventually?

this was systemd continually trying. i ensured no spurious shibd procs
were running.

>>> Can you provide a full GDB backtrace (after installing
>>> shibboleth-sp2-utils-dbgsym; please yell if you need precise
>>> instructions).
>>
>> does not appear to be in stretch. so i need the instructions.
>
> It is in a separate archive, see
> https://wiki.debian.org/AutomaticDebugPackages. But let's exclude the
> simple timeout problem beforehand.
>
>>>> Note: prior to the upgrade, shibboleth was working.
>>>
>>> Which version of shibboleth was working for you?
>>
>> the version just prior to this one 2.6.0+dfsg1-3+b1 on stretch.
>
> Do you mean 2.5.6+dfsg1-2? Your dpkg or apt logs should reveal the
> upgraded version.

yes.

>>> Can you share your shibboleth2.xml?
>>
>> I'm a bit reluctant to provide some of the information in the
>> RequestMapper sections.
>
> If configuring a longer timeout (below) does not help, please check if
> you can reproduce the issue without the sensitive parts.
>
>> When I force a restart, systemctl restart shibd.service I get the issue
>> as before, where
>>
>> \_ /bin/systemd-tty-ask-password-agent --watch
>>
>> stays there for a looong time, and is not returning, systemctl says it
>> is started, but journalctl -xe gives:
>>
>> Oct 10 14:00:35 epics systemd[1]: shibd.service: Killing process 30980 (shibd) with signal SIGKILL.
>> Oct 10 14:00:35 epics systemd[1]: shibd.service: Main process exited, code=killed, status=9/KILL
>> Oct 10 14:00:35 epics systemd[1]: Failed to start Shibboleth Service Provider Daemon.
>> -- Subject: Unit shibd.service has failed
>
> This really does not make much sense together... And I can't see any
> systemd-tty-ask-password-agent processes at all for some reason.

we agree. no reason to be seeing this.

>> there is a shibd -f -F process running, but no shibd.sock file
>
> Are you sure that process isn't from some manual start attempt?
> Also, if you start an instance manually while systemd's still trying to
> occasionally restart shibd in the background, the socket may get lost.
>
> So, first of all, tell systemd to stop shibd and wait for it:
>
> # systemctl stop shibd
>
> Then you should see something like:
>
> # systemctl status shibd
> [...]
> Active: inactive (dead) [...]
> [...]
> Main PID: 360 (code=exited, status=0/SUCCESS)
> [...]
> Oct 11 11:34:39 elm systemd[1]: Stopped Shibboleth Service Provider Daemon.

actually, after doing that, I got:

systemctl status shibd.service
● shibd.service - Shibboleth Service Provider Daemon
   Loaded: loaded (/lib/systemd/system/shibd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:shibd(8)
           https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPshibd

Oct 11 10:35:18 epics systemd[1]: Stopped Shibboleth Service Provider Daemon.
Oct 11 10:35:18 epics systemd[1]: Starting Shibboleth Service Provider Daemon...
Oct 11 10:36:48 epics systemd[1]: shibd.service: Start operation timed out. Terminating.
Oct 11 10:36:54 epics systemd[1]: shibd.service: State 'stop-final-sigterm' timed out. Killing.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Killing process 5523 (shibd) with signal SIGKILL.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Main process exited, code=killed, status=9/KILL
Oct 11 10:36:54 epics systemd[1]: Failed to start Shibboleth Service Provider Daemon.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Unit entered failed state.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Failed with result 'signal'.
Oct 11 10:36:58 epics systemd[1]: Stopped Shibboleth Service Provider Daemon.

> Then start it manually:
>
> # date; sudo -u _shibd /usr/sbin/shibd -f -F
>
> Meanwhile check /var/log/shibboleth/shibd.log for progress; the
> timestamps should tell you where time was spent.

Did this, and after a while, it started.

>> I'm not convinced that systemd is behaving well.
>
> Maybe it is, just the default start timeout (90s) is too short for your
> metadata setup. Try setting it longer like:
>
> # mkdir /etc/systemd/system/shibd.service.d
> # printf '[Service]\nTimeoutStartSec=5min\n' >/etc/systemd/system/shibd.service.d/timeout.conf
> # systemctl daemon-reload
> # systemctl cat shibd
> [you should see the result at the end of output]
>
> Make sure to Ctrl-C your manually started shibd process if it's still
> running before starting the systemd shibd service.
>
>> with the attempt to perform
>> systemctl restart shibd.service
>> I'm now seeing the CPU at 100% and memory (but not yet swap) near 100% also.
>> and no shibd.sock.
>
> Yes, the startup phase of shibd can consume lots of resources (Dynamic
> MetadataProvider can help with this). And the default timeout changed
> from 5min to 1.5min in this upgrade, which might cause your problems.

After adding the timeout.conf file, running systemctl daemon-reload, and
then systemctl start shibd, the shibd process started after approximately
two minutes. I was able to use apache2 normally.

I was able to systemctl stop shibd and start it again normally, and after
two minutes or so, it was running.

I have been able to reproduce this now. Two minutes seems to be the
requirement.

thank you.

--
Stefani Banerian
UW Clinical Cyclotron
www.uwmcf.org
UW School of Medicine
UW Box 356043
206-598-0302
gpg key 6642E7EE
fingerprint = BD13 875D 2D03 5E1D 1E3B 8BF7 F4B8 63AD 6642 E7EE
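
For anyone repeating the drop-in steps quoted in this message, they could be wrapped in a small shell function along these lines. This is only a sketch: the function name write_timeout_dropin and its optional BASEDIR argument are made up for illustration (BASEDIR exists so the function can be tried against a scratch directory first), and the 5min value is simply the one suggested in the thread.

```shell
#!/bin/sh
# Hypothetical helper, not part of systemd or shibboleth: writes a
# TimeoutStartSec drop-in for a unit, mirroring the mkdir/printf steps
# quoted in the thread.
# Usage: write_timeout_dropin UNIT TIMEOUT [BASEDIR]
write_timeout_dropin() {
    unit=$1
    timeout=$2
    # BASEDIR is an assumption added for dry runs; it defaults to the
    # directory used in the thread.
    base=${3:-/etc/systemd/system}
    dir="$base/$unit.d"
    # -p: do not fail if the drop-in directory already exists
    mkdir -p "$dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dir/timeout.conf"
}
```

Run as root, something like `write_timeout_dropin shibd.service 5min` followed by `systemctl daemon-reload` should have the same effect as the quoted commands. `systemctl show shibd -p TimeoutStartUSec` can then be used to check which start timeout systemd actually applies.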