Bug#840056: shibboleth-sp2-utils: upgrade attempt of shibboleth-sp2-utils gets hung at restart of shibd service

S. Banerian Tue, 11 Oct 2016 11:15:49 -0700

On 10/11/2016 03:22 AM, Ferenc Wágner wrote:
> "S. Banerian" <baner...@u.washington.edu> writes:
> 
>> On 10/09/2016 05:25 PM, Ferenc Wágner wrote:
>>
>>> "S. Banerian" <baner...@u.washington.edu> writes:
>>>
>>>> On 10/07/2016 02:04 PM, Ferenc Wágner wrote:
>>>>
>>>>> Could you please make sure shibd isn't running
>>>>> then show me the output of
>>>>>
>>>>> # sudo -u _shibd strace shibd -f -F
>> [...]
>> after some 12 hours of trying to start, failing, it finally started,
>> created shibd.sock, and under a test, worked.
> 
> Was this the doing of a single invocation of the above, or do you refer
> to systemd continuously trying to restart it and succeeding eventually?


This was systemd continually trying. I ensured no spurious shibd
processes were running.



>>> Can you provide a full GDB backtrace (after installing
>>> shibboleth-sp2-utils-dbgsym; please yell if you need precise
>>> instructions).
>>
>> does not appear to be in stretch. so i need the instructions.
> 
> It is in a separate archive, see
> https://wiki.debian.org/AutomaticDebugPackages.  But let's exclude the
> simple timeout problem beforehand.
> 
>>>> Note: prior to the upgrade, shibboleth was working.
>>>
>>> Which version of shibboleth was working for you?
>>
>> the version just prior to this one 2.6.0+dfsg1-3+b1 on stretch.
> 
> Do you mean 2.5.6+dfsg1-2?  Your dpkg or apt logs should reveal the
> upgraded version.

yes.


>>> Can you share your shibboleth2.xml?
>>
>> I'm a bit reluctant to provide some of the information in the
>> RequestMapper sections.
> 
> If configuring a longer timeout (below) does not help, please check if
> you can reproduce the issue without the sensitive parts.
> 
>> When I force a restart, systemctl restart shibd.service I get the issue
>> as before, where
>>
>> \_ /bin/systemd-tty-ask-password-agent --watch
>>
>> stays there for a looong time, and is not returning, systemctl says it
>> is started, but journalctl -xe gives:
>>
>> Oct 10 14:00:35 epics systemd[1]: shibd.service: Killing process 30980 (shibd) with signal SIGKILL.
>> Oct 10 14:00:35 epics systemd[1]: shibd.service: Main process exited, code=killed, status=9/KILL
>> Oct 10 14:00:35 epics systemd[1]: Failed to start Shibboleth Service Provider Daemon.
>> -- Subject: Unit shibd.service has failed
> 
> This really does not make much sense together...  And I can't see any
> systemd-tty-ask-password-agent processes at all for some reason.

we agree. no reason to be seeing this.



>> there is a shibd -f -F process running, but no shibd.sock file
> 
> Are you sure that process isn't from some manual start attempt?  Also,
> if you start an instance manually while systemd's still trying to
> occasionally restart shibd in the background, the socket may get lost.
> 
> So, first of all, tell systemd to stop shibd and wait for it:
> 
> # systemctl stop shibd
> 
> Then you should see something like:
> 
> # systemctl status shibd
> [...]
>    Active: inactive (dead) [...]
> [...]
>  Main PID: 360 (code=exited, status=0/SUCCESS)
> [...]
> Oct 11 11:34:39 elm systemd[1]: Stopped Shibboleth Service Provider Daemon.

actually, after doing that, I got:

systemctl  status shibd.service
● shibd.service - Shibboleth Service Provider Daemon
   Loaded: loaded (/lib/systemd/system/shibd.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
     Docs: man:shibd(8)
           https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPshibd

Oct 11 10:35:18 epics systemd[1]: Stopped Shibboleth Service Provider Daemon.
Oct 11 10:35:18 epics systemd[1]: Starting Shibboleth Service Provider Daemon...
Oct 11 10:36:48 epics systemd[1]: shibd.service: Start operation timed out. Terminating.
Oct 11 10:36:54 epics systemd[1]: shibd.service: State 'stop-final-sigterm' timed out. Killing.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Killing process 5523 (shibd) with signal SIGKILL.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Main process exited, code=killed, status=9/KILL
Oct 11 10:36:54 epics systemd[1]: Failed to start Shibboleth Service Provider Daemon.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Unit entered failed state.
Oct 11 10:36:54 epics systemd[1]: shibd.service: Failed with result 'signal'.
Oct 11 10:36:58 epics systemd[1]: Stopped Shibboleth Service Provider Daemon.
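
For what it's worth, the gap between "Starting" (10:35:18) and "Start
operation timed out" (10:36:48) in the log above is 90 seconds. A small
shell sketch for doing that subtraction on journal timestamps follows.
The names to_seconds and elapsed_seconds are made up here for
illustration, and the sketch assumes both timestamps are from the same day.

```shell
#!/bin/sh
# Convert an HH:MM:SS timestamp to seconds since midnight.
# (to_seconds is a hypothetical helper, not a systemd tool.)
to_seconds() {
    old_ifs=$IFS
    IFS=:
    # Split the argument on ':' into $1 (hours), $2 (minutes), $3 (seconds).
    set -- $1
    IFS=$old_ifs
    # Strip one leading zero so that 08 and 09 are not parsed as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Print the number of seconds between two same-day timestamps.
elapsed_seconds() {
    start=$(to_seconds "$1")
    end=$(to_seconds "$2")
    echo $(( end - start ))
}
```

For example, elapsed_seconds 10:35:18 10:36:48 prints 90.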



> Then start it manually:
> 
> # date; sudo -u _shibd /usr/sbin/shibd -f -F
> 
> Meanwhile check /var/log/shibboleth/shibd.log for progress; the
> timestamps should tell you where time was spent.

Did this, and after a while, it started.
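
To put a number on "a while", something like the following sketch could
be run in a second terminal while shibd starts. It polls for a path to
appear and prints how many seconds it waited. wait_for_path is a
hypothetical helper name, and the socket path in the usage comment is an
assumption about the Debian layout, not something confirmed in this
thread.

```shell
#!/bin/sh
# Poll once per second until a path exists or max seconds have passed.
# Prints the number of seconds waited; returns 1 on timeout.
# (wait_for_path is a hypothetical helper name.)
wait_for_path() {
    path=$1
    max=$2
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$max" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$elapsed"
}

# Usage (the socket path is assumed, adjust it to your setup):
#   wait_for_path /run/shibboleth/shibd.sock 600
```

The printed value is only accurate to about a second, which is enough to
compare against a start timeout measured in minutes.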


>> I'm not convinced that systemd is behaving well.
> 
> Maybe it is, just the default start timeout (90s) is too short for your
> metadata setup.  Try setting it longer like:
> 
> # mkdir /etc/systemd/system/shibd.service.d
> # printf '[Service]\nTimeoutStartSec=5min\n' >/etc/systemd/system/shibd.service.d/timeout.conf
> # systemctl daemon-reload
> # systemctl cat shibd
> [you should see the result at the end of output]
> 
> Make sure to Ctrl-C your manually started shibd process if it's still
> running before starting the systemd shibd service.
> 
>> with the attempt to perform
>> systemctl restart shibd.service
>> I'm now seeing the CPU at 100% and memory (but not yet swap) near 100% also.
>> and no shibd.sock.
> 
> Yes, the startup phase of shibd can consume lots of resources (Dynamic
> MetadataProvider can help with this).  And the default timeout changed
> from 5min to 1.5min in this upgrade, which might cause your problems.

After adding the timeout.conf file and running systemctl daemon-reload
and then systemctl start shibd, the shibd process started after
approximately two minutes.
I was able to use apache2 normally. I was also able to
systemctl stop shibd and start it again normally, and after two
minutes or so, it was running.

I have been able to reproduce this now. Two minutes seems to be the
requirement.
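
In case it helps anyone else scripting this, here is a minimal sketch of
the drop-in step as a shell function. write_timeout_dropin is a
hypothetical helper name. The directory and timeout value are passed in
as arguments, so it can be tried in a scratch directory first.

```shell
#!/bin/sh
# Write a systemd drop-in that sets TimeoutStartSec for a unit.
# (write_timeout_dropin is a hypothetical helper, not part of systemd.)
#   $1: drop-in directory, e.g. /etc/systemd/system/shibd.service.d
#   $2: timeout value, e.g. 5min
write_timeout_dropin() {
    dropin_dir=$1
    timeout=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" \
        > "$dropin_dir/timeout.conf"
}

# Usage on a real system, as root:
#   write_timeout_dropin /etc/systemd/system/shibd.service.d 5min
#   systemctl daemon-reload
#   systemctl show shibd -p TimeoutStartUSec
```

The last command in the usage comment should report the effective start
timeout once systemd has reloaded its configuration.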

thank you.


-- 
Stefani Banerian
UW Clinical Cyclotron www.uwmcf.org
UW School of Medicine
UW Box 356043
206-598-0302
gpg key 6642E7EE
fingerprint = BD13 875D 2D03 5E1D 1E3B  8BF7 F4B8 63AD 6642 E7EE
