RHEL 7 (systemd) reboot
I am running three instances (under different users) on a RHEL 7 server to support a vendor product.

In the defined services, the start and stop scripts work fine when invoked with systemctl {start|stop} whatever.service, but we have automated monthly patching which does a reboot.

Looking in /var/log/messages, the stop scripts do not get invoked on reboot, so I created a new shutdown service as described here <https://unix.stackexchange.com/questions/211924/effect-of-reboot-signal-on-systemd-service-state>.

It appears that PostgreSQL is receiving a signal from somewhere prior to my script running:

    Oct 05 14:18:56 kccontrolmt01 NetworkManager[787]: [1538767136.0967] manager: NetworkManager state is now DISCONNECTED
    Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispa
    Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation
    Oct 05 14:18:56 kccontrolmt01 network[29310]: Shutting down interface eth0: Device 'eth0' successfully disconnected.
    Oct 05 14:18:56 kccontrolmt01 network[29310]: [ OK ]
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]:
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Shutting down CONTROL-M.
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]:
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Waiting ...
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: psql action failed. cannot perform sql command in /data00/ctmlinux/ctm_server/tmp/upd_CMS_SY
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: db_execute_sql failed while processing /data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Failed to update CMS_SYSPRM table.
    Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Be aware that the Configuration Agent might start the CONTROL-M/Server

The database must be available for the product to shut down in a consistent state.

I am open to suggestions.

Thanks,
Bryce

Bryce Pepper
Sr. Unix Applications Systems Engineer
The Kansas City Southern Railway Company
114 West 11th Street | Kansas City, MO 64105
Office: 816.983.1512
Email: bpep...@kcsouthern.com
RE: RHEL 7 (systemd) reboot
Adrian,

Thanks for the inquiry. The function (db_execute_sql) comes from a vendor (BMC) product called Control-M, which is a scheduling product. The tmp file is deleted before I can see its contents, but I believe it is trying to update some columns in the CMS_SYSPRM table. I also think the PostgreSQL instance is already stopped, which is why the db_execute_sql call fails. I will try to modify the vendor function to save off the contents of the query.

Bryce

p.s. Do you know of any verbose logging that could be turned on to catch when pgsql is being terminated?

-----Original Message-----
From: Adrian Klaver
Sent: Tuesday, October 09, 2018 7:39 PM
To: Bryce Pepper; pgsql-general@lists.postgresql.org
Subject: Re: RHEL 7 (systemd) reboot

On 10/9/18 11:06 AM, Bryce Pepper wrote:
> It appears that PostGreSQL is receiving a signal from somewhere prior
> to my script running.
>
> The database must be available for the product to shut down in a
> consistent state.
>
> I am open to suggestions.

What is the below doing or coming from?:

db_execute_sql failed while processing /data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.

--
Adrian Klaver
adrian.kla...@aklaver.com
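On the p.s. about logging: PostgreSQL already writes a line to its server log, at the default log level, when the postmaster is asked to stop. A smart, fast, or immediate shutdown request corresponds to SIGTERM, SIGINT, or SIGQUIT respectively, so the log should show both when and how the instance was told to stop. The following is a minimal sketch that pulls those lines out of a log file. The function name pg_shutdown_events is made up for illustration, and the log file location is an assumption that depends on how each Control-M instance configures PostgreSQL logging.

```shell
#!/bin/sh
# Print the shutdown-related lines from a PostgreSQL server log.
# $1: path to the server log file (installation specific, assumed here).
pg_shutdown_events() {
    grep -E 'received (smart|fast|immediate) shutdown request|database system is shut down' "$1"
}

# Hypothetical usage, run after the reboot:
#   pg_shutdown_events /path/to/instance/postgresql.log
```

Comparing the timestamp of the "received ... shutdown request" line with the first stop_ctmlinux_server.sh line in /var/log/messages would show how far ahead of the stop script the request arrived.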
RE: RHEL 7 (systemd) reboot
Here are the contents of the query and the error:

    [root@kccontrolmt01 tmp]# cat ctm.Xf9pQkg2
    update CMS_SYSPRM set CURRENT_STATE='STOPPING',DESIRED_STATE='Down' where DESIRED_STATE <> 'Ignored' ;
    psql: could not connect to server: Connection refused
            Is the server running on host "kccontrolmt01" (10.1.32.53) and accepting
            TCP/IP connections on port 5433?
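The "Connection refused" error means nothing was listening on port 5433 when psql tried to connect. As a sketch only: pg_isready (shipped with PostgreSQL 9.3 and later, assuming the bundled PostgreSQL includes it) reports the server state through its exit status, so logging that status at the top of the stop script would record what the script saw at that moment. The function names below are hypothetical; the host and port are the ones from the error message.

```shell
#!/bin/sh
# Translate a pg_isready exit status into a short description.
describe_pg_isready() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Probe a server and print its state. $1: host, $2: port.
probe_db() {
    pg_isready -h "$1" -p "$2" -t 3 >/dev/null 2>&1
    describe_pg_isready "$?"
}

# Hypothetical usage inside a stop script:
#   echo "database state: $(probe_db kccontrolmt01 5433)"
```

A server that is not listening at all would typically show up as "no response", which separates an instance that is already gone from one that is still shutting down and rejecting connections.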
RE: RHEL 7 (systemd) reboot
Sorry, I wasn't clear in the prior posts. The stop script is running during reboot. The problem is that the database is not reachable when the stop script runs.

The ctmdist server shutdown is as follows:

  1. Stop control-m application
  2. Stop control-m configuration agent
  3. Stop database

As you can see, the intent is for the database to be shut down after the product. But as you noticed from /var/log/messages, the stop_ctmlinux_server.sh script is running but unable to execute the update query.

I created the following service definition and the scripts that follow. Note there are 2 datacenters (ctmdist, ctmlinux) that have comparable scripts, so I have only included one set:

    [root@kccontrolmt01 ~]# cat ControlM_Shutdown.service
    [Unit]
    Description=Run mycommand at shutdown
    Requires=network.target CTM_Postgre.service
    DefaultDependencies=no
    Before=shutdown.target reboot.target

    [Service]
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/bin/true
    ExecStop=/root/scripts/control-m_shutdown.sh

    [Install]
    WantedBy=multi-user.target

    [root@kccontrolmt01 ~]# cat /root/scripts/control-m_shutdown.sh
    #!/bin/sh
    # Shutdown any running Control-M services
    STATUS=$(/usr/bin/systemctl is-active CTMLinux_Server.service)
    if [ ${STATUS} == "active" ]; then
        /usr/bin/systemctl stop CTMLinux_Server.service
    fi
    STATUS=$(/usr/bin/systemctl is-active CTMDist_Server.service)
    if [ ${STATUS} == "active" ]; then
        /usr/bin/systemctl stop CTMDist_Server.service
    fi
    STATUS=$(/usr/bin/systemctl is-active EnterpriseManager.service)
    if [ ${STATUS} == "active" ]; then
        /usr/bin/systemctl stop EnterpriseManager.service
    fi
    exit 0

    #!/bin/bash
    # stop CONTROL-M
    if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ctm ]; then
        echo "Stopping CONTROL-M application"
        /data00/ctmlinux/ctm_server/scripts/shut_ctm
    fi
    # stop CONTROL-M Configuration Agent
    if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ca ]; then
        echo "Stopping CONTROL-M Server Configuration Agent"
        /data00/ctmlinux/ctm_server/scripts/shut_ca
    fi
    # stop database
    /data00/ctmlinux/ctm_server/scripts/dbversion
    if [ $? -ne 0 ] ; then
        echo "SQL Server is already stopped "
    else
        if [ -f /data00/ctmlinux/ctm_server/scripts/shutdb ]; then
            echo "Stopping SQL server for CONTROL-M"
            /data00/ctmlinux/ctm_server/scripts/shutdb
        fi
    fi
    exit 0

-----Original Message-----
From: Adrian Klaver
Sent: Wednesday, October 10, 2018 8:25 AM
To: Bryce Pepper; pgsql-general@lists.postgresql.org
Subject: Re: RHEL 7 (systemd) reboot

On 10/10/18 5:32 AM, Bryce Pepper wrote:
> I also think the postgresql instance is already stopped and hence why the
> db_execute fails. I will try to modify the vendor function to save off the
> contents of the query.

Alright, I'm confused. In your earlier post you said the stop script is not running. Yet here it is, just not at the right time.

I think a more detailed explanation is needed:

1) The stop script you are concerned about is a systemd script, one that you created or system provided?
2) What is the shutdown service you refer to?
3) Is there a separate shutdown script for the Control-M product?
4) What do you expect to happen vs what is happening?
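One detail in the unit file may matter here: it lists Requires= but no After= for the units it depends on. systemd stops units in the reverse of their start order, so a unit is only guaranteed to be stopped before its dependencies if it is ordered After= them. The sketch below writes a drop-in that adds such an ordering. The function name and the drop-in file name 10-ordering.conf are made up for illustration, and whether this ordering alone keeps the database up long enough is not established in this thread, since the vendor scripts may start PostgreSQL outside of CTM_Postgre.service.

```shell
#!/bin/sh
# Write a systemd drop-in that orders a unit after its dependencies.
# Units ordered After= something are stopped before it at shutdown.
# $1: drop-in root (normally /etc/systemd/system)
# $2: unit name; remaining arguments: units to be ordered after.
write_ordering_dropin() {
    root=$1
    unit=$2
    shift 2
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    {
        echo "[Unit]"
        echo "After=$*"
    } > "$dir/10-ordering.conf"
}

# Hypothetical usage (as root), followed by a reload:
#   write_ordering_dropin /etc/systemd/system ControlM_Shutdown.service \
#       network.target CTM_Postgre.service
#   systemctl daemon-reload
```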
RE: RHEL 7 (systemd) reboot
Adrian,

Thanks for being willing to dig into this. You are correct, there are other scripts being called from mine (delivered by BMC with their software). In order to stay in support and work with their updates, I use the vendor-supplied scripts/programs.

The Control-M product is installed on this single server and is broken down into the following parts:

  - Enterprise server with dedicated postgresql instance
  - Distributed datacenter with agent and dedicated postgresql instance
  - Linux datacenter with agent and dedicated postgresql instance

To cut down on the noise, my post only focused on the "Distributed" side and shutdown process, although the ControlM_Shutdown.service unit stop script manages all of the above components.

In the ControlM_Shutdown.service there is a Requires statement identifying that the network must be available while this systemd unit runs. You noticed that eth0 was disconnected in /var/log/messages. I showed that to highlight that the unit was not executing in the order I had intended; again, refer to the Requires statement.

The second shebang is from one of the invoked subscripts (stop_ctmdist_server.sh) and is the "main" shutdown sequence for the Distributed datacenter. (I think the "SQL server" echo from BMC is because the product can be configured with other databases and they use it as a generic term, not meaning SQL Server from Microsoft.) The dbversion check is used to verify that the pgsql instance for this datacenter is running; it returns a non-zero return code if the instance is unreachable. (I could use pg_isready or pg_ctl but would diverge further from the BMC-supported technique.)

You probably also noticed in the earlier posted shutdown service a Requires of CTM_Postgre.service. This was one of my attempts to ensure the instance was available by actually starting the instance outside of the BMC routines (if it is already running the BMC routines will not start it; the dbversion check is on the start side also). I thought if I managed the postgresql instance outside of the product I could ensure it was running. Unfortunately that didn't work, as the instance shut down on its own; presumably a resource (perhaps the network) was terminated and postgresql shut down.

So to restate the original post: it appears the postgresql instance is unavailable when the stop script runs.

Thanks,
Bryce

    [root@kccontrolmt01 ~]# systemctl --full cat ControlM_Shutdown.service
    # /etc/systemd/system/ControlM_Shutdown.service
    [Unit]
    Description=Run ControlM shutdown process
    Requires=graphical.target multi-user.target network.target network.service sockets.target
    DefaultDependencies=no
    Before=shutdown.target reboot.target halt.target poweroff.target kexec.target

    [Service]
    Type=oneshot
    RemainAfterExit=true
    ExecStart=/bin/true
    ExecStop=/bin/bash /root/scripts/control-m_shutdown.sh
    TimeoutStopSec=4min

    [Install]
    WantedBy=multi-user.target
    [root@kccontrolmt01 ~]#
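A note on the Requires statement: in systemd, Requires= only pulls units in and ties their activation together. It does not affect the order in which units are started or stopped; that has to be declared separately with After= or Before=. As a rough check, the sketch below reads the output of systemctl show and lists every required unit that has no matching After= ordering. The function name unordered_requirements is hypothetical.

```shell
#!/bin/sh
# Read "systemctl show -p Requires -p After UNIT" output on stdin and
# print each required unit that the unit is not ordered After=.
unordered_requirements() {
    awk -F= '
        $1 == "Requires" { nreq = split($2, req, " ") }
        $1 == "After"    { n = split($2, a, " ")
                           for (i = 1; i <= n; i++) after[a[i]] = 1 }
        END { for (i = 1; i <= nreq; i++)
                  if (!(req[i] in after)) print req[i] }
    '
}

# Hypothetical usage:
#   systemctl show -p Requires -p After ControlM_Shutdown.service \
#       | unordered_requirements
```

Any unit printed here can be stopped at the same time as, or before, the shutdown unit's ExecStop runs, which would be consistent with the network going down while the stop script is still working.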
RE: RHEL 7 (systemd) reboot
Adrian,

I tried changing the Before to After but the postgresql instance was still shut down too early. I appreciate all of the help but think I'm going to ask the patching group to ensure they stop the control-m services prior to reboot.

Bryce

    Oct 11 09:19:57 kccontrolmt01 su[9816]: pam_unix(su-l:session): session opened for user sa_ctmlinux_uat by (uid=0)
    Oct 11 09:19:57 kccontrolmt01 systemd[1]: Started Restore /run/initramfs.
    Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: setenv: Too many arguments.
    Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: setenv: Too many arguments.
    Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing Control-M/Agent Listener pid:5595
    Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing Control-M/Agent Listener pid:5977
    Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:58 Listener process stopped
    Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:58 Listener process stopped
    Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing Control-M/Agent Tracker pid:6199
    Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing Control-M/Agent Tracker pid:6172
    Oct 11 09:19:58 kccontrolmt01 systemd[1]: Stopped Dynamic System Tuning Daemon.
    Oct 11 09:19:59 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:59 Tracker process stopped
    Oct 11 09:19:59 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:59 Tracker process stopped
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EUA Service.
    Oct 11 09:19:59 kccontrolmt01 su[9815]: pam_unix(su-l:session): session closed for user sa_ctmdist_uat
    Oct 11 09:19:59 kccontrolmt01 su[9816]: pam_unix(su-l:session): session closed for user sa_ctmlinux_uat
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Dist Agent.
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Dist Server...
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Linux Agent.
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Linux Server...
    Oct 11 09:19:59 kccontrolmt01 su[10319]: (to sa_ctmdist_uat) root on none
    Oct 11 09:19:59 kccontrolmt01 su[10320]: (to sa_ctmlinux_uat) root on none
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
    Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session scope session-c12.scope: Transaction is destructive.
    Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided
    Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_unix(su-l:session): session opened for user sa_ctmdist_uat by (uid=0)
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts existing jobs: Transaction is destructive.
    Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session scope session-c13.scope: Transaction is destructive.
    Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_systemd(su-l:session): Failed to create session: Resource deadlock avoided
    Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_unix(su-l:session): session opened for user sa_ctmlinux_uat by (uid=0)
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EPA Service.
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped target Network.
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Network.
    Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping LSB: Bring up/down networking...
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]:
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Shutting down CONTROL-M.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]:
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Waiting ...
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: psql action failed. cannot perform sql command in /data00/ctmdist/ctm_server/tmp/upd_CMS_SYSP
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: db_execute_sql failed while processing /data00/ctmdist/ctm_server/tmp/upd_CMS_SYSPRM_10512.sq
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Failed to update CMS_SYSPRM table.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Be aware that the Configuration Agent might start the CONTROL-M/Server
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server
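The workaround of stopping the Control-M services before the reboot could be handed to the patching group as a small wrapper. This is a minimal sketch, assuming the three unit names from the earlier control-m_shutdown.sh are the ones that need to stop and that this is the desired order. The function name stop_in_order and the SYSTEMCTL override are made up for illustration.

```shell
#!/bin/sh
# Stop the given units one after another; give up at the first failure
# so that a reboot is not triggered with a service still running.
# SYSTEMCTL can be overridden (for example SYSTEMCTL=echo) for a dry run.
stop_in_order() {
    for unit in "$@"; do
        ${SYSTEMCTL:-systemctl} stop "$unit" || return 1
    done
}

# Hypothetical usage by the patching job:
#   stop_in_order CTMLinux_Server.service CTMDist_Server.service \
#       EnterpriseManager.service && systemctl reboot
```

Because the services are stopped while the system is still fully up, the vendor stop scripts run with the network and the database available, which sidesteps the ordering problem rather than solving it.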
RE: RHEL 7 (systemd) reboot
I disabled and removed the CTM_Postgre.service as it didn't help (and I didn't want too many moving parts left out there).

I did find a post that I think is getting me closer:
https://superuser.com/questions/1016827/how-do-i-run-a-script-before-everything-else-on-shutdown-with-systemd

I tried RequiresMountsFor=/data00, which starts the script much sooner, but unfortunately the postgresql instance is unreachable by the time the script gets there.

These are two unique datacenter shutdowns, ctmdist and ctmlinux:

    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many arguments.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M application
    Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not running.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many arguments.
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping CONTROL-M application
    Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not running.
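One possibility that the earlier log excerpts hint at, but that is not confirmed anywhere in this thread: the stop scripts run through su -l, which opens a logind session. If the start scripts do the same, the PostgreSQL processes might live in a session scope under user.slice instead of in the service's own cgroup, and systemd could then stop them independently of any ordering on the service. The sketch below prints which systemd cgroup a process belongs to while the system is up. The function name systemd_cgroup_of is hypothetical, and the postmaster PID has to be looked up separately.

```shell
#!/bin/sh
# Print the systemd cgroup path of a process from a /proc-style cgroup
# file. Handles the cgroup v1 "name=systemd" line used on RHEL 7 and the
# unified "0::" line used on newer systems.
# $1: path to the file, for example /proc/<pid>/cgroup
systemd_cgroup_of() {
    awk -F: '$2 == "name=systemd" || $2 == "" { print $3; exit }' "$1"
}

# Hypothetical usage, with pid set to the postmaster PID:
#   systemd_cgroup_of "/proc/$pid/cgroup"
```

A path under /system.slice/ ending in the Control-M service name would suggest the service ordering applies to the database processes, while a path ending in a session-*.scope would suggest it does not.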