RHEL 7 (systemd) reboot

2018-10-09 Thread Bryce Pepper
I am running three instances (under different users) on a RHEL 7 server to 
support a vendor product.

In the defined services, the start & stop scripts work fine when invoked with 
systemctl {start|stop} whatever.service  but we have automated monthly patching 
which does a reboot.

Looking in /var/log/messages, I found that the stop scripts do not get invoked 
on reboot, so I created a new shutdown service as described 
here<https://unix.stackexchange.com/questions/211924/effect-of-reboot-signal-on-systemd-service-state>.

It appears that PostgreSQL is receiving a signal from somewhere before my 
script runs:

Oct 05 14:18:56 kccontrolmt01 NetworkManager[787]:   [1538767136.0967] 
manager: NetworkManager state is now DISCONNECTED
Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activating via systemd: 
service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispa
Oct 05 14:18:56 kccontrolmt01 dbus[740]: [system] Activation via systemd failed 
for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation
Oct 05 14:18:56 kccontrolmt01 network[29310]: Shutting down interface eth0:  
Device 'eth0' successfully disconnected.
Oct 05 14:18:56 kccontrolmt01 network[29310]: [  OK  ]
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: 

Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Shutting down 
CONTROL-M.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: 

Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Waiting ...
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: psql action 
failed. cannot perform sql command in /data00/ctmlinux/ctm_server/tmp/upd_CMS_SY
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: db_execute_sql 
failed while processing /data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Failed to update 
CMS_SYSPRM table.
Oct 05 14:18:56 kccontrolmt01 stop_ctmlinux_server.sh[29185]: Be aware that the 
Configuration Agent might start the CONTROL-M/Server

The database must be available for the product to shut down in a consistent 
state.

I am open to suggestions.

Thanks,
Bryce

Bryce Pepper
Sr. Unix Applications Systems Engineer
The Kansas City Southern Railway Company
114 West 11th Street  |  Kansas City,  MO 64105
Office:  816.983.1512
Email:  bpep...@kcsouthern.com<mailto:bpep...@kcsouthern.com>



RE: RHEL 7 (systemd) reboot

2018-10-10 Thread Bryce Pepper
Adrian,
Thanks for the inquiry.  The function (db_execute_sql) is coming from a vendor 
(BMC) product called Control-M. It is a scheduling product.
The tmp file is deleted before I can see its contents but I believe it is 
trying to update some columns in the CMS_SYSPRM table. 
I also think the postgresql instance is already stopped, which is why 
db_execute fails.  I will try to modify the vendor function to save off the 
contents of the query.

Bryce

p.s. Do you know of any verbose logging that could be turned on to catch when 
pgsql is being terminated?
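
On the logging question: PostgreSQL writes a line to its server log when it is 
asked to shut down, and the wording identifies the shutdown mode, which in turn 
maps to the signal that was delivered (SIGTERM for smart, SIGINT for fast, 
SIGQUIT for immediate). A minimal sketch for pulling those lines out of a log 
follows; the function name classify_pg_shutdown is hypothetical, and the 
location of the log for the BMC-bundled instance is not given in this thread.

```shell
#!/bin/sh
# classify_pg_shutdown (hypothetical helper): read PostgreSQL server log lines
# on stdin and print, for each shutdown request found, the shutdown mode and
# the signal that triggers that mode.
classify_pg_shutdown() {
    while IFS= read -r line; do
        case "$line" in
            *"received smart shutdown request"*)     echo "smart (SIGTERM)" ;;
            *"received fast shutdown request"*)      echo "fast (SIGINT)" ;;
            *"received immediate shutdown request"*) echo "immediate (SIGQUIT)" ;;
        esac
    done
}
```

Usage would be something like `classify_pg_shutdown < /path/to/postgresql.log` 
(the path is a placeholder). Setting log_line_prefix in postgresql.conf to 
include %m and %p adds a timestamp and process ID to each line, which makes it 
easier to line the result up against /var/log/messages.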


-Original Message-
From: Adrian Klaver  
Sent: Tuesday, October 09, 2018 7:39 PM
To: Bryce Pepper ; pgsql-general@lists.postgresql.org
Subject: Re: RHEL 7 (systemd) reboot


On 10/9/18 11:06 AM, Bryce Pepper wrote:
> I am running three instances (under different users) on a RHEL 7 
> server to support a vendor product.
>
> In the defined services, the start & stop scripts work fine when 
> invoked with systemctl {start|stop} whatever.service  but we have 
> automated monthly patching which does a reboot.
>
> Looking in /var/log/messages and the stop scripts do not get invoked 
> on reboot, therefore I created a new shutdown service as described 
> here 
> <https://unix.stackexchange.com/questions/211924/effect-of-reboot-signal-on-systemd-service-state>.
>
> It appears that PostGreSQL is receiving a signal from somewhere prior 
> to my script running.
>

>
> The database must be available for the product to shut down in a 
> consistent state.
>
> I am open to suggestions.

What is the below doing or coming from?:

db_execute_sql failed while processing
/data00/ctmlinux/ctm_server/tmp/upd_CMS_SYSPRM_29448.

>
> Thanks,
>
> Bryce
>
> *Bryce Pepper*
>
> Sr. Unix Applications Systems Engineer
>
> *The Kansas City Southern Railway Company *
>
> 114 West 11^th Street  |  Kansas City,  MO 64105
>
> Office:  816.983.1512
>
> Email: bpep...@kcsouthern.com <mailto:bpep...@kcsouthern.com>
>


--
Adrian Klaver
adrian.kla...@aklaver.com



RE: RHEL 7 (systemd) reboot

2018-10-10 Thread Bryce Pepper
Here are the contents of the query and the error:
[root@kccontrolmt01 tmp]# cat ctm.Xf9pQkg2
update CMS_SYSPRM set CURRENT_STATE='STOPPING',DESIRED_STATE='Down' where 
DESIRED_STATE <> 'Ignored'
;
psql: could not connect to server: Connection refused
Is the server running on host "kccontrolmt01" (10.1.32.53) and accepting
TCP/IP connections on port 5433?




RE: RHEL 7 (systemd) reboot

2018-10-10 Thread Bryce Pepper
Sorry, I wasn't clear in the prior posts.   

The stop script is running during reboot. The problem is that the database is 
not reachable when the stop script runs.  The ctmdist server shutdown sequence 
is as follows:
   Stop control-m application
   Stop control-m configuration agent
   Stop database

As you can see, the intent is for the database to be shut down after the 
product. 

But as you noticed from /var/log/messages, the stop_ctmlinux_server.sh script 
is running but is unable to execute the update query.

I created the following service definition and scripts. Note that there are 
two datacenters (ctmdist, ctmlinux) with comparable scripts, so I have included 
only one set:

[root@kccontrolmt01 ~]# cat ControlM_Shutdown.service
[Unit]
Description=Run mycommand at shutdown
Requires=network.target CTM_Postgre.service
DefaultDependencies=no
Before=shutdown.target reboot.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/root/scripts/control-m_shutdown.sh

[Install]
WantedBy=multi-user.target


[root@kccontrolmt01 ~]# cat /root/scripts/control-m_shutdown.sh
#!/bin/sh
# Shut down any running Control-M services
STATUS=$(/usr/bin/systemctl is-active CTMLinux_Server.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop CTMLinux_Server.service
fi

STATUS=$(/usr/bin/systemctl is-active CTMDist_Server.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop CTMDist_Server.service
fi

STATUS=$(/usr/bin/systemctl is-active EnterpriseManager.service)
if [ "${STATUS}" = "active" ]; then
  /usr/bin/systemctl stop EnterpriseManager.service
fi
exit 0


#!/bin/bash

# stop CONTROL-M
if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ctm ]; then
  echo "Stopping CONTROL-M application"
  /data00/ctmlinux/ctm_server/scripts/shut_ctm
fi

# stop CONTROL-M Configuration Agent
if [ -f /data00/ctmlinux/ctm_server/scripts/shut_ca ]; then
  echo "Stopping CONTROL-M Server Configuration Agent"
  /data00/ctmlinux/ctm_server/scripts/shut_ca
fi

# stop database
/data00/ctmlinux/ctm_server/scripts/dbversion
if [ $? -ne 0 ] ; then
  echo "SQL Server is already stopped "
else
  if [ -f /data00/ctmlinux/ctm_server/scripts/shutdb ]; then
    echo "Stopping SQL server for CONTROL-M"
    /data00/ctmlinux/ctm_server/scripts/shutdb
  fi
fi

exit 0

-Original Message-
From: Adrian Klaver  
Sent: Wednesday, October 10, 2018 8:25 AM
To: Bryce Pepper ; pgsql-general@lists.postgresql.org
Subject: Re: RHEL 7 (systemd) reboot


On 10/10/18 5:32 AM, Bryce Pepper wrote:
> Adrian,
> Thanks for the inquiry.  The function (db_execute_sql) is coming from a 
> vendor (BMC) product called Control-M. It is a scheduling product.
> The tmp file is deleted before I can see its contents but I believe it is 
> trying to update some columns in the CMS_SYSPRM table.
> I also think the postgresql instance is already stopped and hence why the 
> db_execute fails.  I will try to modify the vendor function to save off the 
> contents of the query.

Alright, I'm confused. In your earlier post you said the stop script is not 
running. Yet here it is, just not at the right time. I think a more detailed 
explanation is needed:

1) Is the stop script you are concerned about a systemd script, and is it one 
that you created or one the system provided?

2) What is the shutdown service you refer to?

3) Is there a separate shutdown script for the Control-M product?

4) What do you expect to happen vs what is happening?

>
> Bryce
>
> p.s. Do you know of any verbose logging that could be turned on to catch when 
> pgsql is being terminated?
>
>

RE: RHEL 7 (systemd) reboot

2018-10-11 Thread Bryce Pepper
Adrian,

Thanks for being willing to dig into this.  

You are correct: there are other scripts being called from mine (delivered by 
BMC with their software).  In order to stay in support and work with their 
updates, I use the vendor-supplied scripts/programs.  

The Control-M product is installed on this single server and is broken down 
into the following parts:
   Enterprise server with dedicated postgresql instance
   Distributed datacenter with agent and dedicated postgresql instance
   Linux datacenter with agent and dedicated postgresql instance

To cut down on the noise, my post only focused on the "Distributed" side and 
shutdown process -- although the ControlM_Shutdown.service unit stop script 
manages all of the above components.

In the ControlM_Shutdown.service there is a Requires= statement indicating that 
the network must be available while this systemd unit runs.

You noticed that eth0 was disconnected in /var/log/messages.  I showed that to 
highlight that the unit was not executing in the order I had intended; again, 
refer to the Requires= statement.
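
One thing worth checking here: per the systemd.unit documentation, Requires= 
only pulls the listed units in and propagates their stop or restart; it does 
not order anything. Ordering comes from After= and Before=, and at shutdown 
units are stopped in the reverse of their start order, so an ExecStop= that 
needs the network or the database would normally have to be ordered After= 
them. The sketch below lists the units that appear in Requires= but not in 
After=. The helper name missing_after is hypothetical, and it ignores line 
continuations and drop-in files.

```shell
#!/bin/sh
# missing_after (hypothetical helper): print every unit named on a Requires=
# line of the given unit file that is not also named on an After= line.
missing_after() {
    unit_file=$1
    requires=$(sed -n 's/^Requires=//p' "$unit_file")
    after=$(sed -n 's/^After=//p' "$unit_file")
    for dep in $requires; do
        # $(echo $after) collapses newlines and repeated spaces to single spaces
        case " $(echo $after) " in
            *" $dep "*) ;;            # ordered as well as required
            *) echo "$dep" ;;         # required but not ordered
        esac
    done
}
```

Run against the first version of ControlM_Shutdown.service posted in this 
thread, this would print network.target and CTM_Postgre.service, since that 
unit has no After= line at all. Whether adding the ordering is enough on this 
host is not something the thread establishes.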

The second shebang is from one of the invoked subscripts 
(stop_ctmdist_server.sh) and is the "main" shutdown sequence for the 
Distributed datacenter (I think the "SQL server" echo from BMC is because it 
can be configured with other databases and they use it in a generic term --- 
not meaning sqlserver from Microsoft).

The dbversion check is used to verify that the pgsql instance for this 
datacenter is running; it returns a non-zero return code if the instance is 
unreachable (I could use pg_isready or pg_ctl but would diverge further from 
the BMC supported technique).
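
For comparison, the pg_isready alternative mentioned above reports reachability 
through its exit status (0 accepting connections, 1 rejecting connections, 2 no 
response, 3 no attempt made). A minimal sketch, assuming pg_isready is on the 
PATH of the account running the check; the helper names are hypothetical and 
this is not part of the BMC-supported scripts.

```shell
#!/bin/sh
# describe_pg_isready (hypothetical helper): translate a pg_isready exit
# status into readable text.
describe_pg_isready() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# check_instance HOST PORT (hypothetical helper): probe one instance with a
# 5 second timeout, print the result, and return pg_isready's exit status.
check_instance() {
    pg_isready -q -h "$1" -p "$2" -t 5
    status=$?
    echo "postgresql $1:$2: $(describe_pg_isready "$status")"
    return "$status"
}
```

For the instance in the earlier error message this would be 
`check_instance kccontrolmt01 5433`.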

You probably also noticed in the earlier posted shutdown service a Requires= of 
CTM_Postgre.service.  This was one of my attempts to ensure the instance was 
available by actually starting the instance outside of the BMC routines (if it 
is already running the BMC routines will not start it; the dbversion check is 
on the start side also).  I thought if I managed the postgresql instance 
outside of the product I could ensure it was running.  Unfortunately that 
didn't work, as the instance shut down on its own; presumably a resource 
(perhaps network) was terminated and postgresql shut down.  
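
One way to test the "a resource was terminated" theory is to look at which 
systemd unit actually owns the postgres processes. If the vendor scripts start 
the instance through su -l (an assumption; the thread does not show how the 
database is launched), pam_systemd may place it in a login session scope rather 
than in the service's own cgroup, and a session scope would not be ordered 
against the Control-M service units at shutdown unless something says so. The 
sketch below reads a /proc/<pid>/cgroup file in the cgroup v1 layout used on 
RHEL 7; the function name systemd_unit_of is hypothetical.

```shell
#!/bin/sh
# systemd_unit_of (hypothetical helper): read the contents of
# /proc/<pid>/cgroup (cgroup v1 layout) on stdin and print the last path
# component of the name=systemd hierarchy, i.e. the unit or scope that owns
# the process.
systemd_unit_of() {
    sed -n 's|^[0-9]*:name=systemd:.*/||p'
}
```

Feeding it the cgroup file of the postmaster process, for example 
`systemd_unit_of < /proc/<pid>/cgroup`, and getting a name ending in .scope 
rather than .service would point in that direction.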

So to restate the original post...   It appears the postgresql instance is 
unavailable when the stop script runs.  

Thanks,
Bryce

[root@kccontrolmt01 ~]# systemctl --full cat ControlM_Shutdown.service
# /etc/systemd/system/ControlM_Shutdown.service
[Unit]
Description=Run ControlM shutdown process
Requires=graphical.target multi-user.target network.target network.service sockets.target
DefaultDependencies=no
Before=shutdown.target reboot.target halt.target poweroff.target kexec.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/bin/true
ExecStop=/bin/bash /root/scripts/control-m_shutdown.sh
TimeoutStopSec=4min

[Install]
WantedBy=multi-user.target
[root@kccontrolmt01 ~]#



RE: RHEL 7 (systemd) reboot

2018-10-11 Thread Bryce Pepper
Adrian,

I tried changing the Before to After but the postgresql instance was still 
shut down too early. 

I appreciate all of the help, but I think I'm going to ask the patching group 
to ensure they stop the control-m services prior to reboot. 
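
If the patching job takes this on, the pre-reboot step could be as small as the 
following sketch. The unit names it would be given are the ones used in 
control-m_shutdown.sh earlier in this thread; the function name and the DRY_RUN 
switch are hypothetical additions for illustration.

```shell
#!/bin/sh
# stop_controlm_units (hypothetical helper): stop each named unit that is
# currently active, in the order given. With DRY_RUN=1 the stop commands are
# printed instead of executed. Returns non-zero if any stop fails.
stop_controlm_units() {
    for unit in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "systemctl stop $unit"
        elif systemctl is-active --quiet "$unit"; then
            systemctl stop "$unit" || return 1
        fi
    done
}
```

The patching job could then call 
`stop_controlm_units CTMLinux_Server.service CTMDist_Server.service EnterpriseManager.service` 
and issue the reboot only if it returns 0.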

Bryce

Oct 11 09:19:57 kccontrolmt01 su[9816]: pam_unix(su-l:session): session opened 
for user sa_ctmlinux_uat by (uid=0)
Oct 11 09:19:57 kccontrolmt01 systemd[1]: Started Restore /run/initramfs.
Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: setenv: Too many 
arguments.
Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: setenv: Too many 
arguments.
Oct 11 09:19:57 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing 
Control-M/Agent Listener pid:5595
Oct 11 09:19:57 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing 
Control-M/Agent Listener pid:5977
Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:58 
Listener process stopped
Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:58 
Listener process stopped
Oct 11 09:19:58 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: Killing 
Control-M/Agent Tracker pid:6199
Oct 11 09:19:58 kccontrolmt01 stop_ctmdist_agent.sh[9671]: Killing 
Control-M/Agent Tracker pid:6172
Oct 11 09:19:58 kccontrolmt01 systemd[1]: Stopped Dynamic System Tuning Daemon.
Oct 11 09:19:59 kccontrolmt01 stop_ctmlinux_agent.sh[9672]: 2018-10-11 09:19:59 
Tracker process stopped
Oct 11 09:19:59 kccontrolmt01 stop_ctmdist_agent.sh[9671]: 2018-10-11 09:19:59 
Tracker process stopped
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EUA Service.
Oct 11 09:19:59 kccontrolmt01 su[9815]: pam_unix(su-l:session): session closed 
for user sa_ctmdist_uat
Oct 11 09:19:59 kccontrolmt01 su[9816]: pam_unix(su-l:session): session closed 
for user sa_ctmlinux_uat
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Dist Agent.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Dist Server...
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Control-M CTM Linux Agent.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Control-M CTM Linux Server...
Oct 11 09:19:59 kccontrolmt01 su[10319]: (to sa_ctmdist_uat) root on none
Oct 11 09:19:59 kccontrolmt01 su[10320]: (to sa_ctmlinux_uat) root on none
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts 
existing jobs: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session 
scope session-c12.scope: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_systemd(su-l:session): Failed to 
create session: Resource deadlock avoided
Oct 11 09:19:59 kccontrolmt01 su[10319]: pam_unix(su-l:session): session opened 
for user sa_ctmdist_uat by (uid=0)
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Requested transaction contradicts 
existing jobs: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 systemd-logind[777]: Failed to start session 
scope session-c13.scope: Transaction is destructive.
Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_systemd(su-l:session): Failed to 
create session: Resource deadlock avoided
Oct 11 09:19:59 kccontrolmt01 su[10320]: pam_unix(su-l:session): session opened 
for user sa_ctmlinux_uat by (uid=0)
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped Eracent EPA Service.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopped target Network.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping Network.
Oct 11 09:19:59 kccontrolmt01 systemd[1]: Stopping LSB: Bring up/down 
networking...
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many 
arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M 
application
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not 
running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many 
arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping 
CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not 
running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: 

Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Shutting down 
CONTROL-M.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: 

Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Waiting ...
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: psql action 
failed. cannot perform sql command in 
/data00/ctmdist/ctm_server/tmp/upd_CMS_SYSP
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: db_execute_sql 
failed while processing /data00/ctmdist/ctm_server/tmp/upd_CMS_SYSPRM_10512.sq
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Failed to update 
CMS_SYSPRM table.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Be aware that the 
Configuration Agent might start the CONTROL-M/Server
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server

RE: RHEL 7 (systemd) reboot

2018-10-11 Thread Bryce Pepper
I disabled and removed the CTM_Postgre.service as it didn't help (and I didn't 
want too many moving parts left out there).

I did find a post 
https://superuser.com/questions/1016827/how-do-i-run-a-script-before-everything-else-on-shutdown-with-systemd
that I think is getting me closer.

I tried RequiresMountsFor=/data00, which starts the script much sooner, but 
unfortunately the postgresql instance is unreachable by the time the script 
gets there.

These are two unique datacenter shutdowns: ctmdist  & ctmlinux 

Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: setenv: Too many 
arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: Stopping CONTROL-M 
application
Oct 11 09:20:00 kccontrolmt01 stop_ctmdist_server.sh[10316]: SQL Server is not 
running.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: setenv: Too many 
arguments.
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: Stopping 
CONTROL-M application
Oct 11 09:20:00 kccontrolmt01 stop_ctmlinux_server.sh[10318]: SQL Server is not 
running.