Re: [Linux-HA] How to tell master-slave group set one node to master?

Takehiro Matsushima Fri, 13 Dec 2013 11:18:06 -0800

1. Well, I mean rebuilding the PostgreSQL replication cluster using
pg_basebackup, rsync, or something similar.
2. Thanks, but I'll try it first myself.
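For point 1, here is a minimal Python sketch of re-seeding one slave from the current master with pg_basebackup. By default it only builds and prints the command. It assumes the following. The master is reachable at the master_ip from the crm config later in this thread (192.168.10.200, port 5432). The target node uses the Debian data directory /var/lib/postgresql/9.3/main. PostgreSQL on that node is already stopped, and its old data directory has been moved aside. REPL_USER is a hypothetical role name that pg_hba.conf allows to make replication connections. -X stream and -P are standard pg_basebackup options in 9.3. Writing recovery.conf is left to the pgsql RA when Pacemaker starts the resource.

```python
import shlex
import subprocess

# Values taken from the crm config quoted further down in this thread.
MASTER_IP = "192.168.10.200"
PORT = 5432
PGDATA = "/var/lib/postgresql/9.3/main"
# Assumption: hypothetical role allowed to open replication connections.
REPL_USER = "postgres"


def basebackup_cmd(host=MASTER_IP, port=PORT, pgdata=PGDATA, user=REPL_USER):
    """Build the pg_basebackup argv that copies the master's data into pgdata."""
    return [
        "pg_basebackup",
        "-h", host,
        "-p", str(port),
        "-U", user,
        "-D", pgdata,    # target directory must be empty or not exist yet
        "-X", "stream",  # stream WAL during the copy so the result is consistent
        "-P",            # show progress
    ]


def rebuild_slave(dry_run=True):
    """Print the command; run it as the postgres OS user only if dry_run is False."""
    cmd = basebackup_cmd()
    print(shlex.join(cmd))
    if not dry_run:
        # Running as postgres gives the copied files the right owner.
        subprocess.run(["sudo", "-u", "postgres", *cmd], check=True)


if __name__ == "__main__":
    rebuild_slave(dry_run=True)
```

Running it prints `pg_basebackup -h 192.168.10.200 -p 5432 -U postgres -D /var/lib/postgresql/9.3/main -X stream -P`. Check that line by hand before switching dry_run off.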


2013/12/14 Andrey Rogovsky <[email protected]>:
> 1. Do you mean crm resource cleanup, or something else?
>
> 2. If you want, I can give you the logs.
>
>
>
> 2013/12/13 Takehiro Matsushima <[email protected]>
>
>> 1. As a temporary workaround, how about completely cleaning up all nodes
>> once, with, e.g., "a" as master and "b" and "c" as slaves?
>>
>> 2. It looks like it is caused by the RA... umm... I'll try building a
>> cluster on Debian 7.
>>
>> 2013/12/14 Andrey Rogovsky <[email protected]>:
>> > 1. How can I find the status in the log? What exactly should I search for?
>> >
>> > 2. I did it and now have this situation:
>> > On node a:
>> > root@a:~# sudo -u postgres psql
>> > could not change directory to "/root": Permission denied
>> > psql (9.3.2)
>> > Type "help" for help.
>> >
>> > postgres=# select client_addr,sync_state from pg_stat_replication;
>> >  client_addr  | sync_state
>> > --------------+------------
>> >  192.168.10.2 | async
>> >  192.168.10.3 | async
>> > (2 rows)
>> >
>> > So, replication in pgsql itself looks correct. But...
>> > root@a:~# crm_mon -VAf -1
>> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on a.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
>> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on b.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
>> > crm_mon[16456]: 2013/12/13_22:15:30 ERROR: unpack_rsc_op: Preventing msPostgresql from re-starting on c.mydomain.com: operation monitor failed 'invalid parameter' (rc=2)
>> > ============
>> > Last updated: Fri Dec 13 22:15:30 2013
>> > Last change: Fri Dec 13 20:48:18 2013 via crmd on c.mydomain.com
>> > Stack: openais
>> > Current DC: a.mydomain.com - partition with quorum
>> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> > 3 Nodes configured, 3 expected votes
>> > 6 Resources configured.
>> > ============
>> >
>> > Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
>> >
>> >  apache-master-ip (ocf::heartbeat:IPaddr2): Started a.mydomain.com
>> >  apache (ocf::heartbeat:apache): Started a.mydomain.com
>> >
>> > Node Attributes:
>> > * Node a.mydomain.com:
>> >     + pgsql-data-status               : LATEST
>> > * Node c.mydomain.com:
>> >     + pgsql-data-status               : STREAMING|ASYNC
>> >     + pgsql-status                     : HS:async
>> > * Node b.mydomain.com:
>> >     + pgsql-data-status               : STREAMING|ASYNC
>> >     + pgsql-status                     : HS:async
>> >
>> > Migration summary:
>> > * Node a.mydomain.com:
>> > * Node b.mydomain.com:
>> > * Node c.mydomain.com:
>> >
>> > Failed actions:
>> >     pgsql:0_monitor_0 (node=a.mydomain.com, call=31, rc=2,
>> > status=complete): invalid parameter
>> >     pgsql:0_monitor_0 (node=b.mydomain.com, call=26, rc=2,
>> > status=complete): invalid parameter
>> >     pgsql:0_monitor_0 (node=c.mydomain.com, call=22, rc=2,
>> > status=complete): invalid parameter
>> > root@a:~#
>> >
>> > How can I fix it?
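This is not a fix for the failed monitor operations, but a note on reading the crm_mon output above. The Node Attributes section is where the pgsql RA publishes each node's state. pgsql-data-status=LATEST marks the node holding the newest data, which is the node normally allowed to become master. Below is a minimal Python sketch that extracts those attributes from crm_mon text. It assumes the layout shown in this thread (Pacemaker 1.1.7, `crm_mon -VAf -1`); other versions may format the section differently. SAMPLE is copied from the output above with the email quoting removed.

```python
import re

# Excerpt of the "crm_mon -VAf -1" output quoted above (email quoting removed).
SAMPLE = """\
Node Attributes:
* Node a.mydomain.com:
    + pgsql-data-status               : LATEST
* Node c.mydomain.com:
    + pgsql-data-status               : STREAMING|ASYNC
    + pgsql-status                     : HS:async
* Node b.mydomain.com:
    + pgsql-data-status               : STREAMING|ASYNC
    + pgsql-status                     : HS:async

Migration summary:
* Node a.mydomain.com:
"""

NODE_RE = re.compile(r"^\*\s+Node\s+(\S+):$")      # "* Node a.mydomain.com:"
ATTR_RE = re.compile(r"^\+\s+(\S+)\s*:\s*(\S+)$")  # "+ pgsql-status : HS:async"


def node_attributes(crm_mon_text):
    """Return {node: {attribute: value}} from the 'Node Attributes:' section."""
    attrs, node, in_section = {}, None, False
    for raw in crm_mon_text.splitlines():
        line = raw.strip()
        if line == "Node Attributes:":
            in_section = True
        elif in_section and line == "Migration summary:":
            break  # the next section also has "* Node ..." lines; stop here
        elif in_section:
            m = NODE_RE.match(line)
            if m:
                node = m.group(1)
                attrs[node] = {}
                continue
            m = ATTR_RE.match(line)
            if m and node is not None:
                attrs[node][m.group(1)] = m.group(2)
    return attrs


def latest_nodes(attrs):
    """Nodes whose pgsql-data-status is LATEST."""
    return sorted(n for n, a in attrs.items() if a.get("pgsql-data-status") == "LATEST")


if __name__ == "__main__":
    attrs = node_attributes(SAMPLE)
    for node, values in attrs.items():
        print(node, values)
    print("LATEST:", latest_nodes(attrs))
```

With the sample above, only a.mydomain.com reports LATEST, which matches the crm_mon output in this message.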
>> >
>> >
>> >
>> > 2013/12/13 Takehiro Matsushima <[email protected]>
>> >
>> >> 1. Excuse me, could you show me the status from before a.mydomain.com failed?
>> >>
>> >> 2. Sorry, I meant: replace rep_mode="sync" with rep_mode="async" in the
>> >> params of primitive pgsql.
>> >>
>> >> 2013/12/14 Andrey Rogovsky <[email protected]>:
>> >> > 1. When it goes down:
>> >> > ============
>> >> > Last updated: Fri Dec 13 19:06:51 2013
>> >> > Last change: Fri Dec 13 10:06:49 2013 via cibadmin on a.mydomain.com
>> >> > Stack: openais
>> >> > Current DC: c.mydomain.com - partition with quorum
>> >> > Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
>> >> > 3 Nodes configured, 3 expected votes
>> >> > 6 Resources configured.
>> >> > ============
>> >> >
>> >> > Online: [ a.mydomain.com c.mydomain.com b.mydomain.com ]
>> >> >
>> >> > Full list of resources:
>> >> >
>> >> >  Resource Group: master
>> >> >      pgsql-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
>> >> >  Master/Slave Set: msPostgresql [pgsql]
>> >> >      Masters: [ b.mydomain.com ]
>> >> >      Slaves: [ c.mydomain.com ]
>> >> >      Stopped: [ pgsql:0 ]
>> >> >  apache-master-ip (ocf::heartbeat:IPaddr2): Started b.mydomain.com
>> >> >  apache (ocf::heartbeat:apache): Started b.mydomain.com
>> >> >
>> >> > Node Attributes:
>> >> > * Node a.mydomain.com:
>> >> >     + master-pgsql:0                   : -INFINITY
>> >> >     + master-pgsql:1                   : 1000
>> >> >     + pgsql-data-status               : DISCONNECT
>> >> >     + pgsql-status                     : STOP
>> >> > * Node c.mydomain.com:
>> >> >     + master-pgsql:2                   : 100
>> >> >     + pgsql-data-status               : STREAMING|SYNC
>> >> >     + pgsql-status                     : HS:sync
>> >> > * Node b.mydomain.com:
>> >> >     + master-pgsql:0                   : -INFINITY
>> >> >     + master-pgsql:1                   : 1000
>> >> >     + pgsql-data-status               : LATEST
>> >> >     + pgsql-master-baseline           : 000000000F000090
>> >> >     + pgsql-status                     : PRI
>> >> >
>> >> > Migration summary:
>> >> > * Node a.mydomain.com:
>> >> >    pgsql:0: migration-threshold=1 fail-count=1
>> >> > * Node c.mydomain.com:
>> >> > * Node b.mydomain.com:
>> >> >
>> >> > Failed actions:
>> >> >     pgsql:0_monitor_4000 (node=a.mydomain.com, call=89, rc=7,
>> >> > status=complete): not running
>> >> >
>> >> > This is in the log file on node a:
>> >> > Dec 10 20:49:57 a pgsql[903]: INFO: Don't check /var/lib/postgresql/9.3/main during probe
>> >> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql-master-ip_monitor_0 (call=2, rc=7, cib-update=7, confirmed=true) not running
>> >> > Dec 10 20:49:57 a pgsql[903]: INFO: PostgreSQL is down
>> >> > Dec 10 20:49:57 a lrmd: [890]: info: operation monitor[3] on pgsql:1 for client 893: pid 903 exited with return code 7
>> >> > Dec 10 20:49:57 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_monitor_0 (call=3, rc=7, cib-update=8, confirmed=true) not running
>> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: probe_complete (true)
>> >> > Dec 10 20:49:57 a lrmd: [890]: info: rsc:pgsql:1 start[4] (pid 986)
>> >> > Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
>> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
>> >> > Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: master-pgsql:1 (-INFINITY)
>> >> > Dec 10 20:49:57 a pgsql[986]: INFO: Set all nodes into async mode.
>> >> > Dec 10 20:49:57 a pgsql[986]: INFO: server starting
>> >> > Dec 10 20:49:57 a pgsql[986]: INFO: PostgreSQL start command sent.
>> >> > Dec 10 20:49:58 a lrmd: [890]: info: RA output: (pgsql:1:start:stderr) psql: FATAL:  the database system is starting up
>> >> > Dec 10 20:49:58 a pgsql[986]: WARNING: Can't get PostgreSQL recovery status. rc=2
>> >> > Dec 10 20:49:58 a pgsql[986]: WARNING: Connection error (connection to the server went bad and the session was not interactive) occurred while executing the psql command.
>> >> > Dec 10 20:49:59 a pgsql[986]: INFO: PostgreSQL is started.
>> >> > Dec 10 20:49:59 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->HS:alone.
>> >> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
>> >> > Dec 10 20:49:59 a lrmd: [890]: info: operation start[4] on pgsql:1 for client 893: pid 986 exited with return code 0
>> >> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_start_0 (call=4, rc=0, cib-update=9, confirmed=true) ok
>> >> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 notify[5] (pid 1163)
>> >> > Dec 10 20:49:59 a lrmd: [890]: info: operation notify[5] on pgsql:1 for client 893: pid 1163 exited with return code 0
>> >> > Dec 10 20:49:59 a crmd: [893]: info: process_lrm_event: LRM operation pgsql:1_notify_0 (call=5, rc=0, cib-update=0, confirmed=true) ok
>> >> > Dec 10 20:49:59 a lrmd: [890]: info: rsc:pgsql:1 monitor[6] (pid 1207)
>> >> > Dec 10 20:49:59 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (HS:alone)
>> >> >
>> >> > I think this is wrong, because there are 2 live nodes. One of them can stay as master.
>> >> >
>> >> > Also, this is in the PostgreSQL log on node a:
>> >> > 2013-12-06 10:56:53 MSK WARNING:  archive_mode enabled, yet archive_command is not set
>> >> > 2013-12-06 10:57:37 MSK LOG:  received SIGHUP, reloading configuration files
>> >> > 2013-12-06 10:57:37 MSK LOG:  parameter "archive_command" changed to "cp %p /var/lib/postgresql/9.3/pg_archive/%f"
>> >> > 2013-12-06 10:57:43 MSK ERROR:  a backup is not in progress
>> >> > 2013-12-06 10:57:43 MSK STATEMENT:  SELECT pg_stop_backup()
>> >> > 2013-12-07 10:24:22 MSK LOG:  received fast shutdown request
>> >> > 2013-12-07 10:24:22 MSK LOG:  aborting any active transactions
>> >> > 2013-12-07 10:24:22 MSK LOG:  autovacuum launcher shutting down
>> >> > 2013-12-07 10:24:22 MSK LOG:  shutting down
>> >> > 2013-12-07 10:24:22 MSK LOG:  database system is shut down
>> >> > 2013-12-07 10:24:29 MSK LOG:  database system was shut down at 2013-12-07 10:24:22 MSK
>> >> > 2013-12-07 10:24:29 MSK LOG:  autovacuum launcher started
>> >> > 2013-12-07 10:24:29 MSK LOG:  database system is ready to accept connections
>> >> > 2013-12-07 10:24:29 MSK LOG:  incomplete startup packet
>> >> > 2013-12-07 10:24:34 MSK LOG:  received fast shutdown request
>> >> > 2013-12-07 10:24:34 MSK LOG:  aborting any active transactions
>> >> > 2013-12-07 10:24:34 MSK LOG:  autovacuum launcher shutting down
>> >> > 2013-12-07 10:24:34 MSK LOG:  shutting down
>> >> > 2013-12-07 10:24:34 MSK LOG:  database system is shut down
>> >> > 2013-12-07 14:31:11 MSK LOG:  database system was shut down in recovery at 2013-12-07 14:29:19 MSK
>> >> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/00000002.history': No such file or directory
>> >> > 2013-12-07 14:31:11 MSK LOG:  entering standby mode
>> >> > cp: cannot stat `/var/lib/postgresql/9.3/pg_archive/000000010000000000000007': No such file or directory
>> >> > 2013-12-07 14:31:11 MSK LOG:  consistent recovery state reached at 0/7000090
>> >> > 2013-12-07 14:31:11 MSK LOG:  record with zero length at 0/7000090
>> >> > 2013-12-07 14:31:11 MSK LOG:  database system is ready to accept read only connections
>> >> > 2013-12-07 14:31:12 MSK LOG:  incomplete startup packet
>> >> > 2013-12-07 14:31:14 MSK FATAL:  could not connect to the primary server: could not connect to server: No route to host
>> >> >                 Is the server running on host "192.168.10.200" and accepting
>> >> >                 TCP/IP connections on port 5432?
>> >> >
>> >> > Why didn't the master get a shutdown request? It is alive.
>> >> >
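This addresses the earlier question of what to search for in the log. In the Pacemaker log excerpt above, the pgsql RA writes one line for each change of the pgsql-status attribute, in the form "Changing pgsql-status on <node>: <old>-><new>.". attrd's "Sending flush op to all hosts for: pgsql-status (...)" lines echo the same values. Below is a minimal Python sketch that pulls those transitions out of log lines. It assumes exactly the line format shown here; other RA versions may word it differently. Choosing which log file to read is left to the caller.

```python
import re

# Matches pgsql RA lines such as:
#   Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
STATUS_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s.*"
    r"Changing pgsql-status on (?P<node>\S+): (?P<old>\S*)->(?P<new>\S+)\.$"
)

# A few lines copied from the log excerpt above.
SAMPLE_LOG = """\
Dec 10 20:49:57 a lrmd: [890]: info: rsc:pgsql:1 start[4] (pid 986)
Dec 10 20:49:57 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->STOP.
Dec 10 20:49:57 a attrd: [891]: notice: attrd_trigger_update: Sending flush op to all hosts for: pgsql-status (STOP)
Dec 10 20:49:59 a pgsql[986]: INFO: Changing pgsql-status on a.mydomain.com: ->HS:alone.
"""


def status_changes(lines):
    """Return (timestamp, node, old, new) for every pgsql-status transition found."""
    changes = []
    for line in lines:
        m = STATUS_RE.match(line.strip())
        if m:
            changes.append((m["ts"], m["node"], m["old"], m["new"]))
    return changes


if __name__ == "__main__":
    for ts, node, old, new in status_changes(SAMPLE_LOG.splitlines()):
        print(f"{ts}  {node}: {old or '(unset)'} -> {new}")
```

From a shell, running `grep 'Changing pgsql-status'` on the same log returns the same lines without any parsing.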
>> >> >
>> >> > 2. Here is my config:
>> >> > node a.mydomain.com \
>> >> >         attributes pgsql-data-status="DISCONNECT"
>> >> > node b.mydomain.com \
>> >> >         attributes pgsql-data-status="LATEST" pgsql-status="HS:async"
>> >> > node c.mydomain.com \
>> >> >         attributes pgsql-data-status="STREAMING|SYNC" pgsql-status="HS:async"
>> >> > primitive apache ocf:heartbeat:apache \
>> >> >         params configfile="/etc/apache2/apache2.conf" \
>> >> >         op monitor interval="1min"
>> >> > primitive apache-master-ip ocf:heartbeat:IPaddr2 \
>> >> >         params ip="192.168.10.100" nic="peervpn0" \
>> >> >         op monitor interval="30s"
>> >> > primitive pgsql ocf:heartbeat:pgsql \
>> >> >         params pgctl="/usr/lib/postgresql/9.3/bin/pg_ctl" psql="/usr/bin/psql" \
>> >> >         pgdata="/var/lib/postgresql/9.3/main" start_opt="-p 5432" rep_mode="sync" \
>> >> >         node_list="a.mydomain.com b.mydomain.com c.mydomain.com" \
>> >> >         restore_command="cp /var/lib/postgresql/9.3/pg_archive/%f %p" \
>> >> >         master_ip="192.168.10.200" restart_on_promote="true" \
>> >> >         config="/etc/postgresql/9.3/main/postgresql.conf" \
>> >> >         op start interval="0s" timeout="60s" on-fail="restart" \
>> >> >         op monitor interval="4s" timeout="60s" on-fail="restart" \
>> >> >         op monitor interval="3s" role="Master" timeout="60s" on-fail="restart" \
>> >> >         op promote interval="0s" timeout="60s" on-fail="restart" \
>> >> >         op demote interval="0s" timeout="60s" on-fail="stop" \
>> >> >         op stop interval="0s" timeout="60s" on-fail="block" \
>> >> >         op notify interval="0s" timeout="60s"
>> >> > primitive pgsql-master-ip ocf:heartbeat:IPaddr2 \
>> >> >         params ip="192.168.10.200" nic="peervpn0" \
>> >> >         op start interval="0s" timeout="60s" on-fail="restart" \
>> >> >         op monitor interval="10s" timeout="60s" on-fail="restart" \
>> >> >         op stop interval="0s" timeout="60s" on-fail="block" \
>> >> >         meta target-role="Started"
>> >> > group master pgsql-master-ip
>> >> > ms msPostgresql pgsql \
>> >> >         meta master-max="1" master-node-max="1" clone-max="3" clone-node-max="1" target-role="Master" notify="true"
>> >> > location prefer-apache-node apache 150: b.mydomain.com
>> >> > colocation apache-with-ip inf: apache apache-master-ip
>> >> > colocation set_ip inf: master msPostgresql:Master
>> >> > order apache-after-ip inf: apache-master-ip apache
>> >> > order ip_down 0: msPostgresql:demote master:stop symmetrical=false
>> >> > order ip_up 0: msPostgresql:promote master:start symmetrical=false
>> >> > property $id="cib-bootstrap-options" \
>> >> >         dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
>> >> >         cluster-infrastructure="openais" \
>> >> >         expected-quorum-votes="3" \
>> >> >         stonith-enabled="false" \
>> >> >         crmd-transition-delay="0" \
>> >> >         last-lrm-refresh="1386751770"
>> >> > rsc_defaults $id="rsc-options" \
>> >> >         resource-stickiness="100" \
>> >> >         migration-threshold="1"
>> >> >
>> >> > Where should I add rep_mode="async"? In each slave node's attributes?
>> >> >
>> >> >
>> >> >
>> >> > 2013/12/13 Takehiro Matsushima <[email protected]>
>> >> >
>> >> >> Hello,
>> >> >>
>> >> >> 1. Does it work stably after that? Does failover work correctly, too?
>> >> >>
>> >> >> 2. I see. In this case, specify rep_mode="async" in the crm config;
>> >> >> then all slaves will run in async mode.
>> >> >>
>> >> >> --
>> >> >> Regards,
>> >> >> Takehiro Matsushima
>> >> >> _______________________________________________
>> >> >> Linux-HA mailing list
>> >> >> [email protected]
>> >> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> >> >> See also: http://linux-ha.org/ReportingProblems
>> >> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Takehiro Matsushima
>> >>
>>
>>
>>
>> --
>> Regards,
>> Takehiro Matsushima
>>



-- 
Regards,
Takehiro Matsushima
