Hi,

The clone command only copies the data from node2 to node1. You also need to register the node with the `force` option so that it overrides the old record (as if you were building a new replica node).

See: https://github.com/2ndQuadrant/repmgr#converting-a-failed-master-to-a-standby
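Roughly, the sequence on the old master (node1) would look like the sketch below. This is only an outline under assumptions: the config path, the data directory and the `rejoin_old_master` wrapper are placeholders of mine, node2 is taken to be the current master as in your setup, and the exact options can differ between repmgr versions, so please check them against the README section linked above before running anything.

```shell
# Hypothetical paths; adjust to your installation.
REPMGR_CONF=/etc/repmgr/repmgr.conf
PGDATA=/var/lib/postgresql/data

# Hypothetical wrapper, to be run on the old master (node1) while node2 is the master.
rejoin_old_master() {
    # 1. Re-clone the data from the current master, overwriting the old data directory.
    repmgr -f "$REPMGR_CONF" --force -h node2 -d repmgr -U repmgr -D "$PGDATA" standby clone &&
    # 2. Start PostgreSQL on this node (use your service manager if you have one).
    pg_ctl -D "$PGDATA" -w start &&
    # 3. Register the node again; --force overrides the stale record for it.
    repmgr -f "$REPMGR_CONF" --force standby register &&
    # 4. Check how repmgr now sees the cluster.
    repmgr -f "$REPMGR_CONF" cluster show
}
```

After step 3 the node should be listed as a standby again, which is what repmgrd needs to find on the next failover.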
Regards,
- Jony

On Sun, Aug 16, 2015 at 3:19 PM, Aviel Buskila <avie...@gmail.com> wrote:

> Hey,
> I think I know what the problem is.
> After the first failover, when I clone the old master to be a standby with
> the 'repmgr standby clone' command, it seems that nothing updates the
> repl_nodes table with the new standby in my cluster, so on the next failover
> repmgrd fails to find a standby to fail over to.
>
> I confirmed this by manually updating the repl_nodes table after the clone
> so that the old master is now a standby database.
>
> Now my question is:
> Where is it supposed to happen that, after I issue 'repmgr standby clone',
> repl_nodes is also updated with the new standby server?
>
> Best regards,
> Aviel Buskila
>
> 2015-08-16 12:11 GMT+03:00 Aviel Buskila <avie...@gmail.com>:
>
>> hey,
>>
>> I have tried to set the configuration all over again, now the status of
>> 'repl_nodes' before the failover is:
>>
>>  id | type    | upstream_node_id | cluster      | name  | conninfo                                       | priority | active
>> ----+---------+------------------+--------------+-------+------------------------------------------------+----------+--------
>>   1 | master  |                  | cluster_name | node1 | host=node1 dbname=repmgr port=5432 user=repmgr |      100 | t
>>   2 | standby |                1 | cluster_name | node2 | host=node2 dbname=repmgr port=5432 user=repmgr |      100 | t
>>   3 | witness |                  | cluster_name | node3 | host=node3 dbname=repmgr port=5499 user=repmgr |      100 | t
>>
>> repmgrd is started on node2 and node3 (standby and witness). Now when I
>> kill the postgres master process I can see the following messages in the
>> repmgrd log:
>>
>> [WARNING] connection to master has been lost, trying to recover... 60 seconds before failover decision
>> [WARNING] connection to master has been lost, trying to recover... 50 seconds before failover decision
>> [WARNING] connection to master has been lost, trying to recover... 40 seconds before failover decision
>> [WARNING] connection to master has been lost, trying to recover... 30 seconds before failover decision
>> [WARNING] connection to master has been lost, trying to recover... 20 seconds before failover decision
>> [WARNING] connection to master has been lost, trying to recover... 10 seconds before failover decision
>>
>> and then, when it tries to elect node2 to be promoted, it shows the
>> following messages:
>>
>> [DEBUG] connecting to: 'host=node2 user=repmgr dbname=repmgr fallback_application_name='repmgr''
>> [WARNING] unable to determine a valid master server; waiting 10 seconds to retry...
>> [ERROR] unable to determine a valid master node, terminating...
>> [INFO] repmgrd terminating..
>>
>> what am I doing wrong?
>>
>>
>> On 14/08/15 at 04:14, Aviel Buskila wrote:
>> > Hey,
>> > yes I did .. and still it won't fail back..
>>
>> Can you send over the output of "repmgr cluster show" before and after
>> the failover process?
>>
>> Also the output of SELECT * FROM repmgr_schema.repl_nodes; after the failover
>> (you need to replace repmgr_schema with what you have configured).
>>
>> Also, which version of repmgr are you running?
>>
>> > 2015-08-13 16:23 GMT+03:00 Jony Vesterman Cohen <jony.cohe...@gmail.com>:
>> >
>> >> Hi, did you make the old master follow the new one using repmgr?
>> >>
>> >> It doesn't update itself automatically...
>> >> From the looks of it repmgr thinks you have 2 masters - the old one
>> >> offline and the new one online.
>>
>> Regards,
>>
>> --
>> Martín Marqués http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Training & Services