We had a failover.
I read the Patroni logs below as follows.

2022-09-21 11:13:56,384 secondary did an HTTP GET request to primary. This 
failed with a read timeout.
2022-09-21 11:13:56,792 secondary promoted itself to primary
2022-09-21 11:13:57,279 primary did an HTTP GET request to secondary. An 
exception happened, probably also due to a read timeout.
2022-09-21 11:13:57,983 primary demoted itself

So, as I read it, the failover was caused by a network timeout between primary 
and secondary.
QUESTION 1: Do you agree?
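
To make the ordering easier to check, here is a small sketch that merges the 
relevant entries from both log files below into one timeline. The timestamps, 
host names and event summaries are taken from the logs; the names Event, 
parse_ts and merge_timeline are only my own illustration and not part of 
Patroni.

```python
from datetime import datetime
from typing import NamedTuple


class Event(NamedTuple):
    ts: datetime
    host: str
    summary: str


def parse_ts(text: str) -> datetime:
    # The log timestamps in this mail use a comma before the milliseconds.
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


# Entries copied from the two log excerpts (host, timestamp, shortened message).
RAW_EVENTS = [
    ("2022-09-21 11:13:57,279", "szhm49346",
     "API thread: GET /patroni from 10.9.132.16, latency 2245.090 ms"),
    ("2022-09-21 11:13:57,378", "szhm49346",
     "EtcdCompareFailed in _update_leader"),
    ("2022-09-21 11:13:57,965", "szhm49346", "failed to update leader lock"),
    ("2022-09-21 11:13:57,983", "szhm49346", "Demoting self (immediate-nolock)"),
    ("2022-09-21 11:13:54,381", "szhm49345",
     "Starting new HTTP connection to szhm49346:8009"),
    ("2022-09-21 11:13:56,384", "szhm49345",
     "GET /patroni on szhm49346 failed (read timeout=2)"),
    ("2022-09-21 11:13:56,562", "szhm49345",
     "PUT /v2/keys/patroni/pcl_p011/leader returned 201"),
    ("2022-09-21 11:13:56,792", "szhm49345",
     "promoted self to leader by acquiring session lock"),
]


def merge_timeline(raw):
    """Return all events from both hosts sorted by timestamp."""
    events = [Event(parse_ts(ts), host, summary) for ts, host, summary in raw]
    return sorted(events, key=lambda e: e.ts)


timeline = merge_timeline(RAW_EVENTS)
start = timeline[0].ts
for event in timeline:
    # Offset in seconds relative to the first event in the merged list.
    offset = (event.ts - start).total_seconds()
    print(f"+{offset:6.3f}s  {event.host}  {event.summary}")
```

This only sorts what is already in the logs. It assumes the clocks of both 
hosts are in sync, which I have not verified.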

I thought that Patroni nodes do not communicate directly with each other, 
but only via the DCS.
QUESTION 2: Is this no longer correct?
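
For the DCS side, the traceback from szhm49346 below shows _update_leader 
calling the etcd client's write with prevValue set to the node's own name, 
followed by EtcdCompareFailed and then "failed to update leader lock" and 
"Demoting self". As far as I understand it, that is a compare-and-swap on the 
leader key. The following is a minimal in-memory sketch of that idea only. 
InMemoryKeyStore, CompareFailed and update_leader are hypothetical names of 
mine, not etcd or Patroni code, and TTL handling is left out.

```python
class CompareFailed(Exception):
    """Raised when the stored value does not match the expected one."""


class InMemoryKeyStore:
    """Hypothetical stand-in for a key-value store with compare-and-swap."""

    def __init__(self):
        self._data = {}

    def write(self, key, value, prev_value=None):
        current = self._data.get(key)
        # With prev_value set, the write only succeeds if the key still
        # holds exactly that value.
        if prev_value is not None and current != prev_value:
            raise CompareFailed(f"Compare failed : [{prev_value} != {current}]")
        self._data[key] = value


def update_leader(store, key, name):
    """Refresh the leader key; return False if another node now holds it."""
    try:
        store.write(key, name, prev_value=name)
        return True
    except CompareFailed:
        return False


LEADER_KEY = "/patroni/pcl_p011/leader"
store = InMemoryKeyStore()

# The former standby has written its own name to the leader key.
store.write(LEADER_KEY, "pcl_p011@szhm49345")

# The former primary tries to refresh the key under its own name and fails.
still_leader = update_leader(store, LEADER_KEY, "pcl_p011@szhm49346")
print("szhm49346 still leader:", still_leader)
```

In this sketch a node that gets False back would have to give up the leader 
role, which matches the order of the messages in the log of szhm49346.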



===========================


Patroni version: 2.1.3


===========================


Patroni Logfile of Host szhm49346 (IP 10.9.132.13) => Primary until Failover
...
...
2022-09-21 11:13:57,279 DEBUG: API thread: 10.9.132.16 - - "GET /patroni 
HTTP/1.1" 200 - latency: 2245.090 ms
2022-09-21 11:13:57,378 ERROR:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 566, in 
wrapper
    retval = func(self, *args, **kwargs) is not None
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 696, in 
_update_leader
    return self.retry(self._client.write, self.leader_path, self._name, 
prevValue=self._name, ttl=self._ttl)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 447, in 
retry
    return retry(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/patroni/utils.py", line 334, in 
__call__
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 500, in write
    response = self.api_execute(path, method, params=params)
  File "/usr/lib/python3.6/site-packages/patroni/dcs/etcd.py", line 257, in 
api_execute
    return self._handle_server_response(response)
  File "/usr/lib/python3.6/site-packages/etcd/client.py", line 987, in 
_handle_server_response
    etcd.EtcdError.handle(r)
  File "/usr/lib/python3.6/site-packages/etcd/__init__.py", line 306, in handle
    raise exc(msg, payload)
etcd.EtcdCompareFailed: Compare failed : [pcl_p011@szhm49346 != 
pcl_p011@szhm49345]
2022-09-21 11:13:57,558 WARNING: Exception happened during processing of 
request from 10.9.132.16:49080
2022-09-21 11:13:57,965 ERROR: failed to update leader lock
2022-09-21 11:13:57,983 INFO: Demoting self (immediate-nolock)
2022-09-21 11:13:58,214 WARNING: Traceback (most recent call last):
  File "/usr/lib64/python3.6/socketserver.py", line 654, in 
process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib64/python3.6/socketserver.py", line 364, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/lib64/python3.6/socketserver.py", line 724, in __init__
    self.handle()
  File "/usr/lib64/python3.6/http/server.py", line 418, in handle
    self.handle_one_request()
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 652, in 
handle_one_request
    BaseHTTPRequestHandler.handle_one_request(self)
  File "/usr/lib64/python3.6/http/server.py", line 406, in handle_one_request
    method()
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 198, in 
do_GET_patroni
    self._write_status_response(200, response)
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 94, in 
_write_status_response
    self._write_json_response(status_code, response)
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 53, in 
_write_json_response
    self._write_response(status_code, json.dumps(response, default=str), 
content_type='application/json')
  File "/usr/lib/python3.6/site-packages/patroni/api.py", line 50, in 
_write_response
    self.wfile.write(body.encode('utf-8'))
  File "/usr/lib64/python3.6/socketserver.py", line 803, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
...
...


===========================


Patroni Logfile of Host szhm49345 (IP 10.9.132.16) => Standby until Failover
...
...
2022-09-21 11:13:54,381 DEBUG: Starting new HTTP connection (1): 
szhm49346.global.szh.loc:8009
2022-09-21 11:13:56,384 WARNING: Request failed to pcl_p011@szhm49346: GET 
http://szhm49346.global.szh.loc:8009/patroni 
(HTTPConnectionPool(host='szhm49346.global.szh.loc', port=8009): Max retries 
exceeded with url: /patroni (Caused by 
ReadTimeoutError("HTTPConnectionPool(host='szhm49346.global.szh.loc', 
port=8009): Read timed out. (read timeout=2)",)))
2022-09-21 11:13:56,484 DEBUG: Writing pcl_p011@szhm49345 to key 
/patroni/pcl_p011/leader ttl=30 dir=False append=False
2022-09-21 11:13:56,485 DEBUG: Converted retries value: 0 -> Retry(total=0, 
connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,562 DEBUG: http://10.7.211.13:2379 "PUT 
/v2/keys/patroni/pcl_p011/leader HTTP/1.1" 201 197
2022-09-21 11:13:56,562 DEBUG: Issuing read for key /patroni/pcl_p011/ with 
args {'recursive': True, 'retry': <patroni.utils.Retry object at 
0x7fcbb0d0c2b0>}
2022-09-21 11:13:56,563 DEBUG: Converted retries value: 0 -> Retry(total=0, 
connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,634 DEBUG: http://10.7.211.13:2379 "GET 
/v2/keys/patroni/pcl_p011/?recursive=true HTTP/1.1" 200 None
2022-09-21 11:13:56,635 DEBUG: Writing 
{"leader":"pcl_p011@szhm49345","sync_standby":null} to key 
/patroni/pcl_p011/sync ttl=None dir=False append=False
2022-09-21 11:13:56,635 DEBUG: Converted retries value: 0 -> Retry(total=0, 
connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,713 DEBUG: http://10.7.211.13:2379 "PUT 
/v2/keys/patroni/pcl_p011/sync HTTP/1.1" 200 368
2022-09-21 11:13:56,713 DEBUG: Writing 
{"conn_url":"postgres://szhm49345.global.szh.loc:5432/pcl_p011","api_url":"http://szhm49345.global.szh.loc:8009/patroni","state":"running","role":"replica","version":"2.1.3","checkpoint_after_promote":false,"xlog_location":9087609453816,"timeline":6}
 to key /patroni/pcl_p011/members/pcl_p011@szhm49345 ttl=30 dir=False 
append=False
2022-09-21 11:13:56,714 DEBUG: Converted retries value: 0 -> Retry(total=0, 
connect=None, read=None, redirect=0, status=None)
2022-09-21 11:13:56,791 DEBUG: http://10.7.211.13:2379 "PUT 
/v2/keys/patroni/pcl_p011/members/pcl_p011@szhm49345 HTTP/1.1" 200 896
2022-09-21 11:13:56,792 INFO: promoted self to leader by acquiring session lock
2022-09-21 11:13:56,798 INFO: cleared rewind state after becoming the leader
...
...



