Hi Ilya,

Thanks for the quick reply and directions.
"Do you mean a RAFT leader change/transfer? I'm assuming so, because you mention disabling of leadership transfer on compaction" Yes, I am talking about RAFT leader change/transfer. "I don't understand why that is supposed to help, because clients should not disconnect on leadership change in the first place. All the SbDB clients, except for ovn-northd, should not require leader-only connections. Could you, please, check why is this happening?" Yes, you are right, I totally forget about our custom connection setting in SB DB, which forcing client to use leader node only: ovn-sbctl set-connection ptcp:6642:<IP of current RAFT leader node> ovn-sbctl set connection . inactivity_probe=60000 Also we have custom daemon, which periodically check if RAFT leader was changed and set address of new RAFT leader in connection string It was realized in this manner some time ago, because "inactivity_probe=60000" didn`t work with simple setting "ovn-sbctl set-connection ptcp:6642", clients with long-running tasks still got disconnects from SB DB after default 5sec probe interval. Ok, it seems I need to review and change this approach. "It may be a possibility. Not making it configurable, but just increasing it to a much higher value (it's capped by the size of the database itself, so should not grow too large)." It will be great, if so. I am still thinking, that transaction history of 100 records can cause issues with fast re-sync in big clouds, where a lot of clients fulfill this limit very quickly by transactions below (I can consider them as heartbeats): {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308} {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427} Thanks! Regards, Oleksandr -----Original Message----- From: Ilya Maximets <i.maxim...@ovn.org> Sent: Wednesday, September 20, 2023 11:34 PM To: Oleksandr Mykhalskyi <oleksandr.mykhals...@netcracker.com>; 'b...@openvswitch.org' <b...@openvswitch.org> Cc: Alexander Povaliukhin <alexander.povaliuk...@netcracker.com>; i.maxim...@ovn.org Subject: Re: [ovs-discuss] Possible bug with monitor_cond_since and fast re-sync of ovn-controllers [External Email] ________________________________ On 9/20/23 20:39, Oleksandr Mykhalskyi via discuss wrote: > Dear OVS developers, > > > > After recent update of our openstack cloud from xena to yoga (OVS 2.17 and > OVN 22.03), we faced critical issue related to reconnection of > ovn-controllers to OVN SB DB. > > From 2.17, python-ovs supports monitor_cond_since to fast re-sync of > ovn-controllers during reconnection to Southbound DB. > > Method monitor_cond_since looks at server`s transaction history to find > client`s last-txn-id. > > Transaction history limit is hardcoded with 100 records - > https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39 > > <https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39>, > and its root cause of our issues. 
> For Neutron with OVN, we have at least 2 transactions per hypervisor per 60 seconds: ovn-controller and ovn-metadata-agent send heartbeats this way, updating the timestamps of the corresponding records in the Chassis_Private table, something like:
>
> {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
>
> {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}
>
> For our cloud with 250 hypervisors, we have at least 500 such heartbeat transactions per 60 seconds.
>
> Now look at the following case: the SB DB master moved to another node, and the agents of 250 hypervisors try to reconnect to the SB DB nodes while searching for the master one.

Do you mean a RAFT leader change/transfer? I'm assuming so, because you mention disabling of leadership transfer on compaction. But neither ovn-controller nor neutron agents are leader-only. Why do they re-connect on leadership change? Sounds strange.

> The first ~50 hypervisors reconnected and did a fast re-sync, sent heartbeats and thereby filled the transaction history with 100 records.
>
> All other hypervisors are not able to do a fast re-sync anymore; they try to download the whole SB contents and hang OVSDB completely.
>
> It's not a theoretical case; we observed such behavior a couple of times after the update.
>
> To stabilize this situation, we had to stop all ovn-controllers and start them one by one.

It's not necessary to start them one by one. New connections are processed one by one on the server side, so already connected instances should not disconnect again (ovsdb-server should be able to reply to inactivity probes for them in time). But they should not really re-connect on leadership change in the first place.
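To put the numbers above in perspective, here is a back-of-envelope calculation of how quickly the reported heartbeat rate exhausts the 100-entry history (figures taken from the report itself):

# Back-of-envelope numbers taken from the report above.
hypervisors = 250
txns_per_hv_per_minute = 2      # ovn-controller + ovn-metadata-agent heartbeats
history_limit = 100             # hardcoded transaction history size

txns_per_second = hypervisors * txns_per_hv_per_minute / 60.0   # ~8.3 txn/s
history_window = history_limit / txns_per_second                # ~12 seconds

print(f"heartbeat rate: {txns_per_second:.1f} txn/s")
print(f"history covers only ~{history_window:.0f} s of normal operation")

So any client that needs more than roughly a dozen seconds to reconnect after a leadership change can no longer find its last-txn-id in the history and falls back to a full download.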
> Here are examples from a small test cloud:
>
> *Reconnection of ovn-controller when fewer than 100 transactions were executed*
>
> 2023-09-20T11:45:43.629Z|12099|jsonrpc|DBG|tcp:172.31.23.11:6642: send request, method="*monitor_cond_since*", params=["OVN_Southbound",...,"*8f7caad3-c20c-4563-ae20-8eaaf0f22287*"], id=40940
> 2023-09-20T11:45:43.635Z|12101|jsonrpc|DBG|tcp:172.31.23.11:6642: received reply, result=[*true*,"*1a70474d-c437-4af1-ac2e-4a6261c5ca70*",{"SB_Global"....], id=40940
>
> {"eid":"*8f7caad3-c20c-4563-ae20-8eaaf0f22287*","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"nb_cfg":1078289,"nb_cfg_timestamp":1695210108374}},"_comment":"ovn-controller","_date":1695210108457,"_is_diff":true}],"term":9866,"index":*8583570*}
>
> {"eid":"*1a70474d-c437-4af1-ac2e-4a6261c5ca70*","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078292"]]]}},"_date":1695210343461,"_is_diff":true}],"term":9866,"index":*8583603*}
>
> *Reconnection of ovn-controller when more than 100 transactions were executed*
>
> 2023-09-20T11:31:24.300Z|11888|jsonrpc|DBG|tcp:172.31.23.11:6642: send request, method="*monitor_cond_since*", params=["OVN_Southbound",..., "*75718296-7ff5-4d6d-9fe8-129fe4ea2075*"], id=40891
> 2023-09-20T11:31:24.840Z|11890|jsonrpc|DBG|tcp:172.31.23.11:6642: received reply, result=[*false*,"*aefff81c-76ae-4978-ac38-c129ed1a1009*",{"RBAC_Role....], id=40891
> ### downloading of full SB DB contents ...
>
> {"eid":"*75718296-7ff5-4d6d-9fe8-129fe4ea2075*","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":*8583308*}
>
> {"eid":"*aefff81c-76ae-4978-ac38-c129ed1a1009*","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":*8583427*}
>
> Before Yoga, on Kolla images with OVS 2.16, reconnection worked well.

This sounds strange. As you said, with OVS 2.16 python clients didn't support fast re-sync, so the situation should have been worse, not better.

> From a long-term perspective, I can propose making the transaction history limit a configurable parameter.

It may be a possibility. Not making it configurable, but just increasing it to a much higher value (it's capped by the size of the database itself, so should not grow too large).

On the other hand, starting with OVS 3.1, ovsdb-server should reply to new monitor requests without fast re-sync way faster. So, upgrading to that version may fix your problem.

> As a short-term measure, all we can do now is avoid the transfer of leadership (which happens when doing a snapshot) by doing a manual DB compaction from time to time, within 24 hours.

I don't understand why that is supposed to help, because clients should not disconnect on leadership change in the first place. All the SbDB clients, except for ovn-northd, should not require leader-only connections. Could you, please, check why this is happening?

Best regards, Ilya Maximets.

> Kindly ask you to review and comment on this information.
>
> Thank you!
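On the client side, the direction Ilya suggests (not pinning connections to the RAFT leader) could look roughly like the sketch below. This is only an illustration: the endpoint addresses and schema path are hypothetical, and the probe_interval/leader_only keyword arguments are an assumption based on recent python-ovs releases, so check the version actually shipped with your distribution.

import ovs.db.idl
import ovs.poller

# Hypothetical addresses of all three SB DB cluster members.
SB_REMOTES = "tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"
SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"   # hypothetical schema path

helper = ovs.db.idl.SchemaHelper(SB_SCHEMA)
helper.register_table("Chassis_Private")        # monitor only what is needed

idl = ovs.db.idl.Idl(SB_REMOTES, helper,
                     probe_interval=60000,      # 60 s client-side inactivity probe (assumed kwarg)
                     leader_only=False)         # any cluster member is acceptable (assumed kwarg)

# Standard python-ovs poll loop: followers serve monitors too, so a
# leadership transfer during compaction should not force a reconnection,
# and if a reconnection does happen, monitor_cond_since handles the re-sync.
while True:
    poller = ovs.poller.Poller()
    idl.run()
    idl.wait(poller)
    poller.block()

With a configuration along these lines, the server-side ovn-sbctl set-connection target no longer needs to carry the leader's IP address, so the custom leader-tracking daemon described earlier would not be necessary.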
> Oleksandr Mykhalskyi, System Engineer
> Netcracker Technology