Hi Ilya,

Thanks for the quick reply and directions.
"Do you mean a RAFT leader change/transfer? I'm assuming so, because you mention disabling of leadership transfer on compaction" Yes, I am talking about RAFT leader change/transfer. "I don't understand why that is supposed to help, because clients should not disconnect on leadership change in the first place. All the SbDB clients, except for ovn-northd, should not require leader-only connections. Could you, please, check why is this happening?" Yes, you are right, I totally forget about our custom connection setting in SB DB, which forcing client to use leader node only: ovn-sbctl set-connection ptcp:6642:<IP of current RAFT leader node> ovn-sbctl set connection . inactivity_probe=60000 Also we have custom daemon, which periodically check if RAFT leader was changed and set address of new RAFT leader in connection string It was realized in this manner some time ago, because "inactivity_probe=60000" didn`t work with simple setting "ovn-sbctl set-connection ptcp:6642", clients with long-running tasks still got disconnects from SB DB after default 5sec probe interval. Ok, it seems I need to review and change this approach. "It may be a possibility. Not making it configurable, but just increasing it to a much higher value (it's capped by the size of the database itself, so should not grow too large)." It will be great, if so. I am still thinking, that transaction history of 100 records can cause issues with fast re-sync in big clouds, where a lot of clients fulfill this limit very quickly by transactions below (I can consider them as heartbeats): {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308} {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427} Thanks! Regards, Oleksandr -----Original Message----- From: Ilya Maximets <i.maxim...@ovn.org> Sent: Wednesday, September 20, 2023 11:34 PM To: Oleksandr Mykhalskyi <oleksandr.mykhals...@netcracker.com>; 'b...@openvswitch.org' <b...@openvswitch.org> Cc: Alexander Povaliukhin <alexander.povaliuk...@netcracker.com>; i.maxim...@ovn.org Subject: Re: [ovs-discuss] Possible bug with monitor_cond_since and fast re-sync of ovn-controllers [External Email] ________________________________ On 9/20/23 20:39, Oleksandr Mykhalskyi via discuss wrote: > Dear OVS developers, > > > > After recent update of our openstack cloud from xena to yoga (OVS 2.17 and > OVN 22.03), we faced critical issue related to reconnection of > ovn-controllers to OVN SB DB. > > From 2.17, python-ovs supports monitor_cond_since to fast re-sync of > ovn-controllers during reconnection to Southbound DB. > > Method monitor_cond_since looks at server`s transaction history to find > client`s last-txn-id. > > Transaction history limit is hardcoded with 100 records - > https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39 > > <https://github.com/openvswitch/ovs/blob/master/ovsdb/transaction.c#L1666C39-L1666C39>, > and its root cause of our issues. 
> For Neutron with OVN, we have at least 2 transactions per hypervisor per 60 seconds: ovn-controller and ovn-metadata-agent send heartbeats this way, updating the timestamps of the corresponding records in the Chassis_Private table, something like:
>
> {"eid":"75718296-7ff5-4d6d-9fe8-129fe4ea2075","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":8583308}
>
> {"eid":"aefff81c-76ae-4978-ac38-c129ed1a1009","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":8583427}
>
> For our cloud with 250 hypervisors, we have at least 500 such heartbeat transactions per 60 seconds.
>
> Now look at the following case: the SB DB master moved to another node, and the agents of 250 hypervisors try to reconnect to the SB DB nodes while searching for the master one.

Do you mean a RAFT leader change/transfer? I'm assuming so, because you mention disabling of leadership transfer on compaction. But neither ovn-controller nor neutron agents are leader-only. Why do they re-connect on leadership change? Sounds strange.

> The first ~50 hypervisors reconnected and did a fast re-sync, sent heartbeats and thereby filled the transaction history with 100 records.
>
> All other hypervisors are not able to do a fast re-sync anymore; they try to download the whole SB contents and hang OVSDB completely.
>
> It's not a theoretical case; we observed such behavior a couple of times after the update.
>
> To stabilize this situation, we had to stop all ovn-controllers and start them one by one.

It's not necessary to start them one by one. New connections are processed one by one on the server side, so already connected instances should not disconnect again (ovsdb-server should be able to reply to inactivity probes for them in time). But they should not really re-connect on leadership change in the first place.
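To put the numbers above in perspective, here is a back-of-envelope calculation of how quickly the reported heartbeat rate exhausts the 100-entry history (figures taken from the report itself):

# Back-of-envelope numbers taken from the report above.
hypervisors = 250
txns_per_hv_per_minute = 2      # ovn-controller + ovn-metadata-agent heartbeats
history_limit = 100             # hardcoded transaction history size

txns_per_second = hypervisors * txns_per_hv_per_minute / 60.0   # ~8.3 txn/s
history_window = history_limit / txns_per_second                # ~12 seconds

print(f"heartbeat rate: {txns_per_second:.1f} txn/s")
print(f"history covers only ~{history_window:.0f} s of normal operation")

So any client that needs more than roughly a dozen seconds to reconnect after a leadership change can no longer find its last-txn-id in the history and falls back to a full download.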
> Here are examples from a small test cloud:
>
> *Reconnection of ovn-controller when fewer than 100 transactions were executed*
>
> 2023-09-20T11:45:43.629Z|12099|jsonrpc|DBG|tcp:172.31.23.11:6642: send request, method="*monitor_cond_since*", params=["OVN_Southbound",...,"*8f7caad3-c20c-4563-ae20-8eaaf0f22287*"], id=40940
> 2023-09-20T11:45:43.635Z|12101|jsonrpc|DBG|tcp:172.31.23.11:6642: received reply, result=[*true*,"*1a70474d-c437-4af1-ac2e-4a6261c5ca70*",{"SB_Global"....], id=40940
>
> {"eid":"*8f7caad3-c20c-4563-ae20-8eaaf0f22287*","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"nb_cfg":1078289,"nb_cfg_timestamp":1695210108374}},"_comment":"ovn-controller","_date":1695210108457,"_is_diff":true}],"term":9866,"index":*8583570*}
>
> {"eid":"*1a70474d-c437-4af1-ac2e-4a6261c5ca70*","data":[null,{"Chassis_Private":{"f2d5a60d-03ff-4d60-ad6c-36c30e7f8546":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078292"]]]}},"_date":1695210343461,"_is_diff":true}],"term":9866,"index":*8583603*}
>
> *Reconnection of ovn-controller when more than 100 transactions were executed*
>
> 2023-09-20T11:31:24.300Z|11888|jsonrpc|DBG|tcp:172.31.23.11:6642: send request, method="*monitor_cond_since*", params=["OVN_Southbound",..., "*75718296-7ff5-4d6d-9fe8-129fe4ea2075*"], id=40891
> 2023-09-20T11:31:24.840Z|11890|jsonrpc|DBG|tcp:172.31.23.11:6642: received reply, result=[*false*,"*aefff81c-76ae-4978-ac38-c129ed1a1009*",{"RBAC_Role....], id=40891
> ### downloading of full SB DB contents ...
>
> {"eid":"*75718296-7ff5-4d6d-9fe8-129fe4ea2075*","data":[null,{"Chassis_Private":{"8a94ede6-ee86-443f-a861-19320b500767":{"nb_cfg":1078267,"nb_cfg_timestamp":1695208789023}},"_comment":"ovn-controller","_date":1695208789142,"_is_diff":true}],"term":9866,"index":*8583308*}
>
> {"eid":"*aefff81c-76ae-4978-ac38-c129ed1a1009*","data":[null,{"Chassis_Private":{"94e607a0-9e1a-4ca4-9e26-b1cbdcdf362a":{"external_ids":["map",[["neutron:ovn-metadata-sb-cfg","1078278"]]]}},"_date":1695209448827,"_is_diff":true}],"term":9866,"index":*8583427*}
>
> Before Yoga, on Kolla images with OVS 2.16, reconnection worked well.

This sounds strange. As you said, with OVS 2.16 python clients didn't support fast re-sync, so the situation should have been worse, not better.

> From a long-term perspective, I can propose making the transaction history limit a configurable parameter.

It may be a possibility. Not making it configurable, but just increasing it to a much higher value (it's capped by the size of the database itself, so should not grow too large).

On the other hand, starting with OVS 3.1, ovsdb-server should reply to new monitor requests without fast re-sync way faster. So, upgrading to that version may fix your problem.

> As a short-term measure, all we can do now is avoid the transfer of leadership (which happens when doing a snapshot) by doing a manual DB compaction from time to time, within 24 hours.

I don't understand why that is supposed to help, because clients should not disconnect on leadership change in the first place. All the SbDB clients, except for ovn-northd, should not require leader-only connections. Could you, please, check why this is happening?

Best regards, Ilya Maximets.

> Kindly ask you to review and comment on this information.
>
> Thank you!
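On the client side, the direction Ilya suggests (not pinning connections to the RAFT leader) could look roughly like the sketch below. This is only an illustration: the endpoint addresses and schema path are hypothetical, and the probe_interval/leader_only keyword arguments are an assumption based on recent python-ovs releases, so check the version actually shipped with your distribution.

import ovs.db.idl
import ovs.poller

# Hypothetical addresses of all three SB DB cluster members.
SB_REMOTES = "tcp:10.0.0.1:6642,tcp:10.0.0.2:6642,tcp:10.0.0.3:6642"
SB_SCHEMA = "/usr/share/ovn/ovn-sb.ovsschema"   # hypothetical schema path

helper = ovs.db.idl.SchemaHelper(SB_SCHEMA)
helper.register_table("Chassis_Private")        # monitor only what is needed

idl = ovs.db.idl.Idl(SB_REMOTES, helper,
                     probe_interval=60000,      # 60 s client-side inactivity probe (assumed kwarg)
                     leader_only=False)         # any cluster member is acceptable (assumed kwarg)

# Standard python-ovs poll loop: followers serve monitors too, so a
# leadership transfer during compaction should not force a reconnection,
# and if a reconnection does happen, monitor_cond_since handles the re-sync.
while True:
    poller = ovs.poller.Poller()
    idl.run()
    idl.wait(poller)
    poller.block()

With a configuration along these lines, the server-side ovn-sbctl set-connection target no longer needs to carry the leader's IP address, so the custom leader-tracking daemon described earlier would not be necessary.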
> Oleksandr Mykhalskyi, System Engineer
> Netcracker Technology