Hello Ales and all,

I feel obliged to close out this case with some explanation from my side.
We investigated the case further and concluded that the main root cause was the change we made on 6 November: switching 'external_ids:ovn-monitor-all' from false (the default) to true [3]. With it enabled, ovn-controller receives datapaths from all the nodes. Fortunately, we were able to reproduce the case in our staging environment. After switching the setting back, we observed flows decrease from 359340 to 86197 and ct-zones from 4487 to 1044. So the setting [3] had bought us some time, yet it had not solved our problem. We are going to implement relays for the SBDB and switch the setting [3] back to its default value.

The only thing that still seems awkward to me is why the probe interval is not configurable on the ovsdb-server side, and why it was decided to hard-code it. [0]

[3] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-November/052798.html

From: Ales Musil <amu...@redhat.com>
Date: Wednesday, 15 November 2023, 13:09
To: Шагов Георгий <gmsha...@cloud.ru>
Cc: "ovs-discuss@openvswitch.org" <ovs-discuss@openvswitch.org>
Subject: Re: [ovs-discuss] Enormous amount of records into openvswitch db Bridge table external_ids ct-zone

WARNING! EXTERNAL SENDER. If the sender of this email is unknown, do not follow links, do not share your password, do not open attachments, and notify colleagues at ЦКЗ at secur...@cloud.ru

On Wed, Nov 15, 2023 at 10:49 AM Шагов Георгий <gmsha...@cloud.ru> wrote:

Hello Ales

Hi,

I really appreciate your reply. It helps a lot.

* ovn-appctl -t ovn-controller ct-zone-list

This request produced 7K+ records, so the 6.5K ct-zone records in the Bridge table seem to be valid.

Digging deeper, we found that the major cause of 100% CPU in ovn-controller appears to be that it constantly performs a full recompute.
Looking into the ovsdb-server logs for openvswitch_db, we see constant messages:

reconnect|ERR|tcp:127.0.0.1:53560: no response to inactivity probe after 5 seconds, disconnecting

So it seems that openvswitch_db constantly drops the connection from ovn-controller due to inactivity. We tried to change the inactivity probe interval at openvswitch_db through the ovn-controller setting external_ids:ovn-openflow-probe-interval in the Open_vSwitch table of openvswitch_db, as explained here: https://mail.openvswitch.org/pipermail/ovs-dev/2020-August/373671.html

Yet it does not seem to work: regardless of the value we set (e.g. ovn-openflow-probe-interval:"60"), we still observe the same 5-second interval in the openvswitch_db logs:

reconnect|ERR|tcp:127.0.0.1:53560: no response to inactivity probe after 5 seconds, disconnecting

This is very confusing.

There are multiple connections that we have from ovn-controller to br-int (ovsdb). Only one of them can be influenced by "ovn-openflow-probe-interval". This was recently changed by "controller: disable OpenFlow inactivity probing" [0]. The fact that the probe is still failing after 5 seconds suggests that it is one of the hardcoded ones. This is explained in the email thread [1]. You can try upgrading past the mentioned patch to see if that helps; unfortunately, it is only on main currently and will be available in 24.03.

Are we missing anything here? Any hint is appreciated. Thanks in advance.

[0] https://github.com/ovn-org/ovn/commit/c16e5da803838fa66129eb61d7930fc84d237f85
[1] https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html

Hopefully this helps.
Best regards,
Ales

From: Ales Musil <amu...@redhat.com>
Date: Tuesday, 14 November 2023, 14:19
To: Шагов Георгий <gmsha...@cloud.ru>
Cc: "ovs-discuss@openvswitch.org" <ovs-discuss@openvswitch.org>
Subject: Re: [ovs-discuss] Enormous amount of records into openvswitch db Bridge table external_ids ct-zone

On Tue, Nov 14, 2023 at 12:02 PM Шагов Георгий via discuss <ovs-discuss@openvswitch.org> wrote:

Hello All

Hi,

We observe a strangely large (for our installation) number of records in the openvswitch db Bridge table external_ids:ct-zone, i.e. 6.5K+.

A CT zone is allocated for most of the LSPs (there are some exceptions) and for all LR DNAT and SNAT that are local to the given controller. This means that either you have a lot of ports and possibly routers on that single controller, or the external-ids are not cleared on update (which would be a bug). You can check the zone list by running ovn-appctl -t ovn-controller ct-zone-list to see whether it matches the count of active zones that ovn-controller knows about.

grep -A20 '^Bridge table' ./ovs.dump | grep external_ids | sed 's/ct-zone-/\nct-zone-/g' | sort | uniq | wc -l
6659

Details:

"Bridge" : {
  "06ef9e06-188e-4654-93b2-5242a324a5c7" : {
    "initial" : {
      "datapath_type" : "system",
      "external_ids" : [
        "map",
        [
          [
            "ct-zone-00368809-59f5-4408-8ae3-fb5401ff6ea4_dnat",
            "60"
          ],
          [

At the same time, if I run ovs-dpctl ct-stats-show:

Connections Stats:
Total: 1672
TCP: 1269
UDP: 398
ICMP: 5

The questions are:

* Who is writing into openvswitch db Bridge table external_ids:ct-zone?
ovn-controller writes those values so that it can restore the zones after a restart.

* Is there any way to manage these records in the openvswitch db Bridge table external_ids? I want to purge them…

ovn-controller will still write those that are new or changed if you purge them. I would advise against purging if you care about restoration after restart.

This actually pins ovn-controller at 100% CPU, since it gets a reply from openvswitch with the full set of ct-zone records in external_ids of the Bridge table:

2023-11-13T14:28:47.976Z|10019838|jsonrpc|DBG|tcp:127.0.0.1:6640: received reply, result=[false,"00000000-0000-0000-0000-000000000000",{"

Any help is greatly appreciated.

Yours truly,
George

CONFIDENTIALITY NOTICE: This email and any files attached to it are confidential. If you are not the intended recipient you are notified that using, copying, distributing or taking any action in reliance on the contents of this information is strictly prohibited. If you have received this email in error please notify the sender and delete this email.

_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss

Best regards,
Ales

--
Ales Musil
Senior Software Engineer - OVN Core
Red Hat EMEA <https://www.redhat.com>
amu...@redhat.com
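
For anyone repeating the check discussed in this thread, the counting one-liner can be sketched a bit more defensively. This is only an illustration, not something posted by either participant: the helper name count_ct_zone_keys is made up here, and it assumes that every key starts with "ct-zone-" and ends at a quote, "=", comma, brace or space, which fits the JSON excerpt above and should also fit the map form printed by ovs-vsctl.

```shell
#!/bin/sh
# Sketch only: count the distinct ct-zone keys found in text read from stdin.
# Assumption: a key starts with "ct-zone-" and runs until a quote, '=',
# comma, brace or space.
count_ct_zone_keys() {
    # "|| true" keeps the pipeline alive when grep finds no match at all.
    { grep -o 'ct-zone-[^"=,{} ]*' || true; } | sort -u | wc -l | tr -d ' '
}

# Possible uses (not executed here; br-int as the bridge name is an assumption):
#   ovs-vsctl get Bridge br-int external_ids | count_ct_zone_keys
#   count_ct_zone_keys < ./ovs.dump
# The result can then be compared with what ovn-controller itself reports,
# assuming ct-zone-list prints one zone per line:
#   ovn-appctl -t ovn-controller ct-zone-list | wc -l
```

Unlike the sed-based pipeline, this counts only the matched keys, so the text that precedes the first key on a line is not counted as an extra entry.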