On 28/10/2021 16:41, Numan Siddique wrote:
On Thu, Oct 28, 2021 at 5:20 AM Brendan Doyle <[email protected]> wrote:

Numan,

Just wondering if you got a chance to look at those logs?

I looked into the logs, and as I had mentioned earlier, you need this fix:
https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!amdtq3tQhwFCtbvjxSuF5ItzNk_07I0bBJvt5mu3lbJc-NBU5rsCp9IIullXTrxBXf8$

Please let me know if you still see this issue with the latest OVN or with a version of OVN that has this fix. The fix is available from OVN 21.03 onwards.

OK thanks, yes, I looked at that patch and it won't apply to the version we are running with:

ovn-nbctl 20.09.0_r1.0.0
Open vSwitch Library 2.14.0
DB Schema 5.27.0

I guess I need to backport or move forward.

Thanks
Numan

Thanks
Brendan

On 27/10/2021 11:25, Brendan Doyle wrote:

Hi,

I finally got some debug logs, truncated after the failure occurs; the truncated entries are just repeated updates of the same entry.

To shed some more light on this: it seems this is a timing issue. The test being run involves creating a number of Logical switches (LS), Routers (LR) and Distributed Router Port gateways (DR), and then immediately deleting them, with the last created DR being deleted first. Our CMS is using the ovsdbapp Python lib to do this. So it occurs to me that perhaps the objects get created in NB, but before they have been propagated to SB and to the HV chassis we get the delete, and this causes updates to be sent to the chassis for a logical port that does not exist? Just a hypothesis. ovn-nbctl has synchronization flags (--wait) to guard against such behavior; does ovsdbapp, I wonder?

In any case, the test fails (we see a runaway conf.db) pretty regularly, but not every time. The failure is always observed on the delete operations. If I put a delay after create and before delete, then we don't see the failure.

If anyone can shed light on this from the logs, it would be much appreciated.

Thanks
Brendan

On 26/10/2021 17:11, Brendan Doyle wrote:

On 26/10/2021 15:50, Numan Siddique wrote:

On Tue, Oct 26, 2021 at 8:20 AM Brendan Doyle <[email protected]> wrote:

Hi,

So what is very odd here is that I have used ovn-nbctl to delete the NB config, so

# ovn-nbctl show
# ovn-sbctl lflow-list

Yet I still see /etc/openvswitch/conf.db growing with updates for logical switch ports that no longer exist!

"],["ct-zone-ln-ls_vcn9195577_external_ugw","220"],["ct-zone-ln-ls_vcn9206002_external_igw","110"],["ct-zone-ln-ls_vcn9210052_external_igw","110"],["ct-zone-ln-ls_vcn9232395_external_ugw","75"],["ct-zone-ln-ls_vcn9236987_external_igw","110"],["ct-zone-ln-ls_vcn9236987_external_ugw","78"],["ct-zone-ln-ls_vcn9255861_external_igw","118"],["ct-zone-ln-ls_vcn9255861_external_ugw","100"],["ct-zone-ln-ls_vcn9319435_external_igw","87"],["ct-zone-ln-ls_vcn9352502_external_igw","40"],["ct-zone-ln-ls_vcn9402504_external_ugw","99"],["ct-zone-ln-ls_vcn9403404_external_igw","133"],["ct-zone-ln-ls_vcn9403404_external_ugw","114"],["ct-zone-ln-ls_vcn9461566_external_ugw","191"],["ct-zone-ln-ls_vcn9480000_external_igw","254"],["ct-zone-ln-ls_vcn9480000_external_ugw","236"],["ct-zone-ln-ls_vcn9492134_external_igw","262"],["ct-zone-ln-ls_vcn9523503_external_igw","207"],["ct-zone-ln-ls_vcn9542102_external_igw","133"],["ct-zone-ln-ls_vcn9542102_external_ugw","115"],["ct-zone-ln-ls_vcn9559658_external_igw","125"],["ct-zone-ln-ls_vcn9559658_external_ugw","78"],["ct-zone-ln-ls_vcn9594034_external_igw","49"],["ct-zone-ln-ls_vcn9619021_external_igw","133"],["ct-zone-ln-ls_vcn9634773_external_igw","292"],["ct-zone-ln-ls_vcn9649169_external_igw","132"],["ct-zone-ln-ls_vcn9649169_external_ugw","110"],["ct-zone-ln-ls_vcn9661290_external_ugw","78"],["ct-zone-ln-ls_vcn9734192_external_ugw","114"],["ct-zone-ln-ls_vcn9774252_external_igw","262"],["ct-zone-ln-ls_vcn9796262_external_igw","72"],["ct-zone-ln-ls_vcn9796262_external_ugw","54"],["ct-zone-ln-ls_vcn9805903_external_igw","147"],["ct-zone-ln-ls_vcn9805903_external_ugw","126"],["ct-zone-ln-ls_vcn9809895_external_igw","246"],["ct-zone-ln-ls_vcn9812576_external_ugw","78"],["ct-zone-ln-ls_vcn9834728_external_igw","110"],["ct-zone-ln-ls_vcn9886683_external_ugw","114"],["ct-zone-ln-ls_vcn9903419_external_ugw","235"],["ct-zone-ln-ls_vcn9917510_external_igw","56"],["ct-zone-ln-ls_vcn9917510_external_ugw","38"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}

A shortened version of one entry.

Could it be that switch ports must be deleted before deleting the switch? I was under the impression that once a switch is deleted its ports get deleted?

Yes. If you delete the switch, the switch ports get deleted too.

After deleting the logical switch (or switch ports), do you see them deleted by ovn-northd in the SB DB? Run:

ovn-sbctl list port_binding <deleted_port>

and/or

ovn-sbctl list datapath_binding <deleted_lswitch>

I'd suggest you enable jsonrpc debug in ovn-controller and see what's happening. It would be helpful if you could share the ovn-controller debug logs.

ovn-appctl -t ovn-controller vlog/set jsonrpc:dbg

So in my test I create a simple network, then delete it, so the NB DB and SB DB are empty.

# ovn-sbctl list port_binding
# ovn-sbctl list datapath_binding
#

The network has a number of LSs and LRs and two Distributed Router (DR) ports (on separate LRs). When I just create one DR all seems fine, but when I add the second into the mix I get a runaway openvswitch/conf.db, but NOT on all chassis. I have 4 chassis that I can schedule the DR ports to. In this latest test I observed the runaway conf.db on pcacn003 & pcacn005.

The logs are too large to send in email; is there an FTP server that I can upload to? I will redo with debug enabled and collect updated logs. The conf.db on both pcacn003 & pcacn005 is several GBs. The only way to recover is to stop the OVS/OVN procs, then delete /etc/openvswitch/conf.db and restart them.

Brendan

Thanks
Numan

switch 712757c3-2481-4f8b-940c-05dc13ce37a5 (ls_vcn9319435_external_ugw)
    port ls_vcn9319435_external_ugw-lr_vcn9319435
        type: router
        router-port: lr_vcn9319435-ls_vcn9319435_external_ugw
    port ln-ls_vcn9319435_external_ugw
        type: localnet
        addresses: ["unknown"]
router 80c281af-319b-416b-8a17-0ce7b8901bb1 (lr_vcn9319435)
    port lr_vcn9319435-ls_vcn9319435_external_ugw
        mac: "00:13:97:88:31:90"
        networks: ["253.255.80.4/16"]
        gateway chassis: [pcacn002 pcacn003 pcacn001]
    port lr_vcn9319435-lsb_vcn9319435
        mac: "00:13:97:d4:26:ec"
        networks: ["253.255.29.2/25"]
    nat 6c87050f-cd27-423e-815e-deda74bd9bc6
        external ip: "253.255.80.4"
        logical ip: "10.221.0.0/16"
        type: "snat"

Does each port have to be deleted, or is it OK to just delete the switch and router?

Brendan

On 25/10/2021 16:10, Brendan Doyle wrote:

On 25/10/2021 15:08, Numan Siddique wrote:

On Fri, Oct 22, 2021 at 9:30 AM Brendan Doyle <[email protected]> wrote:

Hi,

Looking at /etc/openvswitch/conf.db I see it getting very large:

[root@pcacn001 ~]# ls -l /etc/openvswitch/conf.db
-rw-r--r--. 1 root root 6069248828 Oct 22 11:55 /etc/openvswitch/conf.db

And it has lots and lots of (mostly) "ovn-controller: modifying OVS tunnels" update entries, like below. What are these? It does not seem normal.

OVSDB JSON 4687 00e8788dd5d9af2aac5ca7724759017c52ddd580
{"_date":1634903752117,"Bridge":{"745726c4-0451-4f52-a52b-1f9c5e85c703":{"external_ids":["map",[["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_dnat","4"],["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_snat","1"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_dnat","4"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_snat","5"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_dnat","17"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_snat","7"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_dnat","9"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_snat","10"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_dnat","3"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_snat","4"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_dnat","19"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_snat","18"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_dnat","10"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_snat","16"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_dnat","9"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_snat","12"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_dnat","10"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_snat","11"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_dnat","2"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_snat","5"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_dnat","18"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_snat","3"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_dnat","12"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_snat","11"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_dnat","10"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_snat","12"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_dnat","1"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_snat","11"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_dnat","2"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_snat","13"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_dnat","11"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_snat","12"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_dnat","19"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_snat","15"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_dnat","18"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_snat","17"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_dnat","13"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_snat","10"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_dnat","8"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_snat","14"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_dnat","12"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_snat","11"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_dnat","13"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_snat","10"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_dnat","9"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_snat","1"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_dnat","16"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_snat","15"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_dnat","5"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_snat","2"],["ct-zone-cr-lr_vcn0747157-ls_vcn0747157_external_ugw","9"],["ct-zone-cr-lr_vcn1645571_igw-ls_vcn1645571_external_igw","21"],["ct-zone-cr-lr_vcn7319607-ls_vcn7319607_external_ugw","14"],["ct-zone-cr-lr_vcn7319607_igw-ls_vcn7319607_external_igw","21"],["ct-zone-cr-lr_vcn7395327_igw-ls_vcn7395327_external_igw","21"],["ct-zone-cr-lr_vcn9567153-ls_vcn9567153_external_ugw","1"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_dnat","9"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_snat","8"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_dnat","3"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_snat","6"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_dnat","12"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_snat","13"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_dnat","15"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_snat","14"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_dnat","16"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_snat","17"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_dnat","6"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_snat","7"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_dnat","13"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_snat","11"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_dnat","20"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_snat","19"],["ct-zone-ln-ls_vcn6603036_external_ugw","7"],["ct-zone-ln-ls_vcn7319607_external_igw","20"],["ct-zone-ln-ls_vcn7395327_external_ugw","7"],["ct-zone-ln-ls_vcn7836024_external_igw","20"],["ct-zone-ln-ls_vcn9567153_external_igw","21"],["ct-zone-ln-ls_vcn9567153_external_ugw","8"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}

In which OVN version are you seeing this?

ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.14.0_r0.0.0
DB Schema 8.2.0

# ovn-nbctl -V
ovn-nbctl 20.09.0_r1.0.0
Open vSwitch Library 2.14.0
DB Schema 5.27.0

I wonder if you're seeing this issue:
https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgReUsdtEU$

Have to step out for a bit; will look at this when I can.

What I can say is that we are using ovsdbapp to configure central, and I see /etc/openvswitch/conf.db getting up to several GB! So much so that systemd times out when you try to start the service using it. I am also seeing ovs-vswitchd getting a SEGV on a regular basis, which I think is related. I am wondering if this patch might help:

[External] : Re: [ovs-dev] [PATCH branch-2.14] python: idl: Avoid sending transactions when the DB is not synced up.

I'm not sure. /etc/openvswitch/conf.db is the local ovsdb-server database and not the OVN database.

Numan

If you run a tail on /etc/openvswitch/conf.db, do you see the ct zone ids toggling between 2 values constantly?

Thanks
Numan

Thanks
Brendan

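
Editor's note: Numan's question above (whether the ct zone ids in conf.db toggle between two values) can be checked without eyeballing a multi-GB file. The following is only a rough sketch, not something posted in the thread. It assumes each transaction record is a single line of JSON, optionally preceded by an "OVSDB JSON <len> <sha1>" header, and that each record carries the full Bridge external_ids map as in the entry quoted above. The function names (ct_zone_changes, toggling_zones, scan_file) and the min_changes threshold are made up for illustration.

```python
import json
from collections import defaultdict


def ct_zone_changes(lines):
    """Per ct-zone key, count zone id changes and collect the ids it took."""
    last = {}
    changes = defaultdict(int)
    seen = defaultdict(set)
    for line in lines:
        start = line.find("{")
        if start < 0:
            continue  # record header ("OVSDB JSON <len> <sha1>") or blank line
        try:
            record = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # truncated or partially written record
        bridges = record.get("Bridge") if isinstance(record, dict) else None
        if not isinstance(bridges, dict):
            continue
        for row in bridges.values():
            ext = row.get("external_ids") if isinstance(row, dict) else None
            # OVSDB encodes a map column as ["map", [[key, value], ...]].
            if not (isinstance(ext, list) and len(ext) == 2 and ext[0] == "map"):
                continue
            for key, value in ext[1]:
                if not key.startswith("ct-zone-"):
                    continue
                seen[key].add(value)
                if key in last and last[key] != value:
                    changes[key] += 1
                last[key] = value
    return {key: (changes[key], sorted(seen[key])) for key in seen}


def toggling_zones(stats, min_changes=3):
    """Keys whose id changed at least min_changes times between exactly two values."""
    return sorted(
        key
        for key, (count, ids) in stats.items()
        if count >= min_changes and len(ids) == 2
    )


def scan_file(path):
    """Stream a conf.db copy line by line and report toggling ct-zone keys."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return toggling_zones(ct_zone_changes(fh))
```

Running scan_file on a copy of /etc/openvswitch/conf.db taken from an affected chassis would list the ct-zone keys that flip back and forth. An empty list would suggest the growth comes from something other than zone id toggling.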
_______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
