Hi,

I finally got some debug logs. They are truncated after the failure occurs; the truncated entries are just repeated updates of the same entry.

To shed some more light on this: it seems to be a timing issue. The test being run involves creating a number of Logical Switches (LS), Logical Routers (LR) and Distributed Router Port gateways (DR), and then immediately deleting them, with the last created DR being deleted first. Our CMS is using the ovsdbapp Python library to do this.

So it occurs to me that perhaps the objects get created in NB, but before they have been propagated to SB and to the HV chassis we get the delete, and this causes updates to be sent to the chassis for a logical port that does not exist? Just a hypothesis.
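
One way to test this hypothesis without an arbitrary sleep would be to force a sync point between the create and delete phases. The following is only a minimal sketch, assuming the test host can run ovn-nbctl against the same NB DB. The helper names build_sync_cmd and sync_to_chassis are hypothetical, and the sketch shells out to ovn-nbctl instead of using any ovsdbapp call:

```python
import subprocess


def build_sync_cmd(wait="hv", timeout_s=30):
    """Build an ovn-nbctl 'sync' command line.

    wait="sb" blocks until earlier NB changes are reflected in the SB DB;
    wait="hv" blocks until the chassis have caught up as well.
    """
    if wait not in ("sb", "hv"):
        raise ValueError("wait must be 'sb' or 'hv'")
    return ["ovn-nbctl", f"--timeout={timeout_s}", f"--wait={wait}", "sync"]


def sync_to_chassis(wait="hv", timeout_s=30, runner=subprocess.run):
    """Run the sync command; return True if ovn-nbctl exited with status 0.

    `runner` is injectable so the helper can be exercised without OVN.
    """
    result = runner(build_sync_cmd(wait, timeout_s), check=False)
    return result.returncode == 0


# Intended use in the test (not executed here):
#   create_network()           # hypothetical CMS call, via ovsdbapp
#   if not sync_to_chassis():  # wait for NB -> SB -> HV propagation
#       raise RuntimeError("OVN did not converge before delete")
#   delete_network()           # hypothetical CMS call, via ovsdbapp
```

If the runaway conf.db no longer appears with the sync point in place, that would support the timing explanation; if it still appears, the cause is likely elsewhere.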

ovn-nbctl has synchronization flags (--wait) to guard against such behavior; does ovsdbapp, I wonder?

In any case the test fails (we see a runaway conf.db) pretty regularly, but not every time. The failure is always observed on the delete operations. If I put a delay after create and before delete, then we don't see the failure.

If anyone can shed light on this from the logs, it would be much appreciated.

Thanks

Brendan

On 26/10/2021 17:11, Brendan Doyle wrote:
On 26/10/2021 15:50, Numan Siddique wrote:On Tue, Oct 26, 2021 at 8:20 AM Brendan Doyle <[email protected]> wrote:Hi,So what is very odd here, is that I have used ovn-nbctl to delete the NBconfig, so # ovn-nbctl show # ovn-sbctl lflow-list Yet I still see /etc/openvswitch/conf.db growing with updates for Logical switch ports that no longer exist!"],["ct-zone-ln-ls_vcn9195577_external_ugw","220"],["ct-zone-ln-ls_vcn9206002_external_igw","110"],["ct-zone-ln-ls_vcn9210052_external_igw","110"],["ct-zone-ln-ls_vcn9232395_external_ugw","75"],["ct-zone-ln-ls_vcn9236987_external_igw","110"],["ct-zone-ln-ls_vcn9236987_external_ugw","78"],["ct-zone-ln-ls_vcn9255861_external_igw","118"],["ct-zone-ln-ls_vcn9255861_external_ugw","100"],["ct-zone-ln-ls_vcn9319435_external_igw","87"],["ct-zone-ln-ls_vcn9352502_external_igw","40"],["ct-zone-ln-ls_vcn9402504_external_ugw","99"],["ct-zone-ln-ls_vcn9403404_external_igw","133"],["ct-zone-ln-ls_vcn9403404_external_ugw","114"],["ct-zone-ln-ls_vcn9461566_external_ugw","191"],["ct-zone-ln-ls_vcn9480000_external_igw","254"],["ct-zone-ln-ls_vcn9480000_external_ugw","236"],["ct-zone-ln-ls_vcn9492134_external_igw","262"],["ct-zone-ln-ls_vcn9523503_external_igw","207"],["ct-zone-ln-ls_vcn9542102_external_igw","133"],["ct-zone-ln-ls_vcn9542102_external_ugw","115"],["ct-zone-ln-ls_vcn9559658_external_igw","125"],["ct-zone-ln-ls_vcn9559658_external_ugw","78"],["ct-zone-ln-ls_vcn9594034_external_igw","49"],["ct-zone-ln-ls_vcn9619021_external_igw","133"],["ct-zone-ln-ls_vcn9634773_external_igw","292"],["ct-zone-ln-ls_vcn9649169_external_igw","132"],["ct-zone-ln-ls_vcn9649169_external_ugw","110"],["ct-zone-ln-ls_vcn9661290_external_ugw","78"],["ct-zone-ln-ls_vcn9734192_external_ugw","114"],["ct-zone-ln-ls_vcn9774252_external_igw","262"],["ct-zone-ln-ls_vcn9796262_external_igw","72"],["ct-zone-ln-ls_vcn9796262_external_ugw","54"],["ct-zone-ln-ls_vcn9805903_external_igw","147"],["ct-zone-ln-ls_vcn9805903_external_ugw","126"],["ct-zone-ln-ls_vcn9809895_
external_igw","246"],["ct-zone-ln-ls_vcn9812576_external_ugw","78"],["ct-zone-ln-ls_vcn9834728_external_igw","110"],["ct-zone-ln-ls_vcn9886683_external_ugw","114"],["ct-zone-ln-ls_vcn9903419_external_ugw","235"],["ct-zone-ln-ls_vcn9917510_external_igw","56"],["ct-zone-ln-ls_vcn9917510_external_ugw","38"]]]}},"_comment":"ovn-controller:modifying OVS tunnels 'pcacn001'"}

A shortened version of one entry.

Could it be that switch ports must be deleted before deleting the switch? I was under the impression that once a switch is deleted its ports get deleted?

Yes. If you delete the switch, the switch ports get deleted too.

After deleting the logical switch (or switch ports), do you see them deleted by ovn-northd in the SB DB? Run:

ovn-sbctl list port_binding <deleted_port>

and/or

ovn-sbctl list datapath_binding <deleted_lswitch>

I'd suggest you enable jsonrpc debug in ovn-controller and see what's happening. It would be helpful if you can share the ovn-controller debug logs.

ovn-appctl -t ovn-controller vlog/set jsonrpc:dbg

So in my test I create a simple network then delete it, so NB DB and SB DB are empty.

# ovn-sbctl list port_binding
# ovn-sbctl list datapath_binding
#

The network has a number of LSs and LRs and two Distributed Router (DR) ports (on separate LRs). When I just create one DR all seems fine, but when I add the second into the mix I get a runaway openvswitch/conf.db, but NOT on all chassis. I have 4 chassis that I can schedule the DR ports to. In this latest test I observed the runaway conf.db on pcacn003 & pcacn005.

The logs are too large to send in email; is there an FTP server that I can upload to?

I will redo with debug enabled and collect updated logs. The conf.db on both pcacn003 & pcacn005 is several GBs.

The only way to recover is to stop the OVS/OVN procs, then delete /etc/openvswitch/conf.db and restart them.

Brendan

Thanks
Numan

I'm not sure.
/etc/openvswitch/conf.db is the local ovsdb-server database

switch 712757c3-2481-4f8b-940c-05dc13ce37a5 (ls_vcn9319435_external_ugw)
    port ls_vcn9319435_external_ugw-lr_vcn9319435
        type: router
        router-port: lr_vcn9319435-ls_vcn9319435_external_ugw
    port ln-ls_vcn9319435_external_ugw
        type: localnet
        addresses: ["unknown"]
router 80c281af-319b-416b-8a17-0ce7b8901bb1 (lr_vcn9319435)
    port lr_vcn9319435-ls_vcn9319435_external_ugw
        mac: "00:13:97:88:31:90"
        networks: ["253.255.80.4/16"]
        gateway chassis: [pcacn002 pcacn003 pcacn001]
    port lr_vcn9319435-lsb_vcn9319435
        mac: "00:13:97:d4:26:ec"
        networks: ["253.255.29.2/25"]
    nat 6c87050f-cd27-423e-815e-deda74bd9bc6
        external ip: "253.255.80.4"
        logical ip: "10.221.0.0/16"
        type: "snat"

Does each port have to be deleted, or is it OK to just delete the switch and router?

Brendan

On 25/10/2021 16:10, Brendan Doyle wrote:
On 25/10/2021 15:08, Numan Siddique wrote:

-4c54-b6e2-38f192eacae3_snat","12"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_dnat","10"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_snat","11"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_dnat","2"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_snat","5"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_dnat","18"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_snat","3"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_dnat","12"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_snat","11"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_dnat","10"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_snat","12"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_dnat","1"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_snat","11"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_dnat","2"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_snat","13"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_dnat","11"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_snat","12"],["ct-zone-9498aca9-762

On Fri, Oct 22, 2021 at 9:30 AM Brendan Doyle <[email protected]> wrote:

Hi, Looking at
/etc/openvswitch/conf.db I see it getting very large: [root@pcacn001 ~]# ls -l /etc/openvswitch/conf.db -rw-r--r--. 1 root root 6069248828 Oct 22 11:55 /etc/openvswitch/conf.dbAnd has lots and lots (mostly) "ovn-controller: modifying OVS tunnels"updates entries, like below. What are these? it does not seem normal? OVSDB JSON 4687 00e8788dd5d9af2aac5ca7724759017c52ddd580{"_date":1634903752117,"Bridge":{"745726c4-0451-4f52-a52b-1f9c5e85c703":{"external_ids":["map",[["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_dnat","4"],["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_snat","1"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_dnat","4"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_snat","5"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_dnat","17"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_snat","7"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_dnat","9"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_snat","10"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_dnat","3"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_snat","4"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_dnat","19"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_snat","18"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_dnat","10"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_snat","16"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_dnat","9"],["ct-zone-423896cb-55733-4ce0-a0ff-d4d5c17d7223_dnat","19"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_snat","15"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_dnat","18"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_snat","17"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_dnat","13"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_snat","10"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_dnat","8"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_snat","14"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_dnat","12"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_snat","11"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_dnat","13"],["ct-zo
ne-bd3b86b7-2aba-4ff7-a5f7-975612692aca_snat","10"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_dnat","9"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_snat","1"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_dnat","16"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_snat","15"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_dnat","5"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_snat","2"],["ct-zone-cr-lr_vcn0747157-ls_vcn0747157_external_ugw","9"],["ct-zone-cr-lr_vcn1645571_igw-ls_vcn1645571_external_igw","21"],["ct-zone-cr-lr_vcn7319607-ls_vcn7319607_external_ugw","14"],["ct-zone-cr-lr_vcn7319607_igw-ls_vcn7319607_external_igw","21"],["ct-zone-cr-lr_vcn7395327_igw-ls_vcn7395327_external_igw","21"],["ct-zone-cr-lr_vcn9567153-ls_vcn9567153_external_ugw","1"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_dnat","9"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_snat","8"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_dnat","3"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_snat","6"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_dnat","12"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_snat","13"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_dnat","15"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_snat","14"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_dnat","16"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_snat","17"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_dnat","6"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_snat","7"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_dnat","13"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_snat","11"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_dnat","20"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_snat","19"],["ct-zone-ln-ls_vcn6603036_external_ugw","7"],["ct-zone-ln-ls_vcn7319607_external_igw","20"],["ct-zone-ln-ls_vcn7395327_external_ugw","7"],["ct-zone-ln-ls_vcn7836024_external_igw","20"],["ct-zone-ln-ls_vcn9567153_external_igw","21"],["ct-zone-ln-ls_vcn9567153_external_ugw","8"]]]}},"
_comment":"ovn-controller:modifying OVS tunnels 'pcacn001'"}

In which OVN version are you seeing this?

ovs-vsctl -V
ovs-vsctl (Open vSwitch) 2.14.0_r0.0.0
DB Schema 8.2.0

# ovn-nbctl -V
ovn-nbctl 20.09.0_r1.0.0
Open vSwitch Library 2.14.0
DB Schema 5.27.0

I wonder if you're seeing this issue -
https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgReUsdtEU$

Have to step out for a bit, will look at this when I can. What I can say is that we are using ovsdbapp to configure central, and I see /etc/openvswitch/conf.db getting up to several GB! So much so that systemd times out when you try to start the service using it. I am also seeing ovs-vswitchd getting a SEGV on a regular basis, which I think is related.

I am wondering if this patch might help:
[External] : Re: [ovs-dev] [PATCH branch-2.14] python:idl: Avoid sending transactions when the DB is not synced up.

and not the OVN database.

Numan

If you run a tail on /etc/openvswitch/conf.db, do you see the ct zone ids toggling between 2 values constantly?

Thanks
Numan

Thanks
Brendan

logs.tar.gz
Description: GNU Zip compressed data
_______________________________________________ discuss mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
