On Thu, Oct 28, 2021 at 12:21 PM Brendan Doyle <[email protected]> wrote:
>
> I'm also hoping that this is the reason for the frequent SEGVs we see.
> This is a stack trace that looks like:
>
> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f69b4477e84 in classifier_lookup__ (cls=0x55edff13e8b8,
>     version=version@entry=72104, flow=flow@entry=0x7f699abb2490,
>     wc=wc@entry=0x7f699abd4730,
>     allow_conjunctive_matches=allow_conjunctive_matches@entry=true)
>     at lib/classifier.c:941
>
> Full trace attached (almost 8000 frames deep!). When ovs-vswitchd SEGVs
> like this, systemd can't restart it, as the DB file is too large:

IMO ovs-vswitchd should not crash if the ovn-controller does something
wrong. Seems like a bug in vswitchd.

Thanks
Numan

>
> ls -lh /etc/openvswitch/conf.db
> -rw-r--r--. 1 root root 4.2G Oct 28 11:25 /etc/openvswitch/conf.db
>
> It just times out.
>
> Anyway, we'll try the patch and hopefully that will solve the problem.
>
>
> On 28/10/2021 16:41, Numan Siddique wrote:
> > On Thu, Oct 28, 2021 at 5:20 AM Brendan Doyle <[email protected]> wrote:
> >> Numan,
> >>
> >> Just wondering if you got a chance to look at those logs?
> > I looked into the logs, and as I had mentioned earlier you need this fix -
> > https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!amdtq3tQhwFCtbvjxSuF5ItzNk_07I0bBJvt5mu3lbJc-NBU5rsCp9IIullXTrxBXf8$
> >
> > Please let me know if you still see this issue with the latest OVN or
> > with the version of OVN which has this fix.
> > This fix is available from OVN 21.03 and onwards.
> >
> > Thanks
> > Numan
> >
> >> Thanks
> >>
> >> Brendan
> >>
> >> On 27/10/2021 11:25, Brendan Doyle wrote:
> >>
> >> Hi,
> >>
> >> I finally got some debug logs, truncated after the failure occurs; the
> >> truncated entries are just repeated updates of the same entry.
> >>
> >> So, some more light on this: it seems this is a timing issue. The test
> >> being run involves creating a number of Logical Switches (LS), Routers
> >> (LR) and Distributed Router Port gateways (DR), and then immediately
> >> deleting them, with the last created DR being deleted first. Our CMS is
> >> using the ovsdbapp Python lib to do this.
> >>
> >> So it occurs to me that perhaps the objects get created in NB, but
> >> before they have been propagated to SB and to the HV chassis, we get
> >> the delete, and this causes updates to be sent to the chassis for a
> >> logical port that does not exist? Just a hypothesis.
> >>
> >> ovn-nbctl has synchronization flags (--wait) to guard against such
> >> behavior; does ovsdbapp, I wonder?
> >>
> >> In any case the test fails (we see a runaway conf.db) pretty regularly,
> >> but not every time. The failure is always observed on the delete
> >> operations. If I put a delay after create and before delete, then we
> >> don't see the failure.
> >>
> >> If anyone can shed light on this from the logs, it would be much
> >> appreciated.
> >>
> >> Thanks
> >>
> >> Brendan
> >>
> >>
> >> On 26/10/2021 17:11, Brendan Doyle wrote:
> >>
> >> On 26/10/2021 15:50, Numan Siddique wrote:
> >>
> >> On Tue, Oct 26, 2021 at 8:20 AM Brendan Doyle <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> So what is very odd here is that I have used ovn-nbctl to delete the NB
> >> config, so
> >> # ovn-nbctl show
> >> # ovn-sbctl lflow-list
> >>
> >> Yet I still see /etc/openvswitch/conf.db growing with updates for
> >> logical switch ports that no longer exist!
> >>
> >> "],["ct-zone-ln-ls_vcn9195577_external_ugw","220"],["ct-zone-ln-ls_vcn9206002_external_igw","110"],["ct-zone-ln-ls_vcn9210052_external_igw","110"],["ct-zone-ln-ls_vcn9232395_external_ugw","75"],["ct-zone-ln-ls_vcn9236987_external_igw","110"],["ct-zone-ln-ls_vcn9236987_external_ugw","78"],["ct-zone-ln-ls_vcn9255861_external_igw","118"],["ct-zone-ln-ls_vcn9255861_external_ugw","100"],["ct-zone-ln-ls_vcn9319435_external_igw","87"],["ct-zone-ln-ls_vcn9352502_external_igw","40"],["ct-zone-ln-ls_vcn9402504_external_ugw","99"],["ct-zone-ln-ls_vcn9403404_external_igw","133"],["ct-zone-ln-ls_vcn9403404_external_ugw","114"],["ct-zone-ln-ls_vcn9461566_external_ugw","191"],["ct-zone-ln-ls_vcn9480000_external_igw","254"],["ct-zone-ln-ls_vcn9480000_external_ugw","236"],["ct-zone-ln-ls_vcn9492134_external_igw","262"],["ct-zone-ln-ls_vcn9523503_external_igw","207"],["ct-zone-ln-ls_vcn9542102_external_igw","133"],["ct-zone-ln-ls_vcn9542102_external_ugw","115"],["ct-zone-ln-ls_vcn9559658_external_igw","125"],["ct-zone-ln-ls_vcn9559658_external_ugw","78"],["ct-zone-ln-ls_vcn9594034_external_igw","49"],["ct-zone-ln-ls_vcn9619021_external_igw","133"],["ct-zone-ln-ls_vcn9634773_external_igw","292"],["ct-zone-ln-ls_vcn9649169_external_igw","132"],["ct-zone-ln-ls_vcn9649169_external_ugw","110"],["ct-zone-ln-ls_vcn9661290_external_ugw","78"],["ct-zone-ln-ls_vcn9734192_external_ugw","114"],["ct-zone-ln-ls_vcn9774252_external_igw","262"],["ct-zone-ln-ls_vcn9796262_external_igw","72"],["ct-zone-ln-ls_vcn9796262_external_ugw","54"],["ct-zone-ln-ls_vcn9805903_external_igw","147"],["ct-zone-ln-ls_vcn9805903_external_ugw","126"],["ct-zone-ln-ls_vcn9809895_external_igw","246"],["ct-zone-ln-ls_vcn9812576_external_ugw","78"],["ct-zone-ln-ls_vcn9834728_external_igw","110"],["ct-zone-ln-ls_vcn9886683_external_ugw","114"],["ct-zone-ln-ls_vcn9903419_external_ugw","235"],["ct-zone-ln-ls_vcn9917510_external_igw","56"],["ct-zone-ln-ls_vcn9917510_external_ugw","38"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}
> >>
> >> A shortened version of one entry. Could it be that switch ports must be
> >> deleted before deleting the switch? I was under the impression that
> >> once a switch is deleted its ports get deleted?
> >>
> >> Yes. If you delete the switch, the switch ports get deleted too.
> >>
> >> After deleting the logical switch (or switch ports) do you see them
> >> being deleted by ovn-northd in the SB DB?
> >>
> >> Run - ovn-sbctl list port_binding <deleted_port>
> >> and/or
> >> ovn-sbctl list datapath_binding <deleted_lswitch>
> >>
> >> I'd suggest you enable jsonrpc debug in ovn-controller and see what's
> >> happening. It would be helpful if you can share the ovn-controller
> >> debug logs.
> >>
> >> ovn-appctl -t ovn-controller vlog/set jsonrpc:dbg
> >>
> >>
> >> So in my test I create a simple network, then delete it, so NB DB and
> >> SB DB are empty.
> >>
> >> # ovn-sbctl list port_binding
> >> # ovn-sbctl list datapath_binding
> >> #
> >>
> >> The network has a number of LSs and LRs and two Distributed Router (DR)
> >> ports (on separate LRs). When I just create one DR all seems fine, but
> >> when I add the second into the mix I get a runaway
> >> openvswitch/conf.db, but NOT on all chassis. I have 4 chassis that I
> >> can schedule the DR ports to. In this latest test I observed the
> >> runaway conf.db on pcacn003 & pcacn005. The logs are too large to send
> >> in email; is there an FTP server that I can upload to?
> >>
> >> I will redo with debug enabled and collect updated logs. The conf.db on
> >> both pcacn003 & pcacn005 is several GB.
> >>
> >> The only way to recover is to stop the OVS/OVN procs, then delete
> >> /etc/openvswitch/conf.db and restart them.
> >>
> >> Brendan
> >>
> >>
> >> Thanks
> >> Numan
> >>
> >>
> >> switch 712757c3-2481-4f8b-940c-05dc13ce37a5 (ls_vcn9319435_external_ugw)
> >>     port ls_vcn9319435_external_ugw-lr_vcn9319435
> >>         type: router
> >>         router-port: lr_vcn9319435-ls_vcn9319435_external_ugw
> >>     port ln-ls_vcn9319435_external_ugw
> >>         type: localnet
> >>         addresses: ["unknown"]
> >>
> >> router 80c281af-319b-416b-8a17-0ce7b8901bb1 (lr_vcn9319435)
> >>     port lr_vcn9319435-ls_vcn9319435_external_ugw
> >>         mac: "00:13:97:88:31:90"
> >>         networks: ["253.255.80.4/16"]
> >>         gateway chassis: [pcacn002 pcacn003 pcacn001]
> >>     port lr_vcn9319435-lsb_vcn9319435
> >>         mac: "00:13:97:d4:26:ec"
> >>         networks: ["253.255.29.2/25"]
> >>     nat 6c87050f-cd27-423e-815e-deda74bd9bc6
> >>         external ip: "253.255.80.4"
> >>         logical ip: "10.221.0.0/16"
> >>         type: "snat"
> >>
> >> Does each port have to be deleted, or is it OK to just delete the
> >> switch and router?
> >>
> >> Brendan
> >>
> >> On 25/10/2021 16:10, Brendan Doyle wrote:
> >>
> >> On 25/10/2021 15:08, Numan Siddique wrote:
> >>
> >> On Fri, Oct 22, 2021 at 9:30 AM Brendan Doyle <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> Looking at /etc/openvswitch/conf.db I see it getting very large:
> >>
> >> [root@pcacn001 ~]# ls -l /etc/openvswitch/conf.db
> >> -rw-r--r--. 1 root root 6069248828 Oct 22 11:55 /etc/openvswitch/conf.db
> >>
> >> And it has lots and lots of (mostly) "ovn-controller: modifying OVS
> >> tunnels" update entries, like the one below. What are these? It does
> >> not seem normal.
> >>
> >> OVSDB JSON 4687 00e8788dd5d9af2aac5ca7724759017c52ddd580
> >> {"_date":1634903752117,"Bridge":{"745726c4-0451-4f52-a52b-1f9c5e85c703":{"external_ids":["map",[["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_dnat","4"],["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_snat","1"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_dnat","4"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_snat","5"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_dnat","17"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_snat","7"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_dnat","9"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_snat","10"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_dnat","3"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_snat","4"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_dnat","19"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_snat","18"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_dnat","10"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_snat","16"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_dnat","9"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_snat","12"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_dnat","10"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_snat","11"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_dnat","2"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_snat","5"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_dnat","18"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_snat","3"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_dnat","12"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_snat","11"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_dnat","10"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_snat","12"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_dnat","1"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_snat","11"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_dnat","2"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_snat","13"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_dnat","11"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_snat","12"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_dnat","19"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_snat","15"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_dnat","18"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_snat","17"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_dnat","13"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_snat","10"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_dnat","8"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_snat","14"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_dnat","12"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_snat","11"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_dnat","13"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_snat","10"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_dnat","9"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_snat","1"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_dnat","16"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_snat","15"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_dnat","5"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_snat","2"],["ct-zone-cr-lr_vcn0747157-ls_vcn0747157_external_ugw","9"],["ct-zone-cr-lr_vcn1645571_igw-ls_vcn1645571_external_igw","21"],["ct-zone-cr-lr_vcn7319607-ls_vcn7319607_external_ugw","14"],["ct-zone-cr-lr_vcn7319607_igw-ls_vcn7319607_external_igw","21"],["ct-zone-cr-lr_vcn7395327_igw-ls_vcn7395327_external_igw","21"],["ct-zone-cr-lr_vcn9567153-ls_vcn9567153_external_ugw","1"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_dnat","9"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_snat","8"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_dnat","3"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_snat","6"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_dnat","12"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_snat","13"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_dnat","15"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_snat","14"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_dnat","16"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_snat","17"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_dnat","6"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_snat","7"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_dnat","13"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_snat","11"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_dnat","20"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_snat","19"],["ct-zone-ln-ls_vcn6603036_external_ugw","7"],["ct-zone-ln-ls_vcn7319607_external_igw","20"],["ct-zone-ln-ls_vcn7395327_external_ugw","7"],["ct-zone-ln-ls_vcn7836024_external_igw","20"],["ct-zone-ln-ls_vcn9567153_external_igw","21"],["ct-zone-ln-ls_vcn9567153_external_ugw","8"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}
> >>
> >> In which OVN version are you seeing this?
> >>
> >> ovs-vsctl -V
> >> ovs-vsctl (Open vSwitch) 2.14.0_r0.0.0
> >> DB Schema 8.2.0
> >> # ovn-nbctl -V
> >> ovn-nbctl 20.09.0_r1.0.0
> >> Open vSwitch Library 2.14.0
> >> DB Schema 5.27.0
> >>
> >>
> >> I wonder if you're seeing this issue -
> >> https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgReUsdtEU$
> >>
> >> Have to step out for a bit; will look at this when I can.
> >> What I can say is that we are using ovsdbapp to configure central, and
> >> I see /etc/openvswitch/conf.db getting up to several GB! So much so
> >> that systemd times out when you try to start the service using it.
> >> I am also seeing ovs-vswitchd getting a SEGV on a regular basis, which
> >> I think is related. I am wondering if this patch might help:
> >>
> >> [External] : Re: [ovs-dev] [PATCH branch-2.14] python: idl: Avoid
> >> sending transactions when the DB is not synced up.
> >>
> >> I'm not sure. /etc/openvswitch/conf.db is the local ovsdb-server
> >> database and not the OVN database.
> >>
> >> Numan
> >>
> >> If you run a tail on /etc/openvswitch/conf.db, do you see the ct zone
> >> ids toggling between 2 values constantly?
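
One rough way to answer the toggling question offline, rather than by eyeballing `tail` output, is to scan a text excerpt of conf.db and track how each ct-zone-* value changes from record to record. The sketch below is only that: it assumes each record is an "OVSDB JSON <length> <hash>" header line followed by a one-line JSON payload, as in the entry quoted in this thread, and that zones live in the Bridge external_ids map. The function names and the min_changes threshold are illustrative and are not part of OVS or OVN.

```python
import json
from collections import defaultdict


def iter_records(lines):
    """Yield the parsed JSON payload that follows each 'OVSDB JSON' header line."""
    expect_payload = False
    for line in lines:
        line = line.strip()
        if line.startswith("OVSDB JSON "):
            expect_payload = True
        elif expect_payload and line:
            expect_payload = False
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                continue  # skip truncated or damaged payloads


def ct_zone_history(records):
    """Map each ct-zone-* key to the distinct consecutive values it took."""
    history = defaultdict(list)
    for record in records:
        bridge = record.get("Bridge")
        if not isinstance(bridge, dict):
            continue
        for row in bridge.values():
            if not isinstance(row, dict):
                continue  # e.g. a deleted row
            ext = row.get("external_ids")
            # A map column is encoded as ["map", [[key, value], ...]].
            if not (isinstance(ext, list) and len(ext) == 2 and ext[0] == "map"):
                continue
            for key, value in ext[1]:
                if key.startswith("ct-zone-"):
                    values = history[key]
                    if not values or values[-1] != value:
                        values.append(value)  # record only actual changes
    return history


def toggling_keys(history, min_changes=3):
    """Return keys that changed at least min_changes times between exactly two values."""
    return sorted(
        key
        for key, values in history.items()
        if len(values) - 1 >= min_changes and len(set(values)) == 2
    )
```

To try it, feed it the lines of a saved excerpt (for example the output of `tail` redirected to a file) and print `toggling_keys(ct_zone_history(iter_records(lines)))`; an empty list would suggest the zone ids are not flapping between two values in that excerpt.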
> >>
> >> Thanks
> >> Numan
> >>
> >> Thanks
> >>
> >> Brendan
> >> _______________________________________________
> >> discuss mailing list
> >> [email protected]
> >> https://urldefense.com/v3/__https://mail.openvswitch.org/mailman/listinfo/ovs-discuss__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgR-G4xGfo$
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
