On Thu, Oct 28, 2021 at 12:21 PM Brendan Doyle <[email protected]> wrote:
>
>
> I'm also hoping that this is the reason for the frequent SEGVs we see,
> with a stack trace that looks like:
>
> Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock
> -vconsole:emer -vsyslog:err -vfi'.
> Program terminated with signal 11, Segmentation fault.
> #0  0x00007f69b4477e84 in classifier_lookup__ (cls=0x55edff13e8b8,
>      version=version@entry=72104, flow=flow@entry=0x7f699abb2490,
>      wc=wc@entry=0x7f699abd4730,
>      allow_conjunctive_matches=allow_conjunctive_matches@entry=true)
>      at lib/classifier.c:941
>
> Full trace attached (almost 8000 frames deep!). When ovs-vswitchd SEGVs
> like this, systemd can't restart it, as the DB file is too large:

IMO ovs-vswitchd should not crash if the ovn-controller does something wrong.
Seems like a bug in vswitchd.

Thanks
Numan
>
>   ls -lh /etc/openvswitch/conf.db
> -rw-r--r--. 1 root root 4.2G Oct 28 11:25 /etc/openvswitch/conf.db
>
> It just times out.
>
> Anyway, we'll try the patch and hopefully that will solve the problem.
>
>
> On 28/10/2021 16:41, Numan Siddique wrote:
> > On Thu, Oct 28, 2021 at 5:20 AM Brendan Doyle <[email protected]> 
> > wrote:
> >> Numan,
> >>
> >> Just wondering if you got a chance to look at those logs?
> > I looked into the logs, and as I mentioned earlier, you need this fix:
> > https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!amdtq3tQhwFCtbvjxSuF5ItzNk_07I0bBJvt5mu3lbJc-NBU5rsCp9IIullXTrxBXf8$
> >
> > Please let me know if you still see this issue with the latest OVN or
> > with the version of OVN which has this fix.
> > This fix is available from OVN 21.03 and onwards.
> >
> > Thanks
> > Numan
> >
> >> Thanks
> >>
> >> Brendan
> >>
> >> On 27/10/2021 11:25, Brendan Doyle wrote:
> >>
> >> Hi,
> >>
> >> I finally got some debug logs, truncated after the failure occurs. The
> >> truncated entries are just repeated updates of the same entry.
> >>
> >> To shed some more light on this: it seems to be a timing issue. The test
> >> being run involves creating a number of Logical Switches (LS), Logical
> >> Routers (LR) and Distributed Router gateway ports (DR), and then
> >> immediately deleting them, with the last created DR being deleted first.
> >> Our CMS uses the ovsdbapp Python library to do this.
> >>
> >> So it occurs to me that perhaps the objects get created in NB, but before
> >> they have been propagated to SB and to the HV chassis, we get the delete,
> >> and this causes updates to be sent to the chassis for a logical port that
> >> does not exist? Just a hypothesis.
> >>
> >> ovn-nbctl has synchronization flags (--wait) to guard against such
> >> behavior. Does ovsdbapp have an equivalent, I wonder?
> >>
> >> In any case, the test fails (we see a runaway conf.db) pretty regularly,
> >> but not every time. The failure is always observed on the delete
> >> operations. If I put a delay after create and before delete, then we
> >> don't see the failure.
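As an alternative to a fixed delay between create and delete, the test could poll for a condition before issuing the delete. What follows is only a minimal sketch: `wait_until` and the predicate passed to it are hypothetical helpers, not part of ovsdbapp or OVN, and the right condition to wait for (for example, the port bindings showing up in the SB DB) is an assumption that would need checking against the actual setup.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns a truthy value or timeout expires.

    Returns True if the predicate became true in time, False otherwise.
    The predicate is supplied by the caller (hypothetical here), e.g. a
    function that checks whether the created objects are visible in SB.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The test would then call something like `wait_until(objects_visible_in_sb)` after the create step and only delete once it returns True, where `objects_visible_in_sb` is whatever check fits the deployment.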
> >>
> >> If anyone can shed light on this from the logs, it would be much appreciated.
> >>
> >> Thanks
> >>
> >> Brendan
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> On 26/10/2021 17:11, Brendan Doyle wrote:
> >>
> >>
> >>
> >> On 26/10/2021 15:50, Numan Siddique wrote:
> >>
> >> On Tue, Oct 26, 2021 at 8:20 AM Brendan Doyle <[email protected]> 
> >> wrote:
> >>
> >> Hi,
> >>
> >>
> >> So what is very odd here is that I have used ovn-nbctl to delete the NB
> >> config, so both of these commands print nothing:
> >> # ovn-nbctl show
> >> # ovn-sbctl lflow-list
> >>
> >> Yet I still see /etc/openvswitch/conf.db growing with updates for
> >> Logical switch ports that no longer exist!
> >>
> >> "],["ct-zone-ln-ls_vcn9195577_external_ugw","220"],["ct-zone-ln-ls_vcn9206002_external_igw","110"],["ct-zone-ln-ls_vcn9210052_external_igw","110"],["ct-zone-ln-ls_vcn9232395_external_ugw","75"],["ct-zone-ln-ls_vcn9236987_external_igw","110"],["ct-zone-ln-ls_vcn9236987_external_ugw","78"],["ct-zone-ln-ls_vcn9255861_external_igw","118"],["ct-zone-ln-ls_vcn9255861_external_ugw","100"],["ct-zone-ln-ls_vcn9319435_external_igw","87"],["ct-zone-ln-ls_vcn9352502_external_igw","40"],["ct-zone-ln-ls_vcn9402504_external_ugw","99"],["ct-zone-ln-ls_vcn9403404_external_igw","133"],["ct-zone-ln-ls_vcn9403404_external_ugw","114"],["ct-zone-ln-ls_vcn9461566_external_ugw","191"],["ct-zone-ln-ls_vcn9480000_external_igw","254"],["ct-zone-ln-ls_vcn9480000_external_ugw","236"],["ct-zone-ln-ls_vcn9492134_external_igw","262"],["ct-zone-ln-ls_vcn9523503_external_igw","207"],["ct-zone-ln-ls_vcn9542102_external_igw","133"],["ct-zone-ln-ls_vcn9542102_external_ugw","115"],["ct-zone-ln-ls_vcn9559658_external_igw","125"],["ct-zone-ln-ls_vcn9559658_external_ugw","78"],["ct-zone-ln-ls_vcn9594034_external_igw","49"],["ct-zone-ln-ls_vcn9619021_external_igw","133"],["ct-zone-ln-ls_vcn9634773_external_igw","292"],["ct-zone-ln-ls_vcn9649169_external_igw","132"],["ct-zone-ln-ls_vcn9649169_external_ugw","110"],["ct-zone-ln-ls_vcn9661290_external_ugw","78"],["ct-zone-ln-ls_vcn9734192_external_ugw","114"],["ct-zone-ln-ls_vcn9774252_external_igw","262"],["ct-zone-ln-ls_vcn9796262_external_igw","72"],["ct-zone-ln-ls_vcn9796262_external_ugw","54"],["ct-zone-ln-ls_vcn9805903_external_igw","147"],["ct-zone-ln-ls_vcn9805903_external_ugw","126"],["ct-zone-ln-ls_vcn9809895_external_igw","246"],["ct-zone-ln-ls_vcn9812576_external_ugw","78"],["ct-zone-ln-ls_vcn9834728_external_igw","110"],["ct-zone-ln-ls_vcn9886683_external_ugw","114"],["ct-zone-ln-ls_vcn9903419_external_ugw","235"],["ct-zone-ln-ls_vcn9917510_external_igw","56"],["ct-zone-ln-ls_vcn9917510_external_ugw","38"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}
> >>
> >> That is a shortened version of one entry. Could it be that switch ports
> >> must be deleted before deleting the switch? I was under the impression
> >> that once a switch is deleted, its ports get deleted too?
> >>
> >> Yes.  If you delete the switch,  the switch ports get deleted too.
> >>
> >> After deleting the logical switch (or switch ports), do you see them
> >> deleted by ovn-northd in the SB DB?
> >>
> >> Run - ovn-sbctl list port_binding <deleted_port>
> >> or/and
> >>
> >> ovn-sbctl list datapath_binding <deleted_lswitch>
> >>
> >> I'd suggest you enable jsonrpc debug in ovn-controller and see what's 
> >> happening.
> >> It would be helpful if you can share the ovn-controller debug logs.
> >>
> >> ovn-appctl -t ovn-controller vlog/set jsonrpc:dbg
> >>
> >>
> >>
> >> So in my test I create a simple network then delete it so NB DB and SB DB
> >> are empty.
> >>
> >> # ovn-sbctl list port_binding
> >> # ovn-sbctl list datapath_binding
> >> #
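The same check can be scripted so a test can assert that the SB DB really is empty after the delete. This is a rough sketch that simply wraps the `ovn-sbctl list <table>` commands shown above and treats empty output as "no rows", which is what the output above suggests; the function names are made up for illustration, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess


def sb_table_is_empty(table, run=subprocess.run):
    """Return True if `ovn-sbctl list <table>` prints no rows."""
    result = run(["ovn-sbctl", "list", table],
                 capture_output=True, text=True, check=True)
    return result.stdout.strip() == ""


def leftover_sb_tables(tables=("port_binding", "datapath_binding"),
                       run=subprocess.run):
    """Return the names of the given SB tables that still contain rows."""
    return [t for t in tables if not sb_table_is_empty(t, run=run)]
```

On a host where `ovn-sbctl` can reach the SB DB, `leftover_sb_tables()` returning an empty list would match the empty output shown above.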
> >>
> >> The network has a number of LSs and LRs and two Distributed Router (DR)
> >> ports (on separate LRs). When I create just one DR all seems fine, but
> >> when I add the second into the mix I get a runaway openvswitch/conf.db,
> >> but NOT on all chassis. I have 4 chassis that I can schedule the DR ports
> >> to. In this latest test I observed the runaway conf.db on pcacn003 &
> >> pcacn005. The logs are too large to send in email; is there an FTP server
> >> that I can upload to?
> >>
> >> I will redo the test with debug enabled and collect updated logs. The
> >> conf.db on both pcacn003 & pcacn005 is several GB.
> >>
> >>
> >> The only way to recover is to stop the OVS/OVN procs, then delete 
> >> /etc/openvswitch/conf.db
> >> and restart them.
> >>
> >> Brendan
> >>
> >>
> >>
> >>
> >> Thanks
> >> Numan
> >>
> >>
> >> switch 712757c3-2481-4f8b-940c-05dc13ce37a5 (ls_vcn9319435_external_ugw)
> >>        port ls_vcn9319435_external_ugw-lr_vcn9319435
> >>            type: router
> >>            router-port: lr_vcn9319435-ls_vcn9319435_external_ugw
> >>        port ln-ls_vcn9319435_external_ugw
> >>            type: localnet
> >>            addresses: ["unknown"]
> >>
> >> router 80c281af-319b-416b-8a17-0ce7b8901bb1 (lr_vcn9319435)
> >>        port lr_vcn9319435-ls_vcn9319435_external_ugw
> >>            mac: "00:13:97:88:31:90"
> >>            networks: ["253.255.80.4/16"]
> >>            gateway chassis: [pcacn002 pcacn003 pcacn001]
> >>        port lr_vcn9319435-lsb_vcn9319435
> >>            mac: "00:13:97:d4:26:ec"
> >>            networks: ["253.255.29.2/25"]
> >>        nat 6c87050f-cd27-423e-815e-deda74bd9bc6
> >>            external ip: "253.255.80.4"
> >>            logical ip: "10.221.0.0/16"
> >>            type: "snat"
> >>
> >> Does each port have to be deleted, or is it OK to just delete the switch
> >> and router?
> >>
> >> Brendan
> >>
> >> On 25/10/2021 16:10, Brendan Doyle wrote:
> >>
> >>
> >> On 25/10/2021 15:08, Numan Siddique wrote:
> >>
> >> On Fri, Oct 22, 2021 at 9:30 AM Brendan Doyle
> >> <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >>
> >> Looking at /etc/openvswitch/conf.db I see it getting very large:
> >>
> >> [root@pcacn001 ~]#  ls -l /etc/openvswitch/conf.db
> >> -rw-r--r--. 1 root root 6069248828 Oct 22 11:55 /etc/openvswitch/conf.db
> >>
> >> It has lots and lots of (mostly) "ovn-controller: modifying OVS tunnels"
> >> update entries, like the one below.
> >> What are these? This does not seem normal.
> >> OVSDB JSON 4687 00e8788dd5d9af2aac5ca7724759017c52ddd580
> >> {"_date":1634903752117,"Bridge":{"745726c4-0451-4f52-a52b-1f9c5e85c703":{"external_ids":["map",[["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_dnat","4"],["ct-zone-0dca7370-1c18-4117-84e4-a72f277ccc6c_snat","1"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_dnat","4"],["ct-zone-11637f38-8725-4c77-adfe-f9c4c804ae8c_snat","5"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_dnat","17"],["ct-zone-1de487d1-f3a5-4b15-bae4-aa8cf794fcf9_snat","7"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_dnat","9"],["ct-zone-22c71c2a-0e59-41cc-a2da-91d3c7276c11_snat","10"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_dnat","3"],["ct-zone-3228b120-4192-476b-ab67-51fb45e786d6_snat","4"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_dnat","19"],["ct-zone-3753ff1a-d0cf-48e4-b06a-640f0467d202_snat","18"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_dnat","10"],["ct-zone-3c1c02f4-31c9-45d4-9c63-54ad2122bb15_snat","16"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_dnat","9"],["ct-zone-423896cb-5573-4c54-b6e2-38f192eacae3_snat","12"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_dnat","10"],["ct-zone-46b7b247-31a7-4fbb-88b9-0f3db042409c_snat","11"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_dnat","2"],["ct-zone-51376927-fca0-49b3-b0ba-1aa22153b366_snat","5"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_dnat","18"],["ct-zone-58033baa-916d-47d4-bcf0-d95f7fb1f861_snat","3"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_dnat","12"],["ct-zone-5f92f974-f0dc-4820-bb43-a14cc16d851f_snat","11"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_dnat","10"],["ct-zone-87055326-0535-4042-a0ff-bf0e9f494433_snat","12"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_dnat","1"],["ct-zone-8a840bfe-118f-4041-ac72-0637d6373ffc_snat","11"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_dnat","2"],["ct-zone-8fff9b0b-0fd6-42f9-ab77-e9f1475a5d82_snat","13"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_dnat","11"],["ct-zone-913c36a1-f987-4084-9119-f279b317c72f_snat","12"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_dnat","19"],["ct-zone-9498aca9-7623-4ce0-a0ff-d4d5c17d7223_snat","15"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_dnat","18"],["ct-zone-9c373522-fd02-424f-a2b3-14dc359062d2_snat","17"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_dnat","13"],["ct-zone-a28b45db-2dfb-4d38-905c-c5eb44da8c9c_snat","10"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_dnat","8"],["ct-zone-b1e8636a-5cf8-48ba-9693-793a59e5430d_snat","14"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_dnat","12"],["ct-zone-bbcc6e17-ee1e-4e82-b404-1dd0f1307002_snat","11"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_dnat","13"],["ct-zone-bd3b86b7-2aba-4ff7-a5f7-975612692aca_snat","10"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_dnat","9"],["ct-zone-cb94affd-f2aa-4bdd-9407-1e16ac046596_snat","1"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_dnat","16"],["ct-zone-ce71f6db-4dab-41ca-bd10-cd6204687b9d_snat","15"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_dnat","5"],["ct-zone-cfa46699-cc79-445e-a902-f1e37ff99806_snat","2"],["ct-zone-cr-lr_vcn0747157-ls_vcn0747157_external_ugw","9"],["ct-zone-cr-lr_vcn1645571_igw-ls_vcn1645571_external_igw","21"],["ct-zone-cr-lr_vcn7319607-ls_vcn7319607_external_ugw","14"],["ct-zone-cr-lr_vcn7319607_igw-ls_vcn7319607_external_igw","21"],["ct-zone-cr-lr_vcn7395327_igw-ls_vcn7395327_external_igw","21"],["ct-zone-cr-lr_vcn9567153-ls_vcn9567153_external_ugw","1"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_dnat","9"],["ct-zone-d0232f68-8d26-454c-87bf-e79066a1ed62_snat","8"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_dnat","3"],["ct-zone-d161aaef-e73e-452c-9d77-f465718f1f67_snat","6"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_dnat","12"],["ct-zone-e2f0a229-15b0-4255-b52d-71b078239ed2_snat","13"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_dnat","15"],["ct-zone-e6986bf4-e813-4df0-9bfe-1de95ceb2e30_snat","14"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_dnat","16"],["ct-zone-e93b7a93-8507-4036-8281-f2be764a44da_snat","17"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_dnat","6"],["ct-zone-f3b9843a-d498-41dc-8244-0f87d9bc1384_snat","7"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_dnat","13"],["ct-zone-f42fcb51-0af6-426f-974b-1478a169a70c_snat","11"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_dnat","20"],["ct-zone-f708c12e-34b6-4657-b7d0-4b5ac5e0d6c7_snat","19"],["ct-zone-ln-ls_vcn6603036_external_ugw","7"],["ct-zone-ln-ls_vcn7319607_external_igw","20"],["ct-zone-ln-ls_vcn7395327_external_ugw","7"],["ct-zone-ln-ls_vcn7836024_external_igw","20"],["ct-zone-ln-ls_vcn9567153_external_igw","21"],["ct-zone-ln-ls_vcn9567153_external_ugw","8"]]]}},"_comment":"ovn-controller: modifying OVS tunnels 'pcacn001'"}
> >>
> >> In which OVN version are you seeing this ?
> >>
> >> ovs-vsctl -V
> >> ovs-vsctl (Open vSwitch) 2.14.0_r0.0.0
> >> DB Schema 8.2.0
> >> # ovn-nbctl -V
> >> ovn-nbctl 20.09.0_r1.0.0
> >> Open vSwitch Library 2.14.0
> >> DB Schema 5.27.0
> >>
> >>
> >>
> >> I wonder if you're seeing this issue -
> >> https://urldefense.com/v3/__https://github.com/ovn-org/ovn/commit/e7788554a7f5e824fc0d8afc6cbf20e94fe4245f__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgReUsdtEU$
> >>
> >> Have to step out for a bit; will look at this when I can.
> >> What I can say is that we are using ovsdbapp to configure central, and
> >> I see /etc/openvswitch/conf.db getting up to several GB, so much so that
> >> systemd times out when you try to start the service using it.
> >> I am also seeing ovs-vswitchd getting a SEGV on a regular basis, which
> >> I think is related.
> >> I am wondering if this patch might help:
> >>
> >> [External] : Re: [ovs-dev] [PATCH branch-2.14] python:
> >>                 idl: Avoid sending transactions when the DB is not synced
> >>                 up.
> >>
> >> I'm not sure.   /etc/openvswitch/conf.db is the local ovsdb-server database
> >> and not the OVN database.
> >>
> >> Numan
> >>
> >> If you run a tail on /etc/openvswitch/conf.db, do you see the ct zone
> >> ids toggling between 2 values constantly?
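Eyeballing a tail of a multi-GB conf.db for toggling values is tedious, so a small script can do the counting. The sketch below is only an illustration under a few assumptions: each record is an "OVSDB JSON <length> <hash>" header line followed by one JSON line, and each Bridge update carries the full external_ids map, as in the entries quoted in this thread. The function names are made up for this example.

```python
import json


def scan_ct_zones(lines):
    """Count how often each ct-zone-* value changes across conf.db records."""
    stats = {}  # key -> {"last": value, "changes": count, "values": set}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the "OVSDB JSON <length> <hash>" header lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a record cut in half by tail
        bridges = record.get("Bridge")
        if not isinstance(bridges, dict):
            continue
        for row in bridges.values():
            ext = row.get("external_ids") if isinstance(row, dict) else None
            if not (isinstance(ext, list) and len(ext) == 2 and ext[0] == "map"):
                continue
            for key, value in ext[1]:
                if not key.startswith("ct-zone-"):
                    continue
                entry = stats.setdefault(
                    key, {"last": value, "changes": 0, "values": {value}})
                if entry["last"] != value:
                    entry["changes"] += 1
                    entry["last"] = value
                    entry["values"].add(value)
    return stats


def toggling(stats, min_changes=3):
    """Return keys that changed at least min_changes times between exactly 2 values."""
    return sorted(key for key, entry in stats.items()
                  if entry["changes"] >= min_changes and len(entry["values"]) == 2)
```

It reads line by line, so it can be fed an open file object for conf.db directly, or the output of tail saved to a file; `toggling(scan_ct_zones(f))` then lists the ct-zone keys that flip back and forth.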
> >>
> >> Thanks
> >> Numan
> >>
> >> Thanks
> >>
> >> Brendan
> >> _______________________________________________
> >> discuss mailing list
> >> [email protected]
> >> https://urldefense.com/v3/__https://mail.openvswitch.org/mailman/listinfo/ovs-discuss__;!!ACWV5N9M2RV99hQ!bwIWH-KoNwkjzx2Sw8BLj6uGXg6zeGUoB-ZG4wtzO42NUmxA95Id3NxKLRgR-G4xGfo$
_______________________________________________
discuss mailing list
[email protected]
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
