On Tue, Nov 30, 2021 at 12:13 PM Daniel Alvarez <[email protected]> wrote:
>
> Hey Christian
>
> > On 30 Nov 2021, at 18:06, Christian Stelter <[email protected]> wrote:
> >
> > Hi!
> >
> > We’re currently observing packet loss on a 3-node etcd cluster (all 3 nodes
> > on different hypervisors) on one of our OpenStack clusters running the
> > Victoria release deployed via kolla-ansible.
> >
> > The Open vSwitch library is version 2.13.3, ovn-controller is version
> > 20.03.2, and the underlying OS is Ubuntu 20.04 with current patches.
> >
> > We can reproduce the packet loss with this etcd setup in different projects
> > on that cluster, but not on a second cluster (our stage env) with the same
> > software versions, the same hardware components and the same sizing.
> >
> > When we replace the default security group with a security group that uses
> > the CIDR of the project network as the remote, instead of the remote
> > security group “default”, in the ingress rule (IPv4 Any Any), the etcd
> > cluster performs without packet loss/recurring leader elections.
>
> I am confused, as the default SG will block ingress traffic in OpenStack by
> default.
>
> As this is an OVS/OVN ML, I would suggest sharing the ACLs/Logical
> Flows/OpenFlows for both cases. This question, framed like this, requires
> OpenStack (maybe even kolla-ansible, if the default SG differs from the
> reference implementation) and etcd knowledge, so I would advise isolating the
> traffic pattern as much as possible, along with the packet loss % and other
> potentially useful data.
>
> >
> > Other projects or applications seem not to be impacted. At least none that
> > we know of.
> >
> > Any hints what could cause such a behavior? We suspect it's just a symptom
> > of another problem that we are currently not aware of.
> >
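One way to put a number on the packet loss % asked for above is to parse the summary line that ping prints after a run between two of the etcd nodes, once with each security group. The Python sketch below is only an illustration: the helper name parse_ping_loss is hypothetical, it assumes the Linux iputils ping summary format, and ICMP loss may not match what etcd sees on its TCP connections.

```python
import re

# Assumed format: the summary line printed by iputils ping on Linux, e.g.
# "10 packets transmitted, 9 received, 10% packet loss, time 9012ms"
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
    r".*?([\d.]+)% packet loss"
)


def parse_ping_loss(output: str) -> float:
    """Return the packet loss percentage reported in ping's summary line.

    parse_ping_loss is a hypothetical helper for this thread, not part of
    any OVS/OVN or OpenStack tooling.
    """
    match = _SUMMARY.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    return float(match.group(3))
```

Feeding it the saved output of the same ping run under both security groups gives two figures that can be posted alongside the ACLs and flows.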
In my opinion this could be due to an old bug in ovn-controller related to
wrong conjunction ID generation.

Is it possible for you to test with the latest OVN version? If not, can you
run the command below and see whether the packet loss issue is resolved?

    ovn-appctl -t ovn-controller recompute

If running this command solves the issue, then it's definitely a known issue
that has been fixed in later versions. If you can confirm this works, I can
share the commit that fixed it.

Thanks
Numan

> > Kind regards,
> >
> > Christian Stelter
> > _______________________________________________
> > discuss mailing list
> > [email protected]
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
