Hi Jon - From what I understand, while you might have gone to the
trouble of configuring a lossless data centre Ethernet, that no-loss
guarantee ends at the hypervisor: OVS (and other virtual switches) will
drop packets rather than exert back pressure.
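One quick way to see whether it's the vSwitch rather than the physical
network eating the packets is to read the per-port drop counters OVS
keeps. A rough sketch; qvoXXXXXXXX-XX stands in for the VM's actual
port, and the exact fields in the returned map vary by version:

  # per-interface statistics from the OVSDB
  ovs-vsctl get interface qvoXXXXXXXX-XX statistics
  # returns a map like {rx_bytes=..., rx_dropped=0, tx_dropped=..., ...}
  # rx_dropped/tx_dropped growing during the outages points at the
  # vSwitch rather than the fabric

  # or the OpenFlow view of the same counters for the whole bridge
  ovs-ofctl dump-ports br-int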
I saw a useful paper from IBM Zurich on developing a flow-controlled
virtual switch:
http://researcher.ibm.com/researcher/files/zurich-DCR/Got%20Loss%20Get%20zOVN.pdf
It's a bit dated (2013) but may still apply. If you figure out a way of
preventing this with modern OVS, I'd be very interested to know.

Best wishes,
Stig

> On 21 Jun 2017, at 16:24, Jonathan Proulx <j...@csail.mit.edu> wrote:
>
> On Wed, Jun 21, 2017 at 02:39:23AM -0700, Kevin Benton wrote:
> :Are there any events going on during these outages that would cause
> :reprogramming by the Neutron agent (e.g. port updates)? If not, it's
> :likely an OVS issue and you might want to cross-post to the
> :ovs-discuss mailing list.
>
> Guess I'll have to wander deeper into OVS land.
>
> No agent updates and nothing in the OVS logs (at INFO); flipping to
> debug, there are so many messages they get dropped:
>
> 2017-06-21T15:15:36.972Z|00794|dpif(handler12)|DBG|Dropped 35 log
> messages in last 0 seconds (most recently, 0 seconds ago) due to
> excessive rate
>
> /me wanders over to ovs-discuss
>
> Thanks,
> -Jon
>
> :Can you check the vswitch logs during the packet loss to see if there
> :are any messages indicating a reason? If that doesn't show anything
> :and it can be reliably reproduced, it might be worth increasing the
> :logging for the vswitch to debug.
> :
> :On Tue, Jun 20, 2017 at 12:36 PM, Jonathan Proulx <j...@csail.mit.edu> wrote:
> :
> :> Hi All,
> :>
> :> I have a very busy VM (well, one of my users does; I don't have
> :> access, but do have a cooperative and competent admin to interact
> :> with on the other end).
> :>
> :> At peak times it *sometimes* misses packets. I've been digging in
> :> for a bit and it looks like they get dropped in OVS land.
> :>
> :> The VM's main function in life is to pull down webpages from other
> :> sites and analyze them as requested. During peak times (EU/US
> :> working hours) it sometimes hangs some requests and sometimes fails.
> :>
> :> Looking at traffic, the outbound SYN from the VM is always good, and
> :> the returning ACK always gets to the physical interface of the
> :> hypervisor (on a provider VLAN).
> :>
> :> When packets get dropped they do not make it to the qvoXXXXXXXX-XX
> :> port on the integration bridge.
> :>
> :> My suspicion is that OVS isn't keeping up with the eth1-br flow
> :> rules remapping from external to internal VLAN IDs, but I'm not
> :> quite sure how to prove that or what to do about it.
> :>
> :> My initial thought had been to blame conntrack, but the drops are
> :> happening before the iptables rules, and while there are a lot of
> :> connections on this hypervisor:
> :>
> :> net.netfilter.nf_conntrack_count = 351880
> :>
> :> there should be plenty of headroom:
> :>
> :> net.netfilter.nf_conntrack_max = 1048576
> :>
> :> Anyone have thoughts on where to go with this?
> :>
> :> Version details:
> :> Ubuntu 14.04
> :> OpenStack Mitaka
> :> ovs-vsctl (Open vSwitch) 2.5.0
> :>
> :> Thanks,
> :> -Jon
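P.S. Given those rate-limited dpif debug messages, it may also be worth
checking whether the kernel datapath is losing upcalls, which would fit
the theory that OVS can't keep up with flow setup at peak. A sketch,
assuming the 2.5-era tools:

  ovs-dpctl show
  # look at the "lookups: hit:... missed:... lost:..." line -- "lost"
  # counts packets dropped because they could not be handed up to
  # ovs-vswitchd for flow setup; a non-zero, growing value here is
  # exactly the kind of silent drop you describe

  ovs-appctl upcall/show
  # handler threads and the current dynamic flow limits

  # and rather than flipping everything to debug, raise one module:
  ovs-appctl vlog/set dpif:file:dbg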
_______________________________________________
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators