Hey Matt,

There is a connection pool in https://github.com/boto/boto/blob/develop/boto/connection.py which could be causing issues...
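For anyone not familiar with the pattern, here's a rough sketch (illustrative only, not boto's actual classes or names) of the kind of keyed HTTP connection pool that connection.py implements, and why a stale pooled connection can surface as intermittent network errors rather than a clean failure:

    # Illustrative sketch only -- not boto's real code. It mimics the general
    # idea in boto/connection.py: HTTP connections are cached per (host, port)
    # and reused, so a connection the server has already closed (or one shared
    # across threads) can show up as sporadic network failures.
    import time
    import http.client


    class SimpleConnectionPool:
        """Reuse HTTP connections per (host, port), dropping stale ones."""

        STALE_AFTER = 60.0  # seconds; an assumed value for illustration

        def __init__(self):
            # (host, port) -> list of (connection, time_returned)
            self._pool = {}

        def get(self, host, port=80):
            entries = self._pool.get((host, port), [])
            while entries:
                conn, returned_at = entries.pop()
                if time.time() - returned_at < self.STALE_AFTER:
                    return conn      # reuse a recently returned connection
                conn.close()         # too old -- the server may have dropped it
            return http.client.HTTPConnection(host, port)

        def put(self, host, conn, port=80):
            # Return the connection for later reuse.
            self._pool.setdefault((host, port), []).append((conn, time.time()))


    # Usage: repeated API calls reuse the same underlying socket.
    pool = SimpleConnectionPool()
    conn = pool.get("example.com")
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    pool.put("example.com", conn)

If something like the ec2 tests is hammering the API concurrently, reuse of connections along these lines is one place I'd look for the intermittent failures.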
-- dims

On Thu, Jun 12, 2014 at 10:50 AM, Matt Riedemann <mrie...@linux.vnet.ibm.com> wrote:
>
> On 6/10/2014 5:36 AM, Michael Still wrote:
>>
>> https://review.openstack.org/99002 adds more logging to
>> nova/network/manager.py, but I think you're not going to love the
>> debug log level. Was this the sort of thing you were looking for
>> though?
>>
>> Michael
>>
>> On Mon, Jun 9, 2014 at 11:45 PM, Sean Dague <s...@dague.net> wrote:
>>>
>>> Based on some back-of-the-envelope math, the gate is basically processing 2
>>> changes an hour and failing one of them. So if you want to know how long
>>> the gate is, take the length / 2 in hours.
>>>
>>> Right now we're doing a lot of revert roulette, trying to revert things
>>> that we think landed about the time things went bad. I call this
>>> roulette because in many cases the actual issue isn't well understood. A
>>> key reason for this is:
>>>
>>> *nova-network is a black hole*
>>>
>>> There is no work-unit logging in nova-network, and no attempted
>>> verification that the commands it ran did anything. Most of the
>>> failures that we don't have a good understanding of are the network not
>>> working under nova-network.
>>>
>>> So we could *really* use a volunteer or two to prioritize getting that
>>> into nova-network. Without it we might manage to turn down the failure
>>> rate by reverting things (or we might not), but we won't really know why,
>>> and we'll likely be here again soon.
>>>
>>> -Sean
>>>
>>> --
>>> Sean Dague
>>> http://dague.net
>>
>
> I mentioned this in the nova meeting today as well, but the associated bug for
> the nova-network ssh timeout issue is bug 1298472 [1].
>
> My latest theory on that one is that there could be a race/network leak in the
> ec2 third-party tests in Tempest, or something in the ec2 API in nova,
> because I saw this [2] showing up in the n-net logs. My thinking is that the
> tests or the API are not tearing down cleanly, and eventually network
> resources are leaked and we start hitting those timeouts. It's just a theory at
> this point, but the ec2 third-party tests do run concurrently with the
> scenario tests, so things could be colliding there. I haven't had
> time to dig into it, plus I have very little experience with those tests or
> the ec2 API in nova.
>
> [1] https://bugs.launchpad.net/tempest/+bug/1298472
> [2] http://goo.gl/6f1dfw
>
> --
>
> Thanks,
>
> Matt Riedemann

--
Davanum Srinivas :: http://davanum.wordpress.com

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev