Sorry for top-posting -- using web mail client. Is it possible to change the retry interval in Cirros (or cloud-init?) so that the backoff is less than 60 seconds?
Best,
-jay

On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly <dara2002-openst...@yahoo.com> wrote:

>
> I did a test to see what the dhcp client on cirros does. I killed the dhcp
> agent and started an instance. The instance sent the first dhcp discover
> after about 35 sec. Then another 60 sec later, and a final one after
> another 60 sec.
>
> So a revised theory for what happened is this:
>
> t=0   tempest starts vm and starts polling for ACTIVE status
> t=20  instance-->ACTIVE and tempest starts polling the floating ip for 60 sec
> t=40  instance does a dhcp discover - no response - so sets a timer for 60 sec
> t=45  ovs-agent sets the port vlan
> t=80  tempest gives up and kills vm
> t=100 instance would have sent another dhcp discover now if it had been let live
>
> I think it would be worth trying to change that test to poll for 120
> seconds instead of 60.
>
>
> On Monday, 20 January 2014, 11:23, Darragh O'Reilly <dara2002-openst...@yahoo.com> wrote:
>
> Hi Salvatore,
>
> I presume it's this one?
>
> http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
>
> Is it true that the cirros image just fires off a few dhcp discovers and
> then gives up? If so, then maybe it did so before the tagging happened. Do
> we have the instance console log? It took about 45 seconds from when the
> port was created to when it was tagged.
>
> 2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-]
> Output received from ovsdb monitor:
> {"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
> <snip>
> 2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-]
> Command: ['sudo', '/usr/local/bin/neutron-rootwrap',
> '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', 'set', 'Port',
> 'tap496b808c-b5', 'tag=64']
> Exit code: 0
>
> Darragh.
>
> >I have been seeing in the past 2 days timeout failures on gate jobs which I
> >am struggling to explain. An example is available in [1]
> >These are the usual failures that we associate with bug 1253896, but this
> >time I can verify that:
> >- The floating IP is correctly wired (IP and NAT rules)
> >- The DHCP port is correctly wired, as well as the VM port and the router
> >  port
> >- The DHCP agent is correctly started for the network
> >
> >However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE message is
> >seen.
> >Any help at interpreting the logs will be appreciated.
> >
> >
> >Salvatore
> >
> >[1] http://logs.openstack.org/38/65838
> >
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
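The arithmetic behind the revised theory quoted above can be checked with a small sketch. This is only an illustration, not anything from tempest, Cirros, or Neutron: the function names are hypothetical, and it assumes that a discover sent once the port is tagged is answered immediately and that the floating IP becomes reachable at that same moment. The numbers (ACTIVE at t=20, first discover at t=40, 60 sec retry timer, port tagged at t=45) come from the timeline in the thread.

```python
# Sketch of the timeline from the thread. All function names are hypothetical.

def discover_times(first, interval, count=3):
    """Times (in seconds) at which the guest sends DHCP discovers."""
    return [first + i * interval for i in range(count)]


def floating_ip_reachable_by(discovers, port_tagged_at):
    """Return the first discover sent at or after the port was tagged, or None.

    Assumption: a discover sent once the port is tagged is answered at once,
    and the floating IP becomes reachable at that same moment.
    """
    for t in discovers:
        if t >= port_tagged_at:
            return t
    return None


def tempest_sees_vm(active_at, poll_window, discovers, port_tagged_at):
    """True if the guest gets its address before tempest stops polling."""
    reachable_at = floating_ip_reachable_by(discovers, port_tagged_at)
    return reachable_at is not None and reachable_at <= active_at + poll_window


# Figures from the revised theory: ACTIVE at t=20, first discover at t=40,
# 60 sec retry timer, port tagged at t=45.
observed = discover_times(first=40, interval=60)

print(tempest_sees_vm(20, 60, observed, 45))    # current 60 sec poll
print(tempest_sees_vm(20, 120, observed, 45))   # proposed 120 sec poll
# Hypothetical 30 sec retry interval, per the question at the top of the thread.
print(tempest_sees_vm(20, 60, discover_times(40, 30), 45))
```

Under these assumptions the 60 sec poll ends at t=80, before the second discover at t=100, while either a 120 sec poll or a shorter retry timer would let the instance pick up its address in time.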