I think you're right, Darragh. It must have been Montreal's snow and cold freezing my brain: I investigated the same issue a while ago and tried to change CirrOS to send a DHCPDISCOVER every 10 seconds instead of every 60, but then I moved on to something else, as I wasn't even sure a new CirrOS base image could have been brought into the gate tests.
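For context, the interval in question is a udhcpc retry knob. A non-authoritative sketch of the sort of change involved, using BusyBox udhcpc's documented flags (the values here are illustrative, and in the stock CirrOS image the invocation is compiled in rather than configurable):

```shell
# Illustrative only: BusyBox udhcpc retry knobs. In the stock CirrOS
# image this command line is baked in and not configurable.
#   -t N  send up to N DHCPDISCOVER packets (default 3)
#   -T S  wait S seconds between packets (the 60s interval discussed here)
#   -A S  wait S seconds before restarting the whole discover sequence
udhcpc -i eth0 -t 5 -T 10 -A 10
```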
I think I also sent a related email to the mailing list, suggesting to increase the timeouts to a value that would ensure at least a second DHCPDISCOVER is sent by the VM.

Anyway, we have a few patches which should make this failure mode less frequent. They're all -2 currently, as they're always failing the gate (and I don't know why). However, from another email Sean recently sent, it seems it's a general Neutron issue.

Salvatore

On 20 January 2014 10:51, Darragh O'Reilly <dara2002-openst...@yahoo.com> wrote:
>
> On Monday, 20 January 2014, 15:33, Jay Pipes <jaypi...@gmail.com> wrote:
>
> > Sorry for top-posting -- using web mail client.
>
> no worries - it doesn't bother me.
>
> > Is it possible to change the retry interval in Cirros (or cloud-init?) so
> > that the backoff is less than 60 seconds?
>
> I think the udhcpc command-line parameters are baked into the image. It's
> part of BusyBox, and I'm not even sure if it's configurable from a
> script/text file.
>
> > Best,
> > -jay
> >
> > On Mon, Jan 20, 2014 at 10:23 AM, Darragh O'Reilly <
> > dara2002-openst...@yahoo.com> wrote:
> >
> >> I did a test to see what the DHCP client on CirrOS does. I killed the
> >> DHCP agent and started an instance. The instance sent the first DHCP
> >> discover after about 35 sec, then another 60 sec later, and a final one
> >> after another 60 sec.
> >>
> >> So a revised theory for what happened is this:
> >>
> >> t=0   tempest starts the VM and starts polling for ACTIVE status
> >> t=20  instance --> ACTIVE, and tempest starts polling the floating IP for 60 sec
> >> t=40  instance does a DHCP discover - no response - so sets a timer for 60 sec
> >> t=45  ovs-agent sets the port VLAN
> >> t=80  tempest gives up and kills the VM
> >> t=100 instance would have sent another DHCP discover now if it had been let live
> >>
> >> I think it would be worth trying to change that test to poll for 120
> >> seconds instead of 60.
> >>
> >> On Monday, 20 January 2014, 11:23, Darragh O'Reilly <
> >> dara2002-openst...@yahoo.com> wrote:
> >>
> >> Hi Salvatore,
> >>>
> >>> I presume it's this one?
> >>> http://logs.openstack.org/38/65838/4/check/check-tempest-dsvm-neutron-isolated/d108e4a/logs/tempest.txt.gz?#_2014-01-19_20_50_14_604
> >>>
> >>> Is it true that the CirrOS image just fires off a few DHCP discovers
> >>> and then gives up? If so, then maybe it did so before the tagging
> >>> happened. Do we have the instance console log? It took about 45 seconds
> >>> from when the port was created to when it was tagged.
> >>>
> >>> 2014-01-19 20:48:57.412 8142 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output
> >>> received from ovsdb monitor:
> >>> {"data":[["3602a7b2-b559-4709-9bf0-53ae2af68d06","insert","tap496b808c-b5"]],"headings":["row","action","name"]}
> >>> <snip>
> >>> 2014-01-19 20:49:41.925 8142 DEBUG neutron.agent.linux.utils [-]
> >>> Command: ['sudo', '/usr/local/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf',
> >>> 'ovs-vsctl', '--timeout=10', 'set', 'Port', 'tap496b808c-b5', 'tag=64']
> >>> Exit code: 0
> >>>
> >>> Darragh.
> >>>
> >>>> I have been seeing in the past 2 days timeout failures on gate jobs
> >>>> which I am struggling to explain. An example is available in [1].
> >>>> These are the usual failures that we associate with bug 1253896, but
> >>>> this time I can verify that:
> >>>> - The floating IP is correctly wired (IP and NAT rules)
> >>>> - The DHCP port is correctly wired, as well as the VM port and the router port
> >>>> - The DHCP agent is correctly started for the network
> >>>>
> >>>> However, no DHCP DISCOVER request is sent. Only the DHCP RELEASE
> >>>> message is seen.
> >>>> Any help at interpreting the logs will be appreciated.
> >>>>
> >>>> Salvatore
> >>>>
> >>>> [1] http://logs.openstack.org/38/65838
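The race in Darragh's timeline above can be sketched in a few lines (Python, not part of any patch; all timings are the assumed values from the thread, and `poll_succeeds` is a hypothetical helper, not tempest code):

```python
# Sketch of the timeline from the thread: can tempest's floating-IP
# polling window overlap a DHCPDISCOVER sent after the ovs-agent has
# tagged the port? All values in seconds, taken from the thread.
DISCOVER_TIMES = [40, 100, 160]   # first discover ~35-40s, then every 60s
PORT_TAGGED_AT = 45               # ovs-agent sets the port VLAN
ACTIVE_AT = 20                    # instance goes ACTIVE; polling starts

def poll_succeeds(poll_window):
    """True if some discover is sent after tagging but before tempest
    gives up (a discover before tagging gets no reply)."""
    deadline = ACTIVE_AT + poll_window
    return any(PORT_TAGGED_AT <= t <= deadline for t in DISCOVER_TIMES)

print(poll_succeeds(60))   # False: tempest gives up at t=80, before t=100
print(poll_succeeds(120))  # True: the t=100 discover lands in the window
```

This is just the argument for polling 120 seconds instead of 60: the first usable discover after tagging arrives at t=100, past the 60-second deadline but well inside a 120-second one.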
_______________________________________________ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev