Hey all,

I think I found a serious bug in our usage of eventlet thread-local
storage. Please check out this snippet [1].
This is how we use eventlet TLS in Nova and common Oslo code [2]. This
could explain how [3] actually breaks the TripleO devtest story and our
gates.

Am I right? Or am I missing something and should get some sleep? :)

Thanks,
Roman

[1] http://paste.openstack.org/show/53686/
[2] https://github.com/openstack/nova/blob/master/nova/openstack/common/local.py#L48
[3] https://github.com/openstack/nova/commit/85332012dede96fa6729026c2a90594ea0502ac5

On Wed, Nov 20, 2013 at 5:55 PM, Derek Higgins <[email protected]> wrote:
> On 20/11/13 14:21, Anita Kuno wrote:
>> Thanks for posting this, Joe. It really helps to create focus so we can
>> address these bugs.
>>
>> We are chatting in #openstack-neutron about 1251784, 1249065, and 1251448.
>>
>> We are looking for someone to work on 1251784 - I had mentioned it at
>> Monday's Neutron team meeting and am trying to shop it around in
>> -neutron now. We need someone other than Salvatore, Aaron or Maru to
>> work on this since they each have at least one very important bug they
>> are working on. Please join us in #openstack-neutron and lend a hand -
>> all of OpenStack needs your help.
>
> I've been hitting this in tripleo intermittently for the last few days
> (or it at least looks to be the same bug). This morning, while trying to
> debug the problem, I noticed HTTP requests/responses happening out of
> order. I've added details to the bug.
>
> https://bugs.launchpad.net/tripleo/+bug/1251784
>
>>
>> Bug 1249065 is assigned to Aaron Rosen, who isn't in the channel at the
>> moment, so I don't have an update on his progress or any blockers he is
>> facing. Hopefully (if you are reading this Aaron) he will join us in
>> channel soon and I will hear from him about his status.
>>
>> Bug 1251448 is assigned to Maru Newby, who I am talking with now in
>> -neutron. He is addressing the bug. I will share what information I have
>> regarding this one when I have some.
>>
>> We are all looking forward to a more stable gate and this information
>> really helps.
>>
>> Thanks again, Joe,
>> Anita.
>>
>> On 11/20/2013 01:09 AM, Joe Gordon wrote:
>>> Hi All,
>>>
>>> As many of you have noticed, the gate has been in very bad shape over the
>>> past few days. Here is a list of some of the top open bugs (without
>>> pending patches, and many recent hits) that we are hitting. Gate won't be
>>> stable, and it will be hard to get your code merged, until we fix these
>>> bugs.
>>>
>>> 1) https://bugs.launchpad.net/bugs/1251920
>>>    nova
>>>    468 Hits
>>> 2) https://bugs.launchpad.net/bugs/1251784
>>>    neutron, Nova
>>>    328 Hits
>>> 3) https://bugs.launchpad.net/bugs/1249065
>>>    neutron
>>>    122 Hits
>>> 4) https://bugs.launchpad.net/bugs/1251448
>>>    neutron
>>>    65 Hits
>>>
>>> Raw Data:
>>>
>>> Note: If a bug has any hits for anything besides failure, it means the
>>> fingerprint isn't perfect.
>>>
>>> Elastic recheck known issues
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1251920 => message:"assertionerror: console output was empty" AND filename:"console.html"
>>> Title: Tempest failures due to failure to return console logs from an instance
>>> Project: Status
>>>   nova: Confirmed
>>> Hits
>>>   FAILURE: 468
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1251784 => message:"Connection to neutron failed: Maximum attempts reached" AND filename:"logs/screen-n-cpu.txt"
>>> Title: nova+neutron scheduling error: Connection to neutron failed: Maximum attempts reached
>>> Project: Status
>>>   neutron: New
>>>   nova: New
>>> Hits
>>>   FAILURE: 328
>>>   UNSTABLE: 13
>>>   SUCCESS: 275
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1240256 => message:" 503" AND filename:"logs/syslog.txt" AND syslog_program:"proxy-server"
>>> Title: swift proxy-server returning 503 during tempest run
>>> Project: Status
>>>   openstack-ci: Incomplete
>>>   swift: New
>>>   tempest: New
>>> Hits
>>>   FAILURE: 136
>>>   SUCCESS: 83
>>>
>>> Pending Patch
>>> Bug: https://bugs.launchpad.net/bugs/1249065 => message:"No nw_info cache associated with instance" AND filename:"logs/screen-n-api.txt"
>>> Title: Tempest failure: tempest/scenario/test_snapshot_pattern.py
>>> Project: Status
>>>   neutron: New
>>>   nova: Confirmed
>>> Hits
>>>   FAILURE: 122
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1252514 => message:"Got error from Swift: put_object" AND filename:"logs/screen-g-api.txt"
>>> Title: glance doesn't recover if Swift returns an error
>>> Project: Status
>>>   devstack: New
>>>   glance: New
>>>   swift: New
>>> Hits
>>>   FAILURE: 95
>>>
>>> Pending Patch
>>> Bug: https://bugs.launchpad.net/bugs/1244255 => message:"NovaException: Unexpected vif_type=binding_failed" AND filename:"logs/screen-n-cpu.txt"
>>> Title: binding_failed because of l2 agent assumed down
>>> Project: Status
>>>   neutron: Fix Committed
>>> Hits
>>>   FAILURE: 92
>>>   SUCCESS: 29
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1251448 => message:" possible networks found, use a Network ID to be more specific. (HTTP 400)" AND filename:"console.html"
>>> Title: BadRequest: Multiple possible networks found, use a Network ID to be more specific.
>>> Project: Status
>>>   neutron: New
>>> Hits
>>>   FAILURE: 65
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1239856 => message:"tempest/services" AND message:"/images_client.py" AND message:"wait_for_image_status" AND filename:"console.html"
>>> Title: "TimeoutException: Request timed out" on tempest.api.compute.images.test_list_image_filters.ListImageFiltersTestXML
>>> Project: Status
>>>   glance: New
>>> Hits
>>>   FAILURE: 62
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1235435 => message:"One or more ports have an IP allocation from this subnet" AND message:" SubnetInUse: Unable to complete operation on subnet" AND filename:"logs/screen-q-svc.txt"
>>> Title: 'SubnetInUse: Unable to complete operation on subnet UUID. One or more ports have an IP allocation from this subnet.'
>>> Project: Status
>>>   neutron: Incomplete
>>>   nova: Fix Committed
>>>   tempest: New
>>> Hits
>>>   FAILURE: 48
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1224001 => message:"tempest.scenario.test_network_basic_ops AssertionError: Timed out waiting for" AND filename:"console.html"
>>> Title: test_network_basic_ops fails waiting for network to become available
>>> Project: Status
>>>   neutron: In Progress
>>>   swift: Invalid
>>>   tempest: Invalid
>>> Hits
>>>   FAILURE: 42
>>>
>>> Bug: https://bugs.launchpad.net/bugs/1218391 => message:"Cannot 'createImage'" AND filename:"console.html"
>>> Title: tempest.api.compute.images.test_images_oneserver.ImagesOneServerTestXML.test_delete_image_that_is_not_yet_active spurious failure
>>> Project: Status
>>>   nova: Confirmed
>>>   swift: Confirmed
>>>   tempest: Confirmed
>>> Hits
>>>   FAILURE: 25
>>>
>>> best,
>>> Joe Gordon
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> [email protected]
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
