Re: [openstack-dev] [oslo.config] Centralized config management
Excerpts from Nachi Ueno's message of 2014-01-10 13:42:30 -0700:
> Hi Flavio, Clint
>
> I agree with you guys. Sorry, maybe I wasn't clear. My opinion is to remove every
> configuration from the node; every configuration should be done by API from a central
> resource manager (nova-api or the neutron server, etc.).
>
> This is how you add new hosts in CloudStack, vCenter, and OpenStack:
>
> CloudStack: "Go to the web UI, add Host/ID/PW".
> http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.0.2/html/Installation_Guide/host-add.html
>
> vCenter: "Go to the vSphere client, add Host/ID/PW".
> https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.solutions.doc%2FGUID-A367585C-EB0E-4CEB-B147-817C1E5E8D1D.html
>
> OpenStack:
> - Manual
>   - set up the mysql connection config, rabbitmq/qpid connection config, keystone config, neutron config
>     http://docs.openstack.org/havana/install-guide/install/apt/content/nova-compute.html
>
> We also have deployment systems, including Chef / Puppet / Packstack / TripleO:
> - Chef/Puppet
>   - set up a chef node
>   - add the node / apply the role
> - Packstack
>   - generate an answer file
>     https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/2/html/Getting_Started_Guide/sect-Running_PackStack_Non-interactively.html
>   - packstack --install-hosts=192.168.1.0,192.168.1.1,192.168.1.2
> - TripleO
>   - UnderCloud: nova baremetal node add
>   - OverCloud: modify the heat template
>
> For residents of this mailing list, Chef/Puppet or a third-party tool is easy to use.
> However, I believe they are magical tools for many operators. Furthermore, these
> deployment systems tend to take time to support the newest release, so for most users
> an OpenStack release does not mean it is immediately usable for them.
>
> IMO, the current way we manage configuration is the cause of this issue. If we manage
> everything via API, we can manage the cluster from Horizon, and users can just
> "go to horizon, add host".
>
> It may take time to migrate config to the API, so one easy first step is to convert
> the existing config options into API resources. This is the purpose of this proposal.

Hi Nachi. What you've described is the vision for TripleO and Tuskar. We do not lag the release. We run CD and will be in the gate "real soon now" so that TripleO should be able to fully deploy Icehouse on Icehouse release day.

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
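To make the proposal concrete, registering a new compute host through a central API rather than by editing config files on the node might look roughly like the sketch below. The endpoint and payload are purely hypothetical illustrations of the idea, not an existing OpenStack API:

    import requests

    # Hypothetical central resource-manager endpoint -- nothing like this
    # exists in OpenStack today; it only illustrates "add host via API"
    # instead of hand-editing nova.conf/neutron.conf on each node.
    token = "valid-keystone-token"  # placeholder for a real auth token
    resp = requests.post(
        "http://controller:8774/v2/config/hosts",
        headers={"X-Auth-Token": token},
        json={"host": "compute-01", "user": "admin", "password": "secret"},
    )
    resp.raise_for_status()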
Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help
On 01/09/2014 04:16 PM, Russell Bryant wrote:
> On 01/08/2014 05:53 PM, Joe Gordon wrote:
>> Hi All,
>>
>> As you know the gate has been in particularly bad shape (gate queue over
>> 100!) this week due to a number of factors. One factor is how many major
>> outstanding bugs we have in the gate. Below is a list of the top 4 open
>> gate bugs.
>>
>> Here are some fun facts about this list:
>> * All bugs have been open for over a month
>> * All are nova bugs
>> * These 4 bugs alone were hit 588 times, which averages to 42 hits per
>> day (data is over two weeks)!
>>
>> If we want the gate queue to drop and not have to continuously run
>> 'recheck bug x' we need to fix these bugs. So I'm looking for
>> volunteers to help debug and fix these bugs.
>
> I created the following etherpad to help track the most important Nova
> gate bugs, who is actively working on them, and any patches that we have
> in flight to help address them:
>
> https://etherpad.openstack.org/p/nova-gate-issue-tracking
>
> Please jump in if you can. We shouldn't wait for the gate bug day to
> move on these. Even if others are already looking at a bug, feel free
> to do the same. We need multiple sets of eyes on each of these issues.

Some good progress from the last few days:

After looking at a lot of failures, we determined that the vast majority of failures are performance related. The load being put on the OpenStack deployment is just too high. We're working to address this to make the gate more reliable in a number of ways.

1) (merged) https://review.openstack.org/#/c/65760/

The large-ops test was cut back from spawning 100 instances to 50. From the commit message: it turns out the variance in cloud instances is very high, especially when comparing different cloud providers and regions. This test was originally added as a regression test for the nova-network issues with rootwrap, at which time the test wouldn't pass for 30 instances, so 50 is still a valid regression test.

2) (merged) https://review.openstack.org/#/c/45766/

nova-compute is able to do work in parallel very well. nova-conductor cannot by default, due to the details of our use of eventlet and how we talk to MySQL. The way you allow nova-conductor to do its work in parallel is by running multiple conductor workers. We had not enabled this by default in devstack, so our 4 vCPU test nodes were only using a single conductor worker. They now use 4 conductor workers.

3) (still testing) https://review.openstack.org/#/c/65805/

Right now when tempest runs in the devstack-gate jobs, it runs with concurrency=4 (run 4 tests at once). Unfortunately, it appears that this maxes out the deployment and results in timeouts (usually network related).

This patch changes tempest concurrency to 2 instead of 4. The initial results are quite promising. The tests have been passing reliably so far, but we're going to continue to recheck this for a while longer for more data.

One very interesting observation on this came from Jim, where he said "A quick glance suggests 1.2x -- 1.4x change in runtime." If the deployment were *not* being maxed out, we would expect this change to result in much closer to a 2x runtime increase.

4) (approved, not yet merged) https://review.openstack.org/#/c/65784/

nova-network seems to be the largest bottleneck in terms of performance problems when nova is maxed out on these test nodes. This patch is one quick speedup we can make by not using rootwrap in a few cases where it wasn't necessary. These really add up.

5) https://review.openstack.org/#/c/65989/

This patch isn't a candidate for merging, but was written to test the theory that by updating nova-network to use conductor instead of direct database access, nova-network will be able to do work in parallel better than it does today, just as we have observed with nova-compute.

Dan's initial test results from this are **very** promising. Initial testing showed a 20% speedup in runtime and a 33% decrease in CPU consumption by nova-network.

Doing this properly will not be quick, but I'm hopeful that we can complete it by the Icehouse release. We will need to convert nova-network to use Nova's object model. Much of this work is starting to catch nova-network up on work that we've been doing in the rest of the tree but have passed on doing for nova-network due to nova-network being in a freeze.

6) (no patch yet)

We haven't had time to dive too deep into this yet, but we would also like to revisit our locking usage and how it is affecting nova-network performance. There may be some more significant improvements we can make there.

Final notes: I am hopeful that by addressing these performance issues, both in Nova's code and by turning down the test load, we will see a significant increase in gate reliability in the near future. I apologize on behalf of the Nova team for Nova's contribution to gate instability. *Thank you* to everyone who has been helping.
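For reference, the conductor change in (2) comes down to a single config knob; as I recall from this era it is the workers option in nova.conf's conductor section (double-check against your release's sample config before relying on the exact name):

    [conductor]
    # Run one conductor worker per available vCPU so database-bound work
    # can be processed in parallel; devstack now sets this on the 4 vCPU
    # gate nodes instead of leaving it at a single worker.
    workers = 4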
Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help
First, thanks a ton for diving in on all this Russell. The big push by the Nova team recently is really helpful. On 01/11/2014 09:57 AM, Russell Bryant wrote: On 01/09/2014 04:16 PM, Russell Bryant wrote: On 01/08/2014 05:53 PM, Joe Gordon wrote: Hi All, As you know the gate has been in particularly bad shape (gate queue over 100!) this week due to a number of factors. One factor is how many major outstanding bugs we have in the gate. Below is a list of the top 4 open gate bugs. Here are some fun facts about this list: * All bugs have been open for over a month * All are nova bugs * These 4 bugs alone were hit 588 times which averages to 42 hits per day (data is over two weeks)! If we want the gate queue to drop and not have to continuously run 'recheck bug x' we need to fix these bugs. So I'm looking for volunteers to help debug and fix these bugs. I created the following etherpad to help track the most important Nova gate bugs. who is actively working on them, and any patches that we have in flight to help address them: https://etherpad.openstack.org/p/nova-gate-issue-tracking Please jump in if you can. We shouldn't wait for the gate bug day to move on these. Even if others are already looking at a bug, feel free to do the same. We need multiple sets of eyes on each of these issues. Some good progress from the last few days: After looking at a lot of failures, we determined that the vast majority of failures are performance related. The load being put on the OpenStack deployment is just too high. We're working to address this to make the gate more reliable in a number of ways. 1) (merged) https://review.openstack.org/#/c/65760/ The large-ops test was cut back from spawning 100 instances to 50. From the commit message: It turns out the variance in cloud instances is very high, especially when comparing different cloud providers and regions. This test was originally added as a regression test for the nova-network issues with rootwrap. At which time this test wouldn't pass for 30 instances. So 50 is still a valid regression test. 2) (merged) https://review.openstack.org/#/c/45766/ nova-compute is able to do work in parallel very well. nova-conductor can not by default due to the details of our use of eventlet + how we talk to MySQL. The way you allow nova-conductor to do its work in parallel is by running multiple conductor workers. We had not enabled this by default in devstack, so our 4 vCPU test nodes were only using a single conductor worker. They now use 4 conductor workers. 3) (still testing) https://review.openstack.org/#/c/65805/ Right now when tempest runs in the devstack-gate jobs, it runs with concurrency=4 (run 4 tests at once). Unfortunately, it appears that this maxes out the deployment and results in timeouts (usually network related). This patch changes tempest concurrency to 2 instead of 4. The initial results are quite promising. The tests have been passing reliably so far, but we're going to continue to recheck this for a while longer for more data. One very interesting observation on this came from Jim where he said "A quick glance suggests 1.2x -- 1.4x change in runtime." If the deployment were *not* being maxed out, we would expect this change to result in much closer to a 2x runtime increase. We could also address this by locally turning up timeouts on operations that are timing out. Which would let those things take the time they need. Before dropping the concurrency I'd really like to make sure we can point to specific fails that we think will go away. 
There was a lot of speculation around nova-network, however the nova-network timeout errors only pop up on elastic search on large-ops jobs, not normal tempest jobs. Definitely making OpenStack more idle will make more tests pass. The Neutron team has experienced that. It would be a ton better if we could actually feed back a 503 with a retry time (which I realize is a ton of work). Because if we decide we're now always pinned to only 2way, we have to start doing some major rethinking on our test strategy, as we'll be way outside the soft 45min time budget we've been trying to operate on. We'd actually been planning on going up to 8way, but were waiting for some issues to get fixed before we did that. It would sort of immediately put a moratorium on new tests. If that's what we need to do, that's what we need to do, but we should talk it through. 4) (approved, not yet merged) https://review.openstack.org/#/c/65784/ nova-network seems to be the largest bottleneck in terms of performance problems when nova is maxed out on these test nodes. This patch is one quick speedup we can make by not using rootwrap in a few cases where it wasn't necessary. These really add up. 5) https://review.openstack.org/#/c/65989/ This patch isn't a candidate for merging, but was written to test the theory that by updating nova-network to use conductor instead of direct
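Sean's 503-with-a-retry-time idea is plain HTTP; a minimal sketch of what an overloaded API service could send back (generic WSGI only to illustrate the mechanism, not actual Nova or Neutron code):

    def overloaded_app(environ, start_response):
        # Tell the client the service is temporarily overloaded and when it
        # is reasonable to retry, rather than letting the request time out.
        start_response('503 Service Unavailable',
                       [('Retry-After', '30'),
                        ('Content-Type', 'text/plain')])
        return [b'Service overloaded, please retry after 30 seconds\n']

Clients that honour the Retry-After header could then back off instead of piling more load onto a deployment that is already maxed out.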
[openstack-dev] [OpenStack-Dev][Cinder] Cinder driver maintainers/contact wiki
Hey Cinder Team!

One of the things that's getting increasingly difficult as we grow the number of drivers in the tree, and as I try to get the driver cert initiative kicked off, is rounding up an "expert" for each of the drivers in the tree. I've started a simple wiki page / matrix [1] that is designed to show the driver/vendor name and the contact info for folks that are designated managers of each of those drivers, as well as any additional engineering resources that might be available. If you're a Cinder team member, and especially if you're a vendor contributing to Cinder, have a look and help flesh out the chart.

This helps me with a number of things, including:
1. Tracking down help when I'm mucking around trying to fix bugs in other people's drivers
2. Who to contact when somebody on the team needs help understanding specifics about a driver
3. Who to assign work items to when dealing with a driver
4. Who to contact for driver cert submissions
5. A public place for folks that are implementing OpenStack to see what they're getting into (i.e. does somebody from company X even participate in / support this code any more)

Thanks,
John

[1]: https://wiki.openstack.org/wiki/Cinder/driver-maintainers

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help
+1 Very interesting to read about these bottlenecks and very grateful they are being addressed. Sent from my really tiny device... > On Jan 11, 2014, at 8:44 AM, "Sean Dague" wrote: > > First, thanks a ton for diving in on all this Russell. The big push by the > Nova team recently is really helpful. > >> On 01/11/2014 09:57 AM, Russell Bryant wrote: >>> On 01/09/2014 04:16 PM, Russell Bryant wrote: On 01/08/2014 05:53 PM, Joe Gordon wrote: Hi All, As you know the gate has been in particularly bad shape (gate queue over 100!) this week due to a number of factors. One factor is how many major outstanding bugs we have in the gate. Below is a list of the top 4 open gate bugs. Here are some fun facts about this list: * All bugs have been open for over a month * All are nova bugs * These 4 bugs alone were hit 588 times which averages to 42 hits per day (data is over two weeks)! If we want the gate queue to drop and not have to continuously run 'recheck bug x' we need to fix these bugs. So I'm looking for volunteers to help debug and fix these bugs. >>> >>> I created the following etherpad to help track the most important Nova >>> gate bugs. who is actively working on them, and any patches that we have >>> in flight to help address them: >>> >>> https://etherpad.openstack.org/p/nova-gate-issue-tracking >>> >>> Please jump in if you can. We shouldn't wait for the gate bug day to >>> move on these. Even if others are already looking at a bug, feel free >>> to do the same. We need multiple sets of eyes on each of these issues. >> >> Some good progress from the last few days: >> >> After looking at a lot of failures, we determined that the vast majority >> of failures are performance related. The load being put on the >> OpenStack deployment is just too high. We're working to address this to >> make the gate more reliable in a number of ways. >> >> 1) (merged) https://review.openstack.org/#/c/65760/ >> >> The large-ops test was cut back from spawning 100 instances to 50. From >> the commit message: >> >> It turns out the variance in cloud instances is very high, especially >> when comparing different cloud providers and regions. This test was >> originally added as a regression test for the nova-network issues with >> rootwrap. At which time this test wouldn't pass for 30 instances. So >> 50 is still a valid regression test. >> >> 2) (merged) https://review.openstack.org/#/c/45766/ >> >> nova-compute is able to do work in parallel very well. nova-conductor >> can not by default due to the details of our use of eventlet + how we >> talk to MySQL. The way you allow nova-conductor to do its work in >> parallel is by running multiple conductor workers. We had not enabled >> this by default in devstack, so our 4 vCPU test nodes were only using a >> single conductor worker. They now use 4 conductor workers. >> >> 3) (still testing) https://review.openstack.org/#/c/65805/ >> >> Right now when tempest runs in the devstack-gate jobs, it runs with >> concurrency=4 (run 4 tests at once). Unfortunately, it appears that >> this maxes out the deployment and results in timeouts (usually network >> related). >> >> This patch changes tempest concurrency to 2 instead of 4. The initial >> results are quite promising. The tests have been passing reliably so >> far, but we're going to continue to recheck this for a while longer for >> more data. >> >> One very interesting observation on this came from Jim where he said "A >> quick glance suggests 1.2x -- 1.4x change in runtime." 
If the >> deployment were *not* being maxed out, we would expect this change to >> result in much closer to a 2x runtime increase. > > We could also address this by locally turning up timeouts on operations that > are timing out. Which would let those things take the time they need. > > Before dropping the concurrency I'd really like to make sure we can point to > specific fails that we think will go away. There was a lot of speculation > around nova-network, however the nova-network timeout errors only pop up on > elastic search on large-ops jobs, not normal tempest jobs. Definitely making > OpenStack more idle will make more tests pass. The Neutron team has > experienced that. > > It would be a ton better if we could actually feed back a 503 with a retry > time (which I realize is a ton of work). > > Because if we decide we're now always pinned to only 2way, we have to start > doing some major rethinking on our test strategy, as we'll be way outside the > soft 45min time budget we've been trying to operate on. We'd actually been > planning on going up to 8way, but were waiting for some issues to get fixed > before we did that. It would sort of immediately put a moratorium on new > tests. If that's what we need to do, that's what we need to do, but we should > talk it through. > >> 4) (approved, not yet merg
[openstack-dev] [infra] javascript templating library choice for status pages
As someone that's done a decent amount of hacking on status.html/status.js, I think we're getting to a level of complexity on our JS status pages that we should probably stop doing this all inline (probably should have stopped a while ago). I'd like to propose that we pick some javascript templating framework, and start incrementally porting bits over there over time. My current thought is - http://handlebarsjs.com/ - mostly because it's only a template library, won't cause us to do a complete rewrite, and we can move it in in parts. Other opinions are welcome. But if we get an ACK on some approach, we can then start phasing it in, vs. the current state of the art which is way too much string append. -Sean -- Sean Dague Samsung Research America s...@dague.net / sean.da...@samsung.com http://dague.net ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
[openstack-dev] [QA] Changes to Tempest run_tests.sh
Hi everyone,

I just wanted to bring up some changes that recently merged to tempest. As part of the tempest unit tests blueprint I converted the run_tests.sh script to execute unit tests instead of running tempest itself. This makes the run_tests.sh script consistent with how the other projects run their unit tests. To run tempest I added a separate script, run_tempest.sh.

So moving forward, people who were running tempest using run_tests.sh should now use the run_tempest.sh script instead. It behaves the same way as run_tests.sh did before, so there shouldn't be any change there.

-Matt Treinish

___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Tuskar-UI navigation
The Resources(Nodes) item that is collapsible on the left hand side in that attached wireframes is a Panel Group in the Infrastructure Dashboard. The plan is to make Panel Groups expandable/collapsible with the UI improvements. There is nothing in Horizon's implementation that prevents the Panels under Resources(Nodes) to be in separate directories. Currently, each Panel in a Dashboard is in an separate directory in the Dashboard directory. As the potential number of panels in a Dashboard grows, I see no reason to not make a subdirectory for each panel group. David > -Original Message- > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com] > Sent: Saturday, January 11, 2014 12:50 AM > To: OpenStack Development Mailing List (not for usage questions) > Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation > > Hey all, > > I have a question regarding the development of the tuskar-ui navigation. > > So, to give some background: we are currently working off the wireframes > that Jaromir Coufal has developed: > > http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12-03_tripleo- > ui_02-resources.pdf > > In these wireframes, you can see a left-hand navigation for Resources (which > we have since renamed Nodes). This > left-hand navigation includes sub-navigation for Resources: Overview, > Resource Nodes, Unallocated, etc. > > It seems like the "Horizon way" to implement this would be to create a > 'nodes/' directory within our dashboard. > We would create a tabs.py with a Tab for Overview, Resource Nodes, > Unallocated, etc, and views.py would contain > a single TabbedTableView populated by our tabs. > > However, this prevents us from using left-handed navigation. As a result, > our nodes/ directory currently appears > as such: https://github.com/openstack/tuskar- > ui/tree/master/tuskar_ui/infrastructure/nodes > > 'overview', 'resource', and 'free' are subdirectories within nodes, and they > each define their own panel.py, > enabling them to appear in the left-handed navigation. > > This leads to the following questions: > > * Would our current workaround be acceptable? Or should we follow > Horizon precedent more closely? > * I understand that a more flexible navigation system is currently under > development > (https://blueprints.launchpad.net/horizon/+spec/navigation- > enhancement) - would it be preferred that > we follow Horizon precedent until that navigation system is ready, rather > than use our own workarounds? > > Thanks in advance for any opinions! > > > Tzu-Mainn Chen > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Tuskar-UI navigation
Thanks! Just wanted to check before we went deeper into our coding.

- Original Message -
> The Resources(Nodes) item that is collapsible on the left hand side in that
> attached wireframes is a Panel Group in the Infrastructure Dashboard. The
> plan is to make Panel Groups expandable/collapsible with the UI
> improvements. There is nothing in Horizon's implementation that prevents
> the Panels under Resources(Nodes) to be in separate directories. Currently,
> each Panel in a Dashboard is in a separate directory in the Dashboard
> directory. As the potential number of panels in a Dashboard grows, I see no
> reason to not make a subdirectory for each panel group.

Just to be clear, we're not talking about making a subdirectory per panel group; we're talking about making a subdirectory for each panel within that panel group. We've already tested that as a solution and it works, but I guess my question was more about what Horizon standards exist around this, if any.

Changing from the following...

    nodes/urls.py - contains IndexView, FreeNodesView, ResourceNodesView

...to...

    nodes/
    |
    + overview/urls.py - contains IndexView
    |
    + free/urls.py - contains FreeNodesView
    |
    + resource/urls.py - contains ResourcesNodesView

...purely for the sake of navigation - seems a bit - ugly? - to me, but if it's acceptable by Horizon standards, then we're fine with it as well :)

Mainn

> David
>
> > -Original Message-
> > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com]
> > Sent: Saturday, January 11, 2014 12:50 AM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation
> >
> > Hey all, I have a question regarding the development of the tuskar-ui navigation. So, to give some background: we are currently working off the wireframes that Jaromir Coufal has developed: http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12-03_tripleo-ui_02-resources.pdf In these wireframes, you can see a left-hand navigation for Resources (which we have since renamed Nodes). This left-hand navigation includes sub-navigation for Resources: Overview, Resource Nodes, Unallocated, etc. It seems like the "Horizon way" to implement this would be to create a 'nodes/' directory within our dashboard. We would create a tabs.py with a Tab for Overview, Resource Nodes, Unallocated, etc, and views.py would contain a single TabbedTableView populated by our tabs. However, this prevents us from using left-handed navigation. As a result, our nodes/ directory currently appears as such: https://github.com/openstack/tuskar-ui/tree/master/tuskar_ui/infrastructure/nodes 'overview', 'resource', and 'free' are subdirectories within nodes, and they each define their own panel.py, enabling them to appear in the left-handed navigation. This leads to the following questions: * Would our current workaround be acceptable? Or should we follow Horizon precedent more closely? * I understand that a more flexible navigation system is currently under development (https://blueprints.launchpad.net/horizon/+spec/navigation-enhancement) - would it be preferred that we follow Horizon precedent until that navigation system is ready, rather than use our own workarounds? Thanks in advance for any opinions!
> > > > > > Tzu-Mainn Chen > > > > ___ > > OpenStack-dev mailing list > > OpenStack-dev@lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > ___ > OpenStack-dev mailing list > OpenStack-dev@lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
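For readers following the thread, the "one directory per panel" layout being discussed maps onto Horizon's plugin machinery roughly as in the sketch below. This is a hedged illustration with assumed names (FreeNodes, the dashboard import path), not Tuskar-UI's actual code:

    # free/panel.py -- each panel lives in its own directory and registers
    # itself with the dashboard, which is what makes it appear as an entry
    # in the left-hand navigation.
    from django.utils.translation import ugettext_lazy as _

    import horizon

    from tuskar_ui.infrastructure import dashboard  # assumed import path

    class FreeNodes(horizon.Panel):
        name = _("Unallocated")
        slug = "free"

    dashboard.Infrastructure.register(FreeNodes)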
Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help
On 01/11/2014 11:38 AM, Sean Dague wrote: >> 3) (still testing) https://review.openstack.org/#/c/65805/ >> >> Right now when tempest runs in the devstack-gate jobs, it runs with >> concurrency=4 (run 4 tests at once). Unfortunately, it appears that >> this maxes out the deployment and results in timeouts (usually network >> related). >> >> This patch changes tempest concurrency to 2 instead of 4. The initial >> results are quite promising. The tests have been passing reliably so >> far, but we're going to continue to recheck this for a while longer for >> more data. >> >> One very interesting observation on this came from Jim where he said "A >> quick glance suggests 1.2x -- 1.4x change in runtime." If the >> deployment were *not* being maxed out, we would expect this change to >> result in much closer to a 2x runtime increase. > > We could also address this by locally turning up timeouts on operations > that are timing out. Which would let those things take the time they need. > > Before dropping the concurrency I'd really like to make sure we can > point to specific fails that we think will go away. There was a lot of > speculation around nova-network, however the nova-network timeout errors > only pop up on elastic search on large-ops jobs, not normal tempest > jobs. Definitely making OpenStack more idle will make more tests pass. > The Neutron team has experienced that. > > It would be a ton better if we could actually feed back a 503 with a > retry time (which I realize is a ton of work). > > Because if we decide we're now always pinned to only 2way, we have to > start doing some major rethinking on our test strategy, as we'll be way > outside the soft 45min time budget we've been trying to operate on. We'd > actually been planning on going up to 8way, but were waiting for some > issues to get fixed before we did that. It would sort of immediately put > a moratorium on new tests. If that's what we need to do, that's what we > need to do, but we should talk it through. I can try to write up some detailed analysis on a few failures next week to help justify it, but FWIW, when I was looking this last week, I felt like making this change was going to fix a lot more than the nova-network timeout errors. If we can already tell this is going to improve reliability, both when using nova-network and neutron, then I think that should be enough to justify it. Taking longer seems acceptable if that comes with a more acceptable pass rate. Right now I'd like to see us set concurrency=2 while we work on the more difficult performance improvements to both neutron and nova-network, and we can turn it back up later on once we're able to demonstrate that it passes reliably without failures with a root cause of test load being too high. >> 5) https://review.openstack.org/#/c/65989/ >> >> This patch isn't a candidate for merging, but was written to test the >> theory that by updating nova-network to use conductor instead of direct >> database access, nova-network will be able to do work in parallel better >> than it does today, just as we have observed with nova-compute. >> >> Dan's initial test results from this are **very** promising. Initial >> testing showed a 20% speedup in runtime and a 33% decrease in CPU >> consumption by nova-network. >> >> Doing this properly will not be quick, but I'm hopeful that we can >> complete it by the Icehouse release. We will need to convert >> nova-network to use Nova's object model. 
Much of this work is starting >> to catch nova-network up on work that we've been doing in the rest of >> the tree but have passed on doing for nova-network due to nova-network >> being in a freeze. > > I'm a huge +1 on fixing this in nova-network. Of course. This is just a bit of a longer term effort. -- Russell Bryant ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Tuskar-UI navigation
> -Original Message- > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com] > Sent: Saturday, January 11, 2014 2:23 PM > To: OpenStack Development Mailing List (not for usage questions) > Subject: Re: [openstack-dev] Tuskar-UI navigation > > Thanks! Just wanted to check before we went deeper into our coding. > > - Original Message - > > The Resources(Nodes) item that is collapsible on the left hand side in that > > attached wireframes is a Panel Group in the Infrastructure Dashboard. The > > plan is to make Panel Groups expandable/collapsible with the UI > > improvements. There is nothing in Horizon's implementation that prevents > > the Panels under Resources(Nodes) to be in separate directories. > Currently, > > each Panel in a Dashboard is in an separate directory in the Dashboard > > directory. As the potential number of panels in a Dashboard grows, I see > no > > reason to not make a subdirectory for each panel group. > > Just to be clear, we're not talking about making a subdirectory per panel > group; > we're talking about making a subdirectory for each panel within that panel > group. > We've already tested that as a solution and it works, but I guess my question > was > more about what Horizon standards exist around this, if any. > > Changing from the following. . . > > nodes/urls.py - contains IndexView, FreeNodesView, ResourceNodesView > > . . . to. . . > > nodes/ > | > + overview/urls.py - contains IndexView > | > + free/urls.py - contains FreeNodesView > | > + resource/urls.py - contains ResourcesNodesView This is what I envisioned. I think it's actually cleaner and easier to navigate. > > . . . purely for the sake of navigation - seems a bit - ugly? - to me, but if > it's > acceptable by Horizon standards, then we're fine with it as well :) > > > Mainn > > > David > > > > > -Original Message- > > > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com] > > > Sent: Saturday, January 11, 2014 12:50 AM > > > To: OpenStack Development Mailing List (not for usage questions) > > > Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation > > > > > > Hey all, > > > > > > I have a question regarding the development of the tuskar-ui navigation. > > > > > > So, to give some background: we are currently working off the > wireframes > > > that Jaromir Coufal has developed: > > > > > > http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12- > 03_tripleo- > > > ui_02-resources.pdf > > > > > > In these wireframes, you can see a left-hand navigation for Resources > > > (which > > > we have since renamed Nodes). This > > > left-hand navigation includes sub-navigation for Resources: Overview, > > > Resource Nodes, Unallocated, etc. > > > > > > It seems like the "Horizon way" to implement this would be to create a > > > 'nodes/' directory within our dashboard. > > > We would create a tabs.py with a Tab for Overview, Resource Nodes, > > > Unallocated, etc, and views.py would contain > > > a single TabbedTableView populated by our tabs. > > > > > > However, this prevents us from using left-handed navigation. As a result, > > > our nodes/ directory currently appears > > > as such: https://github.com/openstack/tuskar- > > > ui/tree/master/tuskar_ui/infrastructure/nodes > > > > > > 'overview', 'resource', and 'free' are subdirectories within nodes, and > > > they > > > each define their own panel.py, > > > enabling them to appear in the left-handed navigation. > > > > > > This leads to the following questions: > > > > > > * Would our current workaround be acceptable? 
Or should we follow > > > Horizon precedent more closely? > > > * I understand that a more flexible navigation system is currently under > > > development > > > (https://blueprints.launchpad.net/horizon/+spec/navigation- > > > enhancement) - would it be preferred that > > > we follow Horizon precedent until that navigation system is ready, > rather > > > than use our own workarounds? > > > > > > Thanks in advance for any opinions! > > > > > > > > > Tzu-Mainn Chen > > > > > > ___ > > > OpenStack-dev mailing list > > > OpenStack-dev@lists.openstack.org > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > > > ___ David ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster
On Wed, Jan 8, 2014 at 10:48 PM, Matt Riedemann wrote: > Another question. This patch [1] failed turbo-hipster after it was approved > but I don't know if that's a gating or just voting job, i.e. should someone > do 'reverify migrations' on that patch or just let it sit and ignore > turbo-hipster? > > [1] https://review.openstack.org/#/c/59824/ Sorry for the slow reply, I'm at a conference this week and have been flat out. turbo-hipster is a check only, and doesn't run in gate. So, it will never respond to a "reverify" comment. Cheers, Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster
On Wed, Jan 8, 2014 at 10:57 PM, Sean Dague wrote: [snip] > So instead of trying to fix the individual runs, because t-h runs pretty > fast, can you just fix it with bulk. It seems like the issue in a migration > taking a long time isn't a race in OpenStack, it's completely variability in > the underlying system. > > And it seems that the failing case is going to be 100% repeatable, and > infrequent. > > So it seems like you could solve the fail side by only reporting fail > results on 3 fails in a row: RESULT && RESULT && RESULT > > Especially valid if Results are coming from different AZs, so any local > issues should be masked. Whilst this is true, I worry about codifying flakiness in tests (as shown by the gate experience). Instead I'm working on the root causes of the flakiness. I've done some work this week on first order metrics for migration expense (IO ops per migration) instead of second order metrics (wall time), so I am hoping this will help once deployed. Michael -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
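Sean's "only vote after several consecutive failures" suggestion is easy to state precisely; a rough sketch of that reporting rule (hypothetical helper, not turbo-hipster's actual code):

    def should_report_failure(results, threshold=3):
        """Report a -1 only if the last `threshold` runs all failed, so a
        single slow cloud instance can't produce a bogus negative vote."""
        recent = results[-threshold:]
        return len(recent) == threshold and all(r == "FAIL" for r in recent)

    # should_report_failure(["PASS", "FAIL", "FAIL"]) -> False
    # should_report_failure(["FAIL", "FAIL", "FAIL"]) -> True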
Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster
Please note that turbo-hipster currently has -1 voting disabled while we work through these issues. +1 voting is still enabled though. Michael On Sun, Jan 12, 2014 at 3:47 PM, Michael Still wrote: > On Wed, Jan 8, 2014 at 10:57 PM, Sean Dague wrote: > > [snip] > >> So instead of trying to fix the individual runs, because t-h runs pretty >> fast, can you just fix it with bulk. It seems like the issue in a migration >> taking a long time isn't a race in OpenStack, it's completely variability in >> the underlying system. >> >> And it seems that the failing case is going to be 100% repeatable, and >> infrequent. >> >> So it seems like you could solve the fail side by only reporting fail >> results on 3 fails in a row: RESULT && RESULT && RESULT >> >> Especially valid if Results are coming from different AZs, so any local >> issues should be masked. > > Whilst this is true, I worry about codifying flakiness in tests (as > shown by the gate experience). Instead I'm working on the root causes > of the flakiness. > > I've done some work this week on first order metrics for migration > expense (IO ops per migration) instead of second order metrics (wall > time), so I am hoping this will help once deployed. > > Michael > > -- > Rackspace Australia -- Rackspace Australia ___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev