Re: [openstack-dev] [oslo.config] Centralized config management

2014-01-11 Thread Clint Byrum
Excerpts from Nachi Ueno's message of 2014-01-10 13:42:30 -0700:
> Hi Flavio, Clint
> 
> I agree with you guys.
> Sorry, maybe I wasn't clear. My opinion is that we should remove all
> configuration from the node, and that all configuration should be done
> via API from a central resource manager (nova-api or the neutron
> server, etc.).
> 
> This is how you add a new host in CloudStack, vCenter, and OpenStack:
> 
> CloudStack: "Go to the web UI, add Host/ID/PW".
> http://cloudstack.apache.org/docs/en-US/Apache_CloudStack/4.0.2/html/Installation_Guide/host-add.html
> 
> vCenter: "Go to the vSphere client, add Host/ID/PW".
> https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vsphere.solutions.doc%2FGUID-A367585C-EB0E-4CEB-B147-817C1E5E8D1D.html
> 
> OpenStack:
> - Manual
>- set up the MySQL connection config, RabbitMQ/Qpid connection config,
> keystone config, neutron config, 
> http://docs.openstack.org/havana/install-guide/install/apt/content/nova-compute.html
> 
> We have several deployment systems, including Chef/Puppet, Packstack, and TripleO:
> - Chef/Puppet
>Set up a Chef node
>Add the node / apply a role
> - Packstack
>-  Generate answer file
>   
> https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux_OpenStack_Platform/2/html/Getting_Started_Guide/sect-Running_PackStack_Non-interactively.html
>-  packstack --install-hosts=192.168.1.0,192.168.1.1,192.168.1.2
> - TripleO
>- UnderCloud
>nova baremetal node add
>- OverCloud
>modify heat template
> 
> For the residents of this mailing list, Chef/Puppet or a third-party
> tool is easy to use. However, I believe these tools look like magic to
> many operators. Furthermore, these deployment systems tend to take time
> to support the newest release, so for most users an OpenStack release
> doesn't mean they can actually use it yet.
> 
> IMO, the current way of managing configuration is the cause of this
> issue. If we manage everything via API, we can manage the cluster from
> Horizon. Then the user can just "go to Horizon, add the host".
> 
> It may take time to migrate the configuration to an API, so one easy
> first step is to convert the existing config options into API
> resources. That is the purpose of this proposal.
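> 
> For illustration only -- no such API exists today, this is just the kind
> of call I have in mind (hypothetical endpoint and option names):
> 
>   import json
>   import requests
> 
>   # Hypothetical sketch: register a compute host and push its config
>   # through a central API instead of editing nova.conf on the node.
>   resp = requests.post(
>       "http://controller:8774/v2/os-host-config/compute-01",
>       headers={"X-Auth-Token": "ADMIN_TOKEN",
>                "Content-Type": "application/json"},
>       data=json.dumps({"config": {
>           "DEFAULT/rabbit_host": "controller",
>           "database/connection": "mysql://nova:secret@controller/nova",
>       }}))
>   resp.raise_for_status()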
> 

Hi Nachi. What you've described is the vision for TripleO and Tuskar. We
do not lag the release. We run CD and will be in the gate "real soon
now" so that TripleO should be able to fully deploy Icehouse on Icehouse
release day.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help

2014-01-11 Thread Russell Bryant
On 01/09/2014 04:16 PM, Russell Bryant wrote:
> On 01/08/2014 05:53 PM, Joe Gordon wrote:
>> Hi All, 
>>
>> As you know the gate has been in particularly bad shape (gate queue over
>> 100!) this week due to a number of factors. One factor is how many major
>> outstanding bugs we have in the gate.  Below is a list of the top 4 open
>> gate bugs.
>>
>> Here are some fun facts about this list:
>> * All bugs have been open for over a month
>> * All are nova bugs
>> * These 4 bugs alone were hit 588 times which averages to 42 hits per
>> day (data is over two weeks)!
>>
>> If we want the gate queue to drop and not have to continuously run
>> 'recheck bug x' we need to fix these bugs.  So I'm looking for
>> volunteers to help debug and fix these bugs.
> 
> I created the following etherpad to help track the most important Nova
> gate bugs, who is actively working on them, and any patches that we have
> in flight to help address them:
> 
>   https://etherpad.openstack.org/p/nova-gate-issue-tracking
> 
> Please jump in if you can.  We shouldn't wait for the gate bug day to
> move on these.  Even if others are already looking at a bug, feel free
> to do the same.  We need multiple sets of eyes on each of these issues.
> 

Some good progress from the last few days:

After looking at a lot of failures, we determined that the vast majority
of failures are performance related.  The load being put on the
OpenStack deployment is just too high.  We're working to address this to
make the gate more reliable in a number of ways.

1) (merged) https://review.openstack.org/#/c/65760/

The large-ops test was cut back from spawning 100 instances to 50.  From
the commit message:

  It turns out the variance in cloud instances is very high, especially
  when comparing different cloud providers and regions. This test was
  originally added as a regression test for the nova-network issues with
  rootwrap. At which time this test wouldn't pass for 30 instances.  So
  50 is still a valid regression test.

2) (merged) https://review.openstack.org/#/c/45766/

nova-compute is able to do work in parallel very well.  nova-conductor
can not by default due to the details of our use of eventlet + how we
talk to MySQL.  The way you allow nova-conductor to do its work in
parallel is by running multiple conductor workers.  We had not enabled
this by default in devstack, so our 4 vCPU test nodes were only using a
single conductor worker.  They now use 4 conductor workers.
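
For reference, the nova.conf knob this corresponds to is (if I have the
name right) the "workers" option in the conductor group; devstack now
sets the equivalent automatically:

  [conductor]
  workers = 4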

3) (still testing) https://review.openstack.org/#/c/65805/

Right now when tempest runs in the devstack-gate jobs, it runs with
concurrency=4 (run 4 tests at once).  Unfortunately, it appears that
this maxes out the deployment and results in timeouts (usually network
related).

This patch changes tempest concurrency to 2 instead of 4.  The initial
results are quite promising.  The tests have been passing reliably so
far, but we're going to continue to recheck this for a while longer for
more data.

One very interesting observation on this came from Jim where he said "A
quick glance suggests 1.2x -- 1.4x change in runtime."  If the
deployment were *not* being maxed out, we would expect this change to
result in much closer to a 2x runtime increase.

4) (approved, not yet merged) https://review.openstack.org/#/c/65784/

nova-network seems to be the largest bottleneck in terms of performance
problems when nova is maxed out on these test nodes.  This patch is one
quick speedup we can make by not using rootwrap in a few cases where it
wasn't necessary.  These really add up.
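
To illustrate the kind of change this is (a sketch of the pattern only,
not the actual diff in that review): run_as_root=True sends a command
through sudo + nova-rootwrap, which starts a fresh Python interpreter
and re-parses the filter definitions on every call, while a plain
execute avoids all of that.

  from nova import utils

  # Expensive: sudo -> nova-rootwrap -> new interpreter -> filter matching
  out, err = utils.execute('ip', 'link', 'show', 'dev', 'br100',
                           run_as_root=True)

  # Cheap: reading link state needs no privileges, so skip rootwrap
  out, err = utils.execute('ip', 'link', 'show', 'dev', 'br100',
                           run_as_root=False)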

5) https://review.openstack.org/#/c/65989/

This patch isn't a candidate for merging, but was written to test the
theory that by updating nova-network to use conductor instead of direct
database access, nova-network will be able to do work in parallel better
than it does today, just as we have observed with nova-compute.

Dan's initial test results from this are **very** promising.  Initial
testing showed a 20% speedup in runtime and a 33% decrease in CPU
consumption by nova-network.

Doing this properly will not be quick, but I'm hopeful that we can
complete it by the Icehouse release.  We will need to convert
nova-network to use Nova's object model.  Much of this work is starting
to catch nova-network up on work that we've been doing in the rest of
the tree but have passed on doing for nova-network due to nova-network
being in a freeze.
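
For anyone not following the objects work: the conversion amounts to
replacing direct database calls with remotable objects, which can then
be routed through conductor instead of nova-network touching the
database itself. A rough sketch of the pattern, using the existing
Instance object as the example (the nova-network objects themselves
still need to be written):

  from nova import db
  from nova.objects import instance as instance_obj

  def lookup_old_style(context, uuid):
      # Direct database access: blocks the service on every query.
      return db.instance_get_by_uuid(context, uuid)

  def lookup_new_style(context, uuid):
      # NovaObject-based access: with an indirection API configured,
      # nova-conductor performs the query on our behalf.
      return instance_obj.Instance.get_by_uuid(context, uuid)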

6) (no patch yet)

We haven't had time to dive too deep into this yet, but we would also
like to revisit our locking usage and how it is affecting nova-network
performance.  There may be some more significant improvements we can
make there.


Final notes:

I am hopeful that by addressing these performance issues both in Nova's
code, as well as by turning down the test load, that we will see a
significant increase in gate reliability in the near future.  I
apologize on behalf of the Nova team for Nova's contribution to gate
instability.

*Thank you* to everyone who has been helping.

Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help

2014-01-11 Thread Sean Dague
First, thanks a ton for diving in on all this Russell. The big push by 
the Nova team recently is really helpful.


On 01/11/2014 09:57 AM, Russell Bryant wrote:

On 01/09/2014 04:16 PM, Russell Bryant wrote:

On 01/08/2014 05:53 PM, Joe Gordon wrote:

Hi All,

As you know the gate has been in particularly bad shape (gate queue over
100!) this week due to a number of factors. One factor is how many major
outstanding bugs we have in the gate.  Below is a list of the top 4 open
gate bugs.

Here are some fun facts about this list:
* All bugs have been open for over a month
* All are nova bugs
* These 4 bugs alone were hit 588 times which averages to 42 hits per
day (data is over two weeks)!

If we want the gate queue to drop and not have to continuously run
'recheck bug x' we need to fix these bugs.  So I'm looking for
volunteers to help debug and fix these bugs.


I created the following etherpad to help track the most important Nova
gate bugs, who is actively working on them, and any patches that we have
in flight to help address them:

   https://etherpad.openstack.org/p/nova-gate-issue-tracking

Please jump in if you can.  We shouldn't wait for the gate bug day to
move on these.  Even if others are already looking at a bug, feel free
to do the same.  We need multiple sets of eyes on each of these issues.



Some good progress from the last few days:

After looking at a lot of failures, we determined that the vast majority
of failures are performance related.  The load being put on the
OpenStack deployment is just too high.  We're working to address this to
make the gate more reliable in a number of ways.

1) (merged) https://review.openstack.org/#/c/65760/

The large-ops test was cut back from spawning 100 instances to 50.  From
the commit message:

   It turns out the variance in cloud instances is very high, especially
   when comparing different cloud providers and regions. This test was
   originally added as a regression test for the nova-network issues with
   rootwrap. At which time this test wouldn't pass for 30 instances.  So
   50 is still a valid regression test.

2) (merged) https://review.openstack.org/#/c/45766/

nova-compute is able to do work in parallel very well.  nova-conductor
can not by default due to the details of our use of eventlet + how we
talk to MySQL.  The way you allow nova-conductor to do its work in
parallel is by running multiple conductor workers.  We had not enabled
this by default in devstack, so our 4 vCPU test nodes were only using a
single conductor worker.  They now use 4 conductor workers.

3) (still testing) https://review.openstack.org/#/c/65805/

Right now when tempest runs in the devstack-gate jobs, it runs with
concurrency=4 (run 4 tests at once).  Unfortunately, it appears that
this maxes out the deployment and results in timeouts (usually network
related).

This patch changes tempest concurrency to 2 instead of 4.  The initial
results are quite promising.  The tests have been passing reliably so
far, but we're going to continue to recheck this for a while longer for
more data.

One very interesting observation on this came from Jim where he said "A
quick glance suggests 1.2x -- 1.4x change in runtime."  If the
deployment were *not* being maxed out, we would expect this change to
result in much closer to a 2x runtime increase.


We could also address this by locally turning up the timeouts on operations 
that are timing out, which would let those things take the time they need.


Before dropping the concurrency I'd really like to make sure we can 
point to specific failures that we think will go away. There was a lot of 
speculation around nova-network; however, the nova-network timeout errors 
only show up in elasticsearch for the large-ops jobs, not the normal tempest 
jobs. Definitely, making OpenStack more idle will make more tests pass. 
The Neutron team has experienced that.


It would be a ton better if we could actually feed back a 503 with a 
retry time (which I realize is a ton of work).
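
Something like the following, purely as a sketch (none of the APIs do
this today):

  import webob.exc

  def check_capacity(current_load, limit):
      # Refuse the request with a 503 plus a Retry-After hint instead of
      # accepting work we can't finish before the client times out.
      if current_load > limit:
          exc = webob.exc.HTTPServiceUnavailable(
              explanation="Temporarily overloaded, please retry")
          exc.headers['Retry-After'] = '5'
          raise exc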


Because if we decide we're now always pinned to only 2way, we have to 
start doing some major rethinking on our test strategy, as we'll be way 
outside the soft 45min time budget we've been trying to operate on. We'd 
actually been planning on going up to 8way, but were waiting for some 
issues to get fixed before we did that. It would sort of immediately put 
a moratorium on new tests. If that's what we need to do, that's what we 
need to do, but we should talk it through.



4) (approved, not yet merged) https://review.openstack.org/#/c/65784/

nova-network seems to be the largest bottleneck in terms of performance
problems when nova is maxed out on these test nodes.  This patch is one
quick speedup we can make by not using rootwrap in a few cases where it
wasn't necessary.  These really add up.

5) https://review.openstack.org/#/c/65989/

This patch isn't a candidate for merging, but was written to test the
theory that by updating nova-network to use conductor instead of direct

[openstack-dev] [OpenStack-Dev][Cinder] Cinder driver maintainers/contact wiki

2014-01-11 Thread John Griffith
Hey Cinder Team!

One of the things that's getting increasingly difficult as we grow the
number of drivers in the tree and I try to get the driver cert
initiative kicked off is rounding up an "expert" for each of the
drivers in the tree.  I've started a simple wiki page / matrix [1]
that is designed to show the driver/vendor name and the contact info
for folks that are designated managers of each of those drivers as
well as any additional engineering resources that might be available.

If you're a Cinder team member, and especially if you're a vendor
contributing to Cinder, have a look and help flesh out the chart.  This
helps me with a number of things, including:
1. Tracking down help when I'm mucking around trying to fix bugs in
other people's drivers
2. Who to contact when somebody on the team needs help understanding
specifics about a driver
3. Who to assign work items to when dealing with a driver
4. Who to contact for driver cert submissions
5. A public place for folks who are implementing OpenStack to see what
they're getting into (i.e. does somebody from company X even
participate in / support this code any more)

Thanks,
John

[1]: https://wiki.openstack.org/wiki/Cinder/driver-maintainers

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help

2014-01-11 Thread Joshua Harlow
+1

Very interesting to read about these bottlenecks and very grateful they are 
being addressed.

Sent from my really tiny device...

> On Jan 11, 2014, at 8:44 AM, "Sean Dague"  wrote:
> 
> First, thanks a ton for diving in on all this Russell. The big push by the 
> Nova team recently is really helpful.
> 
>> On 01/11/2014 09:57 AM, Russell Bryant wrote:
>>> On 01/09/2014 04:16 PM, Russell Bryant wrote:
>>>> On 01/08/2014 05:53 PM, Joe Gordon wrote:
>>>> Hi All,
>>>>
>>>> As you know the gate has been in particularly bad shape (gate queue over
>>>> 100!) this week due to a number of factors. One factor is how many major
>>>> outstanding bugs we have in the gate.  Below is a list of the top 4 open
>>>> gate bugs.
>>>>
>>>> Here are some fun facts about this list:
>>>> * All bugs have been open for over a month
>>>> * All are nova bugs
>>>> * These 4 bugs alone were hit 588 times which averages to 42 hits per
>>>> day (data is over two weeks)!
>>>>
>>>> If we want the gate queue to drop and not have to continuously run
>>>> 'recheck bug x' we need to fix these bugs.  So I'm looking for
>>>> volunteers to help debug and fix these bugs.
>>> 
>>> I created the following etherpad to help track the most important Nova
>>> gate bugs, who is actively working on them, and any patches that we have
>>> in flight to help address them:
>>> 
>>>   https://etherpad.openstack.org/p/nova-gate-issue-tracking
>>> 
>>> Please jump in if you can.  We shouldn't wait for the gate bug day to
>>> move on these.  Even if others are already looking at a bug, feel free
>>> to do the same.  We need multiple sets of eyes on each of these issues.
>> 
>> Some good progress from the last few days:
>> 
>> After looking at a lot of failures, we determined that the vast majority
>> of failures are performance related.  The load being put on the
>> OpenStack deployment is just too high.  We're working to address this to
>> make the gate more reliable in a number of ways.
>> 
>> 1) (merged) https://review.openstack.org/#/c/65760/
>> 
>> The large-ops test was cut back from spawning 100 instances to 50.  From
>> the commit message:
>> 
>>   It turns out the variance in cloud instances is very high, especially
>>   when comparing different cloud providers and regions. This test was
>>   originally added as a regression test for the nova-network issues with
>>   rootwrap. At which time this test wouldn't pass for 30 instances.  So
>>   50 is still a valid regression test.
>> 
>> 2) (merged) https://review.openstack.org/#/c/45766/
>> 
>> nova-compute is able to do work in parallel very well.  nova-conductor
>> can not by default due to the details of our use of eventlet + how we
>> talk to MySQL.  The way you allow nova-conductor to do its work in
>> parallel is by running multiple conductor workers.  We had not enabled
>> this by default in devstack, so our 4 vCPU test nodes were only using a
>> single conductor worker.  They now use 4 conductor workers.
>> 
>> 3) (still testing) https://review.openstack.org/#/c/65805/
>> 
>> Right now when tempest runs in the devstack-gate jobs, it runs with
>> concurrency=4 (run 4 tests at once).  Unfortunately, it appears that
>> this maxes out the deployment and results in timeouts (usually network
>> related).
>> 
>> This patch changes tempest concurrency to 2 instead of 4.  The initial
>> results are quite promising.  The tests have been passing reliably so
>> far, but we're going to continue to recheck this for a while longer for
>> more data.
>> 
>> One very interesting observation on this came from Jim where he said "A
>> quick glance suggests 1.2x -- 1.4x change in runtime."  If the
>> deployment were *not* being maxed out, we would expect this change to
>> result in much closer to a 2x runtime increase.
> 
> We could also address this by locally turning up timeouts on operations that 
> are timing out. Which would let those things take the time they need.
> 
> Before dropping the concurrency I'd really like to make sure we can point to 
> specific fails that we think will go away. There was a lot of speculation 
> around nova-network, however the nova-network timeout errors only pop up on 
> elastic search on large-ops jobs, not normal tempest jobs. Definitely making 
> OpenStack more idle will make more tests pass. The Neutron team has 
> experienced that.
> 
> It would be a ton better if we could actually feed back a 503 with a retry 
> time (which I realize is a ton of work).
> 
> Because if we decide we're now always pinned to only 2way, we have to start 
> doing some major rethinking on our test strategy, as we'll be way outside the 
> soft 45min time budget we've been trying to operate on. We'd actually been 
> planning on going up to 8way, but were waiting for some issues to get fixed 
> before we did that. It would sort of immediately put a moratorium on new 
> tests. If that's what we need to do, that's what we need to do, but we should 
> talk it through.
> 
>> 4) (approved, not yet merg

[openstack-dev] [infra] javascript templating library choice for status pages

2014-01-11 Thread Sean Dague
As someone that's done a decent amount of hacking on 
status.html/status.js, I think we're getting to a level of complexity on 
our JS status pages that we should probably stop doing this all inline 
(probably should have stopped a while ago).


I'd like to propose that we pick some javascript templating framework, 
and start incrementally porting bits over there over time.


My current thought is http://handlebarsjs.com/ - mostly because it's 
only a template library, it won't cause us to do a complete rewrite, and we 
can move it in, in parts. Other opinions are welcome.


But if we get an ACK on some approach, we can then start phasing it in, 
vs. the current state of the art which is way too much string append.


-Sean

--
Sean Dague
Samsung Research America
s...@dague.net / sean.da...@samsung.com
http://dague.net

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [QA] Changes to Tempest run_tests.sh

2014-01-11 Thread Matthew Treinish
Hi everyone,

I just wanted to bring up some changes that recently merged to tempest. As part
of the tempest unit tests blueprint I converted the run_tests.sh script to
execute the unit tests instead of running tempest itself. This makes the
run_tests.sh script consistent with how the other projects run their unit
tests. To run tempest, I added a separate script, run_tempest.sh.

So moving forward, people who were running tempest using run_tests.sh should
now use the run_tempest.sh script instead. It behaves the same way as
run_tests.sh did before, so there shouldn't be any change there.

-Matt Treinish

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Tuskar-UI navigation

2014-01-11 Thread Lyle, David
The Resources(Nodes) item that is collapsible on the left-hand side in the 
attached wireframes is a Panel Group in the Infrastructure Dashboard.  The plan 
is to make Panel Groups expandable/collapsible with the UI improvements.  There 
is nothing in Horizon's implementation that prevents the Panels under 
Resources(Nodes) from being in separate directories.  Currently, each Panel in a 
Dashboard is in a separate directory in the Dashboard directory.  As the 
potential number of panels in a Dashboard grows, I see no reason not to make a 
subdirectory for each panel group.

David

> -Original Message-
> From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com]
> Sent: Saturday, January 11, 2014 12:50 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation
> 
> Hey all,
> 
> I have a question regarding the development of the tuskar-ui navigation.
> 
> So, to give some background: we are currently working off the wireframes
> that Jaromir Coufal has developed:
> 
> http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12-03_tripleo-
> ui_02-resources.pdf
> 
> In these wireframes, you can see a left-hand navigation for Resources (which
> we have since renamed Nodes).  This
> left-hand navigation includes sub-navigation for Resources: Overview,
> Resource Nodes, Unallocated, etc.
> 
> It seems like the "Horizon way" to implement this would be to create a
> 'nodes/' directory within our dashboard.
> We would create a tabs.py with a Tab for Overview, Resource Nodes,
> Unallocated, etc, and views.py would contain
> a single TabbedTableView populated by our tabs.
> 
> However, this prevents us from using left-handed navigation.  As a result,
> our nodes/ directory currently appears
> as such: https://github.com/openstack/tuskar-
> ui/tree/master/tuskar_ui/infrastructure/nodes
> 
> 'overview', 'resource', and 'free' are subdirectories within nodes, and they
> each define their own panel.py,
> enabling them to appear in the left-handed navigation.
> 
> This leads to the following questions:
> 
> * Would our current workaround be acceptable?  Or should we follow
> Horizon precedent more closely?
> * I understand that a more flexible navigation system is currently under
> development
>   (https://blueprints.launchpad.net/horizon/+spec/navigation-
> enhancement) - would it be preferred that
>   we follow Horizon precedent until that navigation system is ready, rather
> than use our own workarounds?
> 
> Thanks in advance for any opinions!
> 
> 
> Tzu-Mainn Chen
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Tuskar-UI navigation

2014-01-11 Thread Tzu-Mainn Chen
Thanks!  Just wanted to check before we went deeper into our coding.

- Original Message -
> The Resources(Nodes) item that is collapsible on the left hand side in that
> attached wireframes is a Panel Group in the Infrastructure Dashboard.  The
> plan is to make Panel Groups expandable/collapsible with the UI
> improvements.  There is nothing in Horizon's implementation that prevents
> the Panels under Resources(Nodes) to be in separate directories.  Currently,
> each Panel in a Dashboard is in an separate directory in the Dashboard
> directory.  As the potential number of panels in a Dashboard grows, I see no
> reason to not make a subdirectory for each panel group.

Just to be clear, we're not talking about making a subdirectory per panel
group; we're talking about making a subdirectory for each panel within that
panel group. We've already tested that as a solution and it works, but I
guess my question was more about what Horizon standards exist around this,
if any.

Changing from the following. . .

nodes/urls.py - contains IndexView, FreeNodesView, ResourceNodesView

. . . to. . .

nodes/
 |
 + overview/urls.py - contains IndexView
 |
 + free/urls.py - contains FreeNodesView
 |
 + resource/urls.py - contains ResourcesNodesView

. . . purely for the sake of navigation - seems a bit - ugly? - to me, but
if it's acceptable by Horizon standards, then we're fine with it as well :)
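
For concreteness, each of those subdirectories really only exists to hold
a tiny panel.py along these lines (a sketch from memory, class and
dashboard names hypothetical):

  # nodes/free/panel.py
  from django.utils.translation import ugettext_lazy as _

  import horizon

  from tuskar_ui.infrastructure import dashboard


  class FreeNodes(horizon.Panel):
      name = _("Unallocated")
      slug = "free"


  dashboard.Infrastructure.register(FreeNodes)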


Mainn

> David
> 
> > -Original Message-
> > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com]
> > Sent: Saturday, January 11, 2014 12:50 AM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation
> > 
> > Hey all,
> > 
> > I have a question regarding the development of the tuskar-ui navigation.
> > 
> > So, to give some background: we are currently working off the wireframes
> > that Jaromir Coufal has developed:
> > 
> > http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12-03_tripleo-
> > ui_02-resources.pdf
> > 
> > In these wireframes, you can see a left-hand navigation for Resources
> > (which
> > we have since renamed Nodes).  This
> > left-hand navigation includes sub-navigation for Resources: Overview,
> > Resource Nodes, Unallocated, etc.
> > 
> > It seems like the "Horizon way" to implement this would be to create a
> > 'nodes/' directory within our dashboard.
> > We would create a tabs.py with a Tab for Overview, Resource Nodes,
> > Unallocated, etc, and views.py would contain
> > a single TabbedTableView populated by our tabs.
> > 
> > However, this prevents us from using left-handed navigation.  As a result,
> > our nodes/ directory currently appears
> > as such: https://github.com/openstack/tuskar-
> > ui/tree/master/tuskar_ui/infrastructure/nodes
> > 
> > 'overview', 'resource', and 'free' are subdirectories within nodes, and
> > they
> > each define their own panel.py,
> > enabling them to appear in the left-handed navigation.
> > 
> > This leads to the following questions:
> > 
> > * Would our current workaround be acceptable?  Or should we follow
> > Horizon precedent more closely?
> > * I understand that a more flexible navigation system is currently under
> > development
> >   (https://blueprints.launchpad.net/horizon/+spec/navigation-
> > enhancement) - would it be preferred that
> >   we follow Horizon precedent until that navigation system is ready, rather
> > than use our own workarounds?
> > 
> > Thanks in advance for any opinions!
> > 
> > 
> > Tzu-Mainn Chen
> > 
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] top gate bugs: a plea for help

2014-01-11 Thread Russell Bryant
On 01/11/2014 11:38 AM, Sean Dague wrote:
>> 3) (still testing) https://review.openstack.org/#/c/65805/
>>
>> Right now when tempest runs in the devstack-gate jobs, it runs with
>> concurrency=4 (run 4 tests at once).  Unfortunately, it appears that
>> this maxes out the deployment and results in timeouts (usually network
>> related).
>>
>> This patch changes tempest concurrency to 2 instead of 4.  The initial
>> results are quite promising.  The tests have been passing reliably so
>> far, but we're going to continue to recheck this for a while longer for
>> more data.
>>
>> One very interesting observation on this came from Jim where he said "A
>> quick glance suggests 1.2x -- 1.4x change in runtime."  If the
>> deployment were *not* being maxed out, we would expect this change to
>> result in much closer to a 2x runtime increase.
> 
> We could also address this by locally turning up timeouts on operations
> that are timing out. Which would let those things take the time they need.
> 
> Before dropping the concurrency I'd really like to make sure we can
> point to specific fails that we think will go away. There was a lot of
> speculation around nova-network, however the nova-network timeout errors
> only pop up on elastic search on large-ops jobs, not normal tempest
> jobs. Definitely making OpenStack more idle will make more tests pass.
> The Neutron team has experienced that.
> 
> It would be a ton better if we could actually feed back a 503 with a
> retry time (which I realize is a ton of work).
> 
> Because if we decide we're now always pinned to only 2way, we have to
> start doing some major rethinking on our test strategy, as we'll be way
> outside the soft 45min time budget we've been trying to operate on. We'd
> actually been planning on going up to 8way, but were waiting for some
> issues to get fixed before we did that. It would sort of immediately put
> a moratorium on new tests. If that's what we need to do, that's what we
> need to do, but we should talk it through.

I can try to write up some detailed analysis on a few failures next week
to help justify it, but FWIW, when I was looking at this last week, I felt
like making this change was going to fix a lot more than the
nova-network timeout errors.

If we can already tell this is going to improve reliability, both when
using nova-network and neutron, then I think that should be enough to
justify it.  Taking longer seems acceptable if that comes with a more
acceptable pass rate.

Right now I'd like to see us set concurrency=2 while we work on the more
difficult performance improvements to both neutron and nova-network, and
we can turn it back up later once we're able to demonstrate that it
passes reliably without failures whose root cause is the test load being
too high.

>> 5) https://review.openstack.org/#/c/65989/
>>
>> This patch isn't a candidate for merging, but was written to test the
>> theory that by updating nova-network to use conductor instead of direct
>> database access, nova-network will be able to do work in parallel better
>> than it does today, just as we have observed with nova-compute.
>>
>> Dan's initial test results from this are **very** promising.  Initial
>> testing showed a 20% speedup in runtime and a 33% decrease in CPU
>> consumption by nova-network.
>>
>> Doing this properly will not be quick, but I'm hopeful that we can
>> complete it by the Icehouse release.  We will need to convert
>> nova-network to use Nova's object model.  Much of this work is starting
>> to catch nova-network up on work that we've been doing in the rest of
>> the tree but have passed on doing for nova-network due to nova-network
>> being in a freeze.
> 
> I'm a huge +1 on fixing this in nova-network.

Of course.  This is just a bit of a longer term effort.

-- 
Russell Bryant

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Tuskar-UI navigation

2014-01-11 Thread Lyle, David
> -Original Message-
> From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com]
> Sent: Saturday, January 11, 2014 2:23 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] Tuskar-UI navigation
> 
> Thanks!  Just wanted to check before we went deeper into our coding.
> 
> - Original Message -
> > The Resources(Nodes) item that is collapsible on the left hand side in that
> > attached wireframes is a Panel Group in the Infrastructure Dashboard.  The
> > plan is to make Panel Groups expandable/collapsible with the UI
> > improvements.  There is nothing in Horizon's implementation that prevents
> > the Panels under Resources(Nodes) to be in separate directories.
> Currently,
> > each Panel in a Dashboard is in an separate directory in the Dashboard
> > directory.  As the potential number of panels in a Dashboard grows, I see
> no
> > reason to not make a subdirectory for each panel group.
> 
> Just to be clear, we're not talking about making a subdirectory per panel
> group;
> we're talking about making a subdirectory for each panel within that panel
> group.
> We've already tested that as a solution and it works, but I guess my question
> was
> more about what Horizon standards exist around this, if any.
> 
> Changing from the following. . .
> 
> nodes/urls.py - contains IndexView, FreeNodesView, ResourceNodesView
> 
> . . . to. . .
> 
> nodes/
>  |
>  + overview/urls.py - contains IndexView
>  |
>  + free/urls.py - contains FreeNodesView
>  |
>  + resource/urls.py - contains ResourcesNodesView

This is what I envisioned.  I think it's actually cleaner and easier to 
navigate.

> 
> . . . purely for the sake of navigation - seems a bit - ugly? - to me, but if 
> it's
> acceptable by Horizon standards, then we're fine with it as well :)
> 
> 
> Mainn
> 
> > David
> >
> > > -Original Message-
> > > From: Tzu-Mainn Chen [mailto:tzuma...@redhat.com]
> > > Sent: Saturday, January 11, 2014 12:50 AM
> > > To: OpenStack Development Mailing List (not for usage questions)
> > > Subject: [openstack-dev] [Horizon][Tuskar] Tuskar-UI navigation
> > >
> > > Hey all,
> > >
> > > I have a question regarding the development of the tuskar-ui navigation.
> > >
> > > So, to give some background: we are currently working off the
> wireframes
> > > that Jaromir Coufal has developed:
> > >
> > > http://people.redhat.com/~jcoufal/openstack/tripleo/2013-12-
> 03_tripleo-
> > > ui_02-resources.pdf
> > >
> > > In these wireframes, you can see a left-hand navigation for Resources
> > > (which
> > > we have since renamed Nodes).  This
> > > left-hand navigation includes sub-navigation for Resources: Overview,
> > > Resource Nodes, Unallocated, etc.
> > >
> > > It seems like the "Horizon way" to implement this would be to create a
> > > 'nodes/' directory within our dashboard.
> > > We would create a tabs.py with a Tab for Overview, Resource Nodes,
> > > Unallocated, etc, and views.py would contain
> > > a single TabbedTableView populated by our tabs.
> > >
> > > However, this prevents us from using left-handed navigation.  As a result,
> > > our nodes/ directory currently appears
> > > as such: https://github.com/openstack/tuskar-
> > > ui/tree/master/tuskar_ui/infrastructure/nodes
> > >
> > > 'overview', 'resource', and 'free' are subdirectories within nodes, and
> > > they
> > > each define their own panel.py,
> > > enabling them to appear in the left-handed navigation.
> > >
> > > This leads to the following questions:
> > >
> > > * Would our current workaround be acceptable?  Or should we follow
> > > Horizon precedent more closely?
> > > * I understand that a more flexible navigation system is currently under
> > > development
> > >   (https://blueprints.launchpad.net/horizon/+spec/navigation-
> > > enhancement) - would it be preferred that
> > >   we follow Horizon precedent until that navigation system is ready,
> rather
> > > than use our own workarounds?
> > >
> > > Thanks in advance for any opinions!
> > >
> > >
> > > Tzu-Mainn Chen
> > >
> > > ___
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> > ___

David


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster

2014-01-11 Thread Michael Still
On Wed, Jan 8, 2014 at 10:48 PM, Matt Riedemann wrote:

> Another question.  This patch [1] failed turbo-hipster after it was approved
> but I don't know if that's a gating or just voting job, i.e. should someone
> do 'reverify migrations' on that patch or just let it sit and ignore
> turbo-hipster?
>
> [1] https://review.openstack.org/#/c/59824/

Sorry for the slow reply, I'm at a conference this week and have been
flat out. turbo-hipster is a check only, and doesn't run in gate. So,
it will never respond to a "reverify" comment.

Cheers,
Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster

2014-01-11 Thread Michael Still
On Wed, Jan 8, 2014 at 10:57 PM, Sean Dague  wrote:

[snip]

> So instead of trying to fix the individual runs, because t-h runs pretty
> fast, can you just fix it with bulk. It seems like the issue of a migration
> taking a long time isn't a race in OpenStack, it's complete variability in
> the underlying system.
>
> And it seems that the failing case is going to be 100% repeatable, and
> infrequent.
>
> So it seems like you could solve the fail side by only reporting fail
> results on 3 fails in a row: RESULT && RESULT && RESULT
>
> Especially valid if Results are coming from different AZs, so any local
> issues should be masked.
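
For concreteness, I read that suggestion as something along these lines
(a sketch only, not actual turbo-hipster code):

  def should_report_failure(results, required=3):
      """results: most-recent-last list of booleans, True == run passed."""
      tail = results[-required:]
      return len(tail) == required and not any(tail)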

Whilst this is true, I worry about codifying flakiness in tests (as
shown by the gate experience). Instead I'm working on the root causes
of the flakiness.

I've done some work this week on first order metrics for migration
expense (IO ops per migration) instead of second order metrics (wall
time), so I am hoping this will help once deployed.

Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Bogus -1 scores from turbo hipster

2014-01-11 Thread Michael Still
Please note that turbo-hipster currently has -1 voting disabled while
we work through these issues. +1 voting is still enabled though.

Michael

On Sun, Jan 12, 2014 at 3:47 PM, Michael Still  wrote:
> On Wed, Jan 8, 2014 at 10:57 PM, Sean Dague  wrote:
>
> [snip]
>
>> So instead of trying to fix the individual runs, because t-h runs pretty
>> fast, can you just fix it with bulk. It seems like the issue in a migration
>> taking a long time isn't a race in OpenStack, it's completely variability in
>> the underlying system.
>>
>> And it seems that the failing case is going to be 100% repeatable, and
>> infrequent.
>>
>> So it seems like you could solve the fail side by only reporting fail
>> results on 3 fails in a row: RESULT && RESULT && RESULT
>>
>> Especially valid if Results are coming from different AZs, so any local
>> issues should be masked.
>
> Whilst this is true, I worry about codifying flakiness in tests (as
> shown by the gate experience). Instead I'm working on the root causes
> of the flakiness.
>
> I've done some work this week on first order metrics for migration
> expense (IO ops per migration) instead of second order metrics (wall
> time), so I am hoping this will help once deployed.
>
> Michael
>
> --
> Rackspace Australia



-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev