On 08/04/2014 11:21 AM, Boris Pavlovic wrote:
Rally is quite monolithic and can't be split

I think this is one of the roots of the problem that folks like David
and Sean keep coming around to. If Rally were less monolithic, it
would be easier to say "OK, bring this piece into Tempest, have this
piece be a separate library that lives in the QA program, and have this piece be the service endpoint that allows operators to store and periodically measure SLA performance indicators against their cloud."

Incidentally, this is one of the problems that Scalr faced when applying for incubation, years ago, and one of the reasons the PPB at the time voted not to incubate Scalr: it had a monolithic design that crossed too many different lines in terms of duplicating functionality that already existed in a number of other projects.

That is why I think Rally should be a separate program (i.e.
Rally's scope is simply different from the QA scope). Also, it's not
clear to me why collaboration is possible only within a single program.
In my opinion, collaboration and program membership are unrelated things.

Sure, it's certainly possible for collaboration to happen across
programs. I think what Sean is alluding to is the fact that the Tempest
and Rally communities have done little collaboration to date, and that
is worrying to him.

About collaboration between the Rally & Tempest teams... The major goal
of integrating Tempest into Rally is to make it simpler to use Tempest
on production clouds via the OpenStack API.

Plenty of folks run Tempest without Rally against production clouds as
an acceptance test platform. I see no real benefit to arguing that Rally
is for running against production clouds and Tempest is for
non-production clouds. There just isn't much of a difference there.

That said, an Operator Tools program is actually an entirely different
concept -- with a different audience and mission from the QA program. I
think you've seen here some initial support for such a proposed Operator
Tools program.

The problem I see is that Rally is not *yet* exposing the REST service
endpoint that would make it a full-fledged Operator Tool outside the
scope of its current QA focus. Once Rally does indeed publish a REST API
that exposes resource endpoints for an operator to store a set of KPIs
associated with an SLA, and allows the operator to store the run
schedule that Rally would use to go and test such metrics, *then* would
be the appropriate time to suggest that Rally be the pilot project in
this new Operator Tools program, IMO.
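To make the shape of such an API concrete, here is a purely hypothetical sketch of the two resource types described above: KPIs attached to an SLA, plus a run schedule for re-measuring them. Every name in this sketch is invented for illustration; Rally exposes no such API today, and a real implementation would put these behind HTTP endpoints (e.g. something like POST /v1/slas/<name>/kpis) rather than an in-memory store.

```python
# Hypothetical sketch only: models the SLA/KPI/schedule resources an
# Operator Tools REST API might expose. No name here is a real Rally API.

class SlaStore:
    def __init__(self):
        self._slas = {}

    def create_sla(self, name):
        # An SLA groups a set of KPIs and a measurement schedule.
        sla = {"name": name, "kpis": [], "schedule": None}
        self._slas[name] = sla
        return sla

    def add_kpi(self, sla_name, metric, threshold):
        # A KPI pairs a metric name with an acceptable threshold,
        # e.g. "95th percentile server boot time under 10 seconds".
        kpi = {"metric": metric, "threshold": threshold}
        self._slas[sla_name]["kpis"].append(kpi)
        return kpi

    def set_schedule(self, sla_name, cron_expr):
        # How often Rally would go and test the KPIs, as a cron expression.
        self._slas[sla_name]["schedule"] = cron_expr

    def get_sla(self, sla_name):
        return self._slas[sla_name]


store = SlaStore()
store.create_sla("gold-tier")
store.add_kpi("gold-tier", "nova.boot_server.p95_seconds", 10.0)
store.set_schedule("gold-tier", "0 * * * *")  # re-measure hourly
```

The metric name and cron schedule above are placeholder values; the point is only that "store a set of KPIs" and "store a run schedule" map cleanly onto two small resources.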

This work requires a lot of collaboration between teams. As you
already mentioned, we should work on improving duration measurement and
tempest.conf generation. I fully agree that this belongs in Tempest.
By the way, the Rally team is already helping with this part.

In my opinion, the end result should be something like: Rally just calls
Tempest (or a couple of scripts from Tempest), stores the results in its
DB, and presents Tempest functionality to the end user via the OpenStack
API. To get this done, we should implement the following features in
Tempest: 1) Automatic tempest.conf generation 2) Production-ready
cleanup, so Tempest is absolutely safe to run against a cloud
3) Improvements related to time measurement 4) Integration of
OSprofiler & Tempest.
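As a rough illustration of point 1), auto-generating a tempest.conf boils down to discovering cloud details and writing out the sections Tempest reads. The sketch below stubs out the discovery step and hard-codes one section; the [identity] option names reflect the tempest.conf layout of that era, but treat the exact names and values as placeholders.

```python
# Sketch of automatic tempest.conf generation. The discovery of the
# cloud's endpoints/credentials is stubbed out; a real tool would query
# Keystone. Section/option names are illustrative placeholders.
import configparser
import io


def generate_tempest_conf(auth_url, username, password, tenant):
    conf = configparser.ConfigParser()
    conf["identity"] = {
        "uri": auth_url,
        "username": username,
        "password": password,
        "tenant_name": tenant,
    }
    buf = io.StringIO()
    conf.write(buf)
    return buf.getvalue()


text = generate_tempest_conf(
    "http://keystone.example.com:5000/v2.0", "demo", "secret", "demo")
```

A real generator would add many more sections ([compute], [network], and so on), each filled from service-catalog discovery rather than arguments.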

I'm sure all of those things would be welcome additions to Tempest. At the same time, Rally contributors would do well to work on an initial REST API endpoint that would expose the resources I denoted above.

Best,
-jay

So, in any case, I would prefer to continue the collaboration.

Thoughts?


Best regards, Boris Pavlovic




On Mon, Aug 4, 2014 at 4:24 PM, Sean Dague <s...@dague.net
<mailto:s...@dague.net>> wrote:

On 07/31/2014 06:55 AM, Angus Salkeld wrote:
On Sun, 2014-07-27 at 07:57 -0700, Sean Dague wrote:
On 07/26/2014 05:51 PM, Hayes, Graham wrote:
On Tue, 2014-07-22 at 12:18 -0400, Sean Dague wrote:
On 07/22/2014 11:58 AM, David Kranz wrote:
On 07/22/2014 10:44 AM, Sean Dague wrote:
Honestly, I'm really not sure I see this as a different program; it is
really something that should be folded into the QA program. I feel like
a top-level effort like this is going to lead to a lot of duplication in
the data analysis that's currently going on, as well as in the
functionality for better load-driver UX.

-Sean
+1 It will also lead to pointless discussions/arguments about which
activities are part of "QA" and which are part of "Performance and
Scalability Testing".

I think those discussions will still take place; they will just happen
on a per-repository basis instead of a per-program one.

[snip]


Right, 100% agreed. Rally would remain with its own repo + review team,
just like Grenade.

-Sean


Is the concept of a separate review team not the point of a program?

In the thread from Designate's incubation request, Thierry said [1]:

"Programs" just let us bless goals and teams and let them organize
code however they want, with contribution to any code repo under that
umbrella being considered "official" and ATC-status-granting.

I do think that this is something that needs to be clarified by the TC:
Rally could not get a PTL if they were part of the QA project, but every
time we get a program request, the same discussion happens.

I think that mission statements can be edited to fit new programs as
they occur, and that it is more important to let teams that have been
working closely together stay as a distinct group.

My big concern here is that many of the things that these efforts have
been doing are things we actually want much closer to the base. For
instance, metrics on Tempest runs.

When Rally was first created, it had its own load generator. It took a
ton of effort to keep the team from duplicating that and instead just
use some subset of Tempest. Then when measuring showed up, we actually
said that is something that would be great in Tempest, so whoever ran
it, be it for Testing, Monitoring, or Performance gathering, would have
access to that data. But the Rally team went off in a corner and did it
otherwise. That's caused the QA team to have to go and redo this work
from scratch with subunit2sql, in a way that can be consumed by
multiple efforts.

So I'm generally -1 to this being a separate effort, on the basis that
so far the team has decided to stay in their own sandbox instead of
participating actively where many of us think the functions should be
added. I also think this isn't like Designate, because this isn't
intended to be part of the integrated release.

From reading Boris's email, it seems like Rally will provide a Horizon
panel and an API to back it (for the operator to kick off performance
runs and view stats). So this does seem like something that would be
part of the integrated release (if I am reading things correctly).

Is the QA program happy to extend its scope to include that? QA could
become "Quality Assurance of upstream code and running OpenStack
installations". If not, we need to find some other program for Rally.

I think that's realistically already the scope of the QA program; we
might just need to change the governance wording.

Tempest has always been intended to be run on production clouds
(public or private) to ensure proper function. Many operators are
doing this today as part of normal health management. And we continue
to evolve it to be something which works well in that environment.

All the statistics collection / analysis parts in Rally today are, I
think, basically things that should be part of any Tempest installation
/ run. It's cool that Rally did a bunch of work there, but having that
code outside of Tempest is sort of problematic, especially as there are
huge issues with the collection of that data because of missing timing
information in subunit. So realistically, to get accurate results,
there need to be additional events added into Tempest tests to build
this correctly. If you stare at the raw results today, they have such
huge accuracy problems (due to unaccounted-for time in setUpClass,
which is a known problem) as to be misleading, and possibly actually
harmful.

These are things that are fixable, but hard to do outside of the
Tempest project itself. Exporting accurate timing / stats should be a
feature close to the test load, not something that's done externally
with guessing and fudge factors.
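The accuracy problem being described can be shown with a toy example. If per-test durations are derived only from test start/stop timestamps, everything that happens in setUpClass (booting servers, creating networks) vanishes from the accounting. The event stream below is fabricated for illustration and is not real subunit output:

```python
# Toy illustration of the timing gap: summing per-test durations from
# start/stop timestamps misses the time spent in setUpClass, so the
# numbers under-report the real wall-clock cost of the run.

events = [
    ("class_setup_start", 0.0),   # setUpClass begins (e.g. boots a server)
    ("test_start", 12.0),         # first test only starts 12s later
    ("test_stop", 13.5),
    ("test_start", 13.5),
    ("test_stop", 14.0),
]

per_test = 0.0  # what a naive per-test analysis would report
start = None
for name, ts in events:
    if name == "test_start":
        start = ts
    elif name == "test_stop":
        per_test += ts - start

wall_clock = events[-1][1] - events[0][1]
unaccounted = wall_clock - per_test  # the hidden setUpClass cost
```

Here the tests appear to cost 2 seconds while the run actually took 14, which is exactly the kind of misleading result the paragraph above warns about; fixing it requires emitting fixture-level timing events from inside the test framework.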

So every time I look at the docs in Rally -
https://github.com/stackforge/rally - I see largely features that
should be coming out of the test runners themselves.

-Sean

-- Sean Dague http://dague.net


_______________________________________________ OpenStack-dev
mailing list OpenStack-dev@lists.openstack.org
<mailto:OpenStack-dev@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




