Re: [openstack-dev] [all] The future of the integrated release

2014-08-08 Thread Devananda van der Veen
On Fri, Aug 8, 2014 at 2:06 AM, Thierry Carrez  wrote:
>
> Michael Still wrote:
> > [...] I think an implied side effect of
> > the runway system is that nova-drivers would -2 blueprint reviews
> > which were not occupying a slot.
> >
> > (If we start doing more -2's I think we will need to explore how to
> > not block on someone with -2's taking a vacation. Some sort of role
> > account perhaps).
>
> Ideally CodeReview-2s should be kept for blocking code reviews on
> technical grounds, not procedural grounds. For example it always feels
> weird to CodeReview-2 all feature patch reviews on Feature Freeze day --
> that CodeReview-2 really doesn't have the same meaning as a traditional
> CodeReview-2.
>
> For those "procedural blocks" (feature freeze, waiting for runway
> room...), it might be interesting to introduce a specific score
> (Workflow-2 perhaps) that drivers could set. That would not prevent code
> review from happening, that would just clearly express that this is not
> ready to land for release cycle / organizational reasons.
>
> Thoughts?
>

+1

In addition to distinguishing between procedural and technical blocks, this
sounds like it will also solve the current problem of a core reviewer
going on vacation after blocking something for procedural reasons.

-Deva



Re: [openstack-dev] [all] The future of the integrated release

2014-08-08 Thread Devananda van der Veen
On Tue, Aug 5, 2014 at 9:03 AM, Thierry Carrez  wrote:
> We seem to be unable to address some key issues in the software we
> produce, and part of it is due to strategic contributors (and core
> reviewers) being overwhelmed just trying to stay afloat of what's
> happening. For such projects, is it time for a pause ? Is it time to
> define key cycle goals and defer everything else ?

I think it's quite reasonable for a project to set aside some time to
focus on stability, whether that is a whole release cycle or just a
milestone. However, I think your question here is more about
OpenStack-wide issues, and how we enco^D^D^D^D whether we can require
integrated projects that are seen as having gate-affecting instability
to pause and address that.

> On the integrated release side, "more projects" means stretching our
> limited strategic resources more. Is it time for the Technical Committee
> to more aggressively define what is "in" and what is "out" ? If we go
> through such a redefinition, shall we push currently-integrated projects
> that fail to match that definition out of the "integrated release" inner
> circle ?

"The integrated release" is an overloaded term at the moment. Outside
of the developer community, I see it often cited as a blessing of a
project's legitimacy and production-worthiness. While I feel that a
non-production-ready project should not be "in the integrated
release", this has not been upheld as a litmus test for integration in
the past. Also, this does not imply that non-integrated projects
should not be used in production. I've lost track of how many times
I've heard someone say, "Why would I deploy Ironic when it hasn't
graduated yet?"

Integration is foremost an artifact of our testing and development
processes -- an indication that a project has been following the
release cadence, adheres to cross-project norms, is ready for
co-gating, and can be counted on to produce timely and stable builds at
release time. This can plainly be seen by looking at the criteria for
incubation and integration in our governance repo [1]. As written
today, this does not have anything to do with the technical merit or
production-worthiness of a project. It also does not have anything to
do with what "layer" the project sits at -- whether it is IaaS, PaaS,
or SaaS does not dictate whether it can be integrated.

The TC has begun to scrutinize new projects more carefully on
technical grounds, particularly since some recently-integrated
projects have run into scaling limitations that have hampered their
adoption. I believe this sort of technical guidance (or curation, if
you will) is an essential function of the TC. We've learned that
integration bestows legitimacy, as well as assumptions of stability,
performance, and scalability, upon the projects that receive it.
While that wasn't the intent, I think we need to accept that that is
where we stand. We will be faced with some hard decisions regarding
projects, both incubated and already integrated, which are clearly not
meeting those expectations today.

-Devananda


[1] 
http://git.openstack.org/cgit/openstack/governance/tree/reference/incubation-integration-requirements.rst



Re: [openstack-dev] [all] The future of the integrated release

2014-08-08 Thread Devananda van der Veen
On Tue, Aug 5, 2014 at 10:02 AM, Monty Taylor  wrote:
> On 08/05/2014 09:03 AM, Thierry Carrez wrote:
>>
>> Hi everyone,
>>
>> With the incredible growth of OpenStack, our development community is
>> facing complex challenges. How we handle those might determine the
>> ultimate success or failure of OpenStack.
>>
>> With this cycle we hit new limits in our processes, tools and cultural
>> setup. This resulted in new limiting factors on our overall velocity,
>> which is frustrating for developers. This resulted in the burnout of key
>> firefighting resources. This resulted in tension between people who try
>> to get specific work done and people who try to keep a handle on the big
>> picture.
>>
>> It all boils down to an imbalance between strategic and tactical
>> contributions. At the beginning of this project, we had a strong inner
>> group of people dedicated to fixing all loose ends. Then a lot of
>> companies got interested in OpenStack and there was a surge in tactical,
>> short-term contributions. We put on a call for more resources to be
>> dedicated to strategic contributions like critical bugfixing,
>> vulnerability management, QA, infrastructure... and that call was
>> answered by a lot of companies that are now key members of the OpenStack
>> Foundation, and all was fine again. But OpenStack contributors kept on
>> growing, and we grew the narrowly-focused population way faster than the
>> cross-project population.
>>
>> At the same time, we kept on adding new projects to incubation and to
>> the integrated release, which is great... but the new developers you get
>> on board with this are much more likely to be tactical than strategic
>> contributors. This also contributed to the imbalance. The penalty for
>> that imbalance is twofold: we don't have enough resources available to
>> solve old, known OpenStack-wide issues; but we also don't have enough
>> resources to identify and fix new issues.
>>
>> We have several efforts under way, like calling for new strategic
>> contributors, driving towards in-project functional testing, making
>> solving rare issues a more attractive endeavor, or hiring resources
>> directly at the Foundation level to help address those. But there is a
>> topic we haven't raised yet: should we concentrate on fixing what is
>> currently in the integrated release rather than adding new projects ?
>>
>> We seem to be unable to address some key issues in the software we
>> produce, and part of it is due to strategic contributors (and core
>> reviewers) being overwhelmed just trying to stay afloat of what's
>> happening. For such projects, is it time for a pause ? Is it time to
>> define key cycle goals and defer everything else ?
>>
>> On the integrated release side, "more projects" means stretching our
>> limited strategic resources more. Is it time for the Technical Committee
>> to more aggressively define what is "in" and what is "out" ? If we go
>> through such a redefinition, shall we push currently-integrated projects
>> that fail to match that definition out of the "integrated release" inner
>> circle ?
>>
>> The TC discussion on what the integrated release should or should not
>> include has always been informally going on. Some people would like to
>> strictly limit to end-user-facing projects. Some others suggest that
>> "OpenStack" should just be about integrating/exposing/scaling smart
>> functionality that lives in specialized external projects, rather than
>> trying to outsmart those by writing our own implementation. Some others
>> are advocates of carefully moving up the stack, and to resist from
>> further addressing IaaS+ services until we "complete" the pure IaaS
>> space in a satisfactory manner. Some others would like to build a
>> roadmap based on AWS services. Some others would just add anything that
>> fits the incubation/integration requirements.
>>
>> On one side this is a long-term discussion, but on the other we also
>> need to make quick decisions. With 4 incubated projects, and 2 new ones
>> currently being proposed, there are a lot of people knocking at the door.
>>
>> Thanks for reading this braindump this far. I hope this will trigger the
>> open discussions we need to have, as an open source project, to reach
>> the next level.
>
>
> Yes.
>
> Additionally, and I think we've been getting better at this in the 2 cycles
> that we've had an all-elected TC, I think we need to learn how to say no on
> technical merit - and we need to learn how to say "thank you for your
> effort, but this isn't working out" Breaking up with someone is hard to do,
> but sometimes it's best for everyone involved.
>

I agree.

The challenge is scaling the technical assessment of projects. We're
all busy, and digging deeply enough into a new project to make an
accurate assessment of it is time-consuming. Sometimes there are
impartial subject-matter experts who can spot problems very quickly,
but how do we actually gauge fitness?

Letting the industry field-test a project and feed their experience
back into the community is a slow process, but that is the best
measure of a project's success. I seem to recall this being an
implicit expectation a few years ago, but haven't seen it discussed in
a while.

Re: [openstack-dev] [oslo] usage patterns for oslo.config

2014-08-08 Thread Devananda van der Veen
On Fri, Aug 8, 2014 at 12:41 PM, Doug Hellmann  wrote:
>
> That’s right. The preferred approach is to put the register_opt() in
> *runtime* code somewhere before the option will be used. That might be in
> the constructor for a class that uses an option, for example, as described
> in
> http://docs.openstack.org/developer/oslo.config/cfg.html#registering-options
>
> Doug

Interesting.

I've been following the prevailing example in Nova, which is to
register opts at the top of a module, immediately after defining them.
Is there a situation in which one approach is better than the other?
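
For concreteness, here is a minimal sketch of the two patterns in question.
The option and group names are invented for illustration, and the 2014-era
"oslo.config" namespace import is assumed:

    from oslo.config import cfg

    CONF = cfg.CONF

    _opts = [
        cfg.StrOpt('helper_path',
                   default='/usr/bin/true',
                   help='Path to a helper binary (illustrative option).'),
    ]

    # Pattern 1: register at import time, right after defining the opts.
    # This is the style prevalent in Nova at the time of this thread.
    CONF.register_opts(_opts, group='example')


    class Worker(object):
        def __init__(self):
            # Pattern 2: register at runtime, just before first use.
            # Registering the same opts a second time is a no-op, so code
            # that does both still works.
            CONF.register_opts(_opts, group='example')
            self.helper_path = CONF.example.helper_path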

Thanks,
Devananda



Re: [openstack-dev] [qa][ceilometer] swapping the roles of mongodb and sqlalchemy for ceilometer in Tempest

2014-08-09 Thread Devananda van der Veen
On Aug 9, 2014 4:22 AM, "Eoghan Glynn"  wrote:
>
>
> Hi Folks,
>
> Dina Belova has recently landed some infra patches[1,2] to create
> an experimental mongodb-based Tempest job. This effectively just
> overrides the ceilometer storage backend config so that mongodb
> is used instead of sql-alchemy. The new job has been running
> happily for a few days so I'd like now to consider the path
> forwards with this.
>
> One of our Juno goals under the TC gap analysis was to more fully
> gate against mongodb, given that this is the storage backend
> recommended/supported by many distros. The sql-alchemy backend,
> on the other hand, is more suited for proofs of concept or small
> deployments. However up to now we've been hampered from reflecting
> that reality in the gate, due to the gate being stuck on Precise
> for a long time, as befits LTS, and the version of mongodb needed
> by ceilometer (i.e. 2.4) effectively unavailable on that Ubuntu
> release (in fact it was limited to 2.0.4).
>
> So the orientation towards gating on sql-alchemy was mostly
> driven by legacy issues in the gate's usage of Precise, as
> opposed to this being considered the most logical basket in
> which to put all our testing eggs.
>
> However, we're now finally in the brave new world of Trusty :)
> So I would like to make the long-delayed change over soon.
>
> This would involve transposing the roles of sql-alchemy and
> mongodb in the gate - the mongodb variant becomes the "blessed"
> job run by default, whereas the sql-alchemy based job to
> relegated to the second tier.
>
> So my questions are:
>
>  (a) would the QA side of the house be agreeable to this switch?
>
> and:
>
>  (b) how long would the mongodb job need to be stable in this
>  experimental mode before we pull the trigger on switching?
>
> If the answer to (a) is yes, we can get infra patches proposed
> early next week to make the swap.
>
> Cheers,
> Eoghan
>
> [1]
https://review.openstack.org/#/q/project:openstack-infra/config+branch:master+topic:ceilometer-mongodb-job,n,z
> [2]
https://review.openstack.org/#/q/project:openstack-infra/devstack-gate+branch:master+topic:ceilometer-backend,n,z
>

My interpretation of the gap analysis [1] is merely that you need mongodb
coverage, not that you switch to it and relegate the SQLAlchemy tests to
second chair. I believe that would be a dangerous departure from current
standards. A dependency on mongodb, due to its AGPL license and the lack of
sufficient support for a non-AGPL storage back end, has consistently been
raised as a blocking issue for Marconi. [2]

-Deva

[1]
https://wiki.openstack.org/wiki/Governance/TechnicalCommittee/Ceilometer_Gap_Coverage

[2]
http://lists.openstack.org/pipermail/openstack-dev/2014-March/030510.html
is a very articulate example of this objection


Re: [openstack-dev] [qa][ceilometer] swapping the roles of mongodb and sqlalchemy for ceilometer in Tempest

2014-08-11 Thread Devananda van der Veen
On Mon, Aug 11, 2014 at 3:27 PM, Joe Gordon  wrote:
>
>
>
> On Mon, Aug 11, 2014 at 3:07 PM, Eoghan Glynn  wrote:
>>
>>
>>
>> > Ignoring the question of is it ok to say: 'to run ceilometer in any sort
>> > of
>> > non-trivial deployment you must manage yet another underlying service,
>> > mongodb' I would prefer not adding an additional gate variant to all
>> > projects.
>> > With the effort to reduce the number of gate variants we have [0] I
>> > would
>> > prefer to see just ceilometer gate on both mongodb and sqlalchemy and
>> > the
>> > main integrated gate [1] pick just one.
>>
>> Just checking to see that I fully understand what you mean there, Joe.
>>
>> So would we:
>>
>>  (a) add a new integrated-gate-ceilometer project-template to [1],
>>  in the style of integrated-gate-neutron or integrated-gate-sahara,
>>  which would replicate the main integrated-gate template but with
>>  the addition of gate-tempest-dsvm-ceilometer-mongodb(-full)
>>
>> or:
>>
>>  (b) simply move gate-tempest-dsvm-ceilometer-mongodb(-full) from
>>  the experimental column[2] in the openstack-ceilometer project,
>>  to the gate column on that project
>>
>> or:
>>
>>  (c) something else
>>
>> Please excuse the ignorance of gate mechanics inherent in that question.
>
>
>
> Correct, AFAIK (a) or (b) would be sufficient.
>
> There is another option, which is make the mongodb version the default in
> integrated-gate and only run SQLA on ceilometer.
>

Joe,

I believe this last option is equivalent to making mongodb the
recommended implementation by virtue of suddenly being the most tested
implementation. I would prefer not to see that.

Eoghan,

IIUC (and I am not an infra expert) I would suggest (b) since this
keeps the mongo tests within the ceilometer project only, which I
think is fine from a "what we test is what we recommend" standpoint.

Also, if there is a situation where a change in Nova passes with
ceilometer+mysql and thus lands in Nova, but fails with
ceilometer+mongodb, yes, that would break the ceilometer project's
gate (but not the integrated gate). It would also indicate a
substantial abstraction violation within ceilometer. I have proposed
exactly this model for Ironic's deploy driver testing, and am willing
to accept the consequences within the project if we break our own
abstractions.

Regards,
Devananda



Re: [openstack-dev] [all] The future of the integrated release

2014-08-12 Thread Devananda van der Veen
On Tue, Aug 12, 2014 at 10:44 AM, Dolph Mathews  wrote:
>
> On Tue, Aug 12, 2014 at 12:30 AM, Joe Gordon  wrote:
>>
>> Slow review: by limiting the number of blueprints up we hope to focus our
>> efforts on fewer concurrent things
>> slow code turn around: when a blueprint is given a slot (runway) we will
>> first make sure the author/owner is available for fast code turnaround.
>>
>> If a blueprint review stalls out (slow code turnaround, stalemate in
>> review discussions etc.) we will take the slot and give it to another
>> blueprint.
>
>
> How is that more efficient than today's do-the-best-we-can approach? It just
> sounds like bureaucracy to me.
>
> Reading between the lines throughout this thread, it sounds like what we're
> lacking is a reliable method to communicate review prioritization to core
> reviewers.

AIUI, that is precisely what the proposed "slots" would do -- allow
the PTL (or the drivers team) to reliably communicate review
prioritization to the core review team, in a way that is *not* just
more noise on IRC, and is visible to all contributors.

-Deva



Re: [openstack-dev] [all] The future of the integrated release

2014-08-13 Thread Devananda van der Veen
On Wed, Aug 13, 2014 at 5:37 AM, Mark McLoughlin  wrote:
> On Fri, 2014-08-08 at 15:36 -0700, Devananda van der Veen wrote:
>> On Tue, Aug 5, 2014 at 10:02 AM, Monty Taylor  wrote:
>
>> > Yes.
>> >
>> > Additionally, and I think we've been getting better at this in the 2 cycles
>> > that we've had an all-elected TC, I think we need to learn how to say no on
>> > technical merit - and we need to learn how to say "thank you for your
>> > effort, but this isn't working out" Breaking up with someone is hard to do,
>> > but sometimes it's best for everyone involved.
>> >
>> >
>>
>> I agree.
>>
>> The challenge is scaling the technical assessment of projects. We're
>> all busy, and digging deeply enough into a new project to make an
>> accurate assessment of it is time-consuming. Sometimes there are
>> impartial subject-matter experts who can spot problems very quickly,
>> but how do we actually gauge fitness?
>
> Yes, it's important the TC does this and it's obvious we need to get a
> lot better at it.
>
> The Marconi architecture threads are an example of us trying harder (and
> kudos to you for taking the time), but it's a little disappointing how
> it has turned out. On the one hand there's what seems like a "this
> doesn't make any sense" gut feeling and on the other hand an earnest,
> but hardly bite-sized justification for how the API was chosen and how
> it lead to the architecture. Frustrating that appears to not be
> resulting in either improved shared understanding, or improved
> architecture. Yet everyone is trying really hard.

Sometimes "trying really hard" is not enough. Saying goodbye is hard, but
as has been pointed out already in this thread, sometimes it's necessary.

>
>> Letting the industry field-test a project and feed their experience
>> back into the community is a slow process, but that is the best
>> measure of a project's success. I seem to recall this being an
>> implicit expectation a few years ago, but haven't seen it discussed in
>> a while.
>
> I think I recall us discussing a "must have feedback that it's
> successfully deployed" requirement in the last cycle, but we recognized
> that deployers often wait until a project is integrated.

In the early discussions about incubation, we respected the need to
officially recognize a project as part of OpenStack just to create the
uptick in adoption necessary to mature projects. Similarly, integration is
a recognition of the maturity of a project, but I think we have graduated
several projects long before they actually reached that level of maturity.
Actually running a project at scale for a period of time is the only way to
know it is mature enough to run it in production at scale.

I'm just going to toss this out there. What if we set the graduation bar to
"is in production in at least two sizeable clouds" (note that I'm not
saying "public clouds"). Trove is the only project that has, to my
knowledge, met that bar prior to graduation, and it's the only project that
graduated since Havana that I can, off hand, point at as clearly
successful. Heat and Ceilometer both graduated prior to being in
production; a few cycles later, they're still having adoption problems and
looking at large architectural changes. I think the added cost to OpenStack
when we integrate immature or unstable projects is significant enough at
this point to justify a more defensive posture.

FWIW, Ironic currently doesn't meet that bar either - it's in production in
only one public cloud. I'm not aware of large private installations yet,
though I suspect there are some large private deployments being spun up
right now, planning to hit production with the Juno release.

-Devananda


[openstack-dev] [Ironic] Feature proposal freeze

2014-08-13 Thread Devananda van der Veen
Hi all,

As previously announced, Ironic is now entering feature proposal freeze for
Juno. This means that all open spec reviews will be blocked and will need
to be resubmitted when Kilo opens. Kilo will open when the first Juno RC is
tagged, or sooner, and the opening will be announced on this list.

Whatever specs have landed are what we've got, and it's all tracked on
Launchpad here:
https://launchpad.net/ironic/+milestone/juno-3

Juno 3 is slated to be tagged on September 4th, marking the start of
feature freeze. Unmerged features will be blocked at that point and will
require a feature freeze exception. As we have only three weeks to finish
implementing and reviewing these features, this time really should be spent
focusing on code reviews, ensuring the code matches the proposed design,
has adequate test coverage, etc.

If you're a non-core contributor, you can help with reviews too. If you're
one of the folks working on the nova driver, tempest coverage, nova
upgrade, or grenade testing - please keep doing those important things.

If you are a contributor working on a feature that has an approved spec and
BP targeted to juno, do not wait until the last minute to post the code.
That's a good way to not get reviews, and will make your code less likely
to get a feature freeze exception. If you don't think you'll have time to
finish your feature before September 4, please let Lucas know. Otherwise
he's going to nag you a bunch.

Thanks,
Devananda


Re: [openstack-dev] [all] The future of the integrated release

2014-08-14 Thread Devananda van der Veen
On Aug 14, 2014 2:04 AM, "Eoghan Glynn"  wrote:
>
>
> > >> Letting the industry field-test a project and feed their experience
> > >> back into the community is a slow process, but that is the best
> > >> measure of a project's success. I seem to recall this being an
> > >> implicit expectation a few years ago, but haven't seen it discussed in
> > >> a while.
> > >
> > > I think I recall us discussing a "must have feedback that it's
> > > successfully deployed" requirement in the last cycle, but we recognized
> > > that deployers often wait until a project is integrated.
> >
> > In the early discussions about incubation, we respected the need to
> > officially recognize a project as part of OpenStack just to create the
> > uptick in adoption necessary to mature projects. Similarly, integration is a
> > recognition of the maturity of a project, but I think we have graduated
> > several projects long before they actually reached that level of maturity.
> > Actually running a project at scale for a period of time is the only way to
> > know it is mature enough to run it in production at scale.
> >
> > I'm just going to toss this out there. What if we set the graduation bar to
> > "is in production in at least two sizeable clouds" (note that I'm not saying
> > "public clouds"). Trove is the only project that has, to my knowledge, met
> > that bar prior to graduation, and it's the only project that graduated since
> > Havana that I can, off hand, point at as clearly successful. Heat and
> > Ceilometer both graduated prior to being in production; a few cycles later,
> > they're still having adoption problems and looking at large architectural
> > changes. I think the added cost to OpenStack when we integrate immature or
> > unstable projects is significant enough at this point to justify a more
> > defensive posture.
> >
> > FWIW, Ironic currently doesn't meet that bar either - it's in production in
> > only one public cloud. I'm not aware of large private installations yet,
> > though I suspect there are some large private deployments being spun up
> > right now, planning to hit production with the Juno release.
>
> We have some hard data from the user survey presented at the Juno summit,
> with respectively 26 & 53 production deployments of Heat and Ceilometer
> reported.
>
> There's no cross-referencing of deployment size with services in production
> in those data presented, though it may be possible to mine that out of the
> raw survey responses.

Indeed, and while that would be useful information, I was referring to the
deployment of those services at scale prior to graduation, not post
graduation.

Best,
Devananda


[openstack-dev] [Ironic] Juno-3 milestone released

2014-09-04 Thread Devananda van der Veen
Hi all!

The Juno 3 milestone has been tagged, and I am very proud of everyone
who worked on it, especially over the last few weeks while I was away.

  https://launchpad.net/ironic/+milestone/juno-3

We had targeted 14 blueprints to this milestone, and managed to land
13 of them! This includes a large number of new features and drivers
which were under development for most of the cycle (some even in
development for more than one cycle). Below is a short summary of the
new drivers:

- ironic-python-agent support (what Rackspace OnMetal is using)
- iLO power driver and virtual-media deploy driver
- DRAC power driver
- SNMP power driver
- IPMI double bridging support (eg. for Moonshot or similar
high-density chassis)
- iPXE support

UEFI boot support was not quite ready, and has been given a
feature-freeze-exception for one week.

Many thanks to everyone who put in the hard work to implement and
review all these features, helping to make this the most featureful
milestone in Ironic's history!

-Devananda



Re: [openstack-dev] [TripleO][Ironic] Unique way to get a registered machine?

2014-09-06 Thread Devananda van der Veen
On Aug 22, 2014 12:48 AM, "Steve Kowalik"  wrote:
>
> On 22/08/14 17:35, Chris Jones wrote:
>>
>> Hi
>>
>> When register-nodes blows up, is the error we get from Ironic
sufficiently unique that we can just consume it and move on?
>>

You should get a clear error when attempting to add a port with a
preexisting MAC. However, at this point, a new node has already been
created (this won't fail since it includes no unique info). When catching
the duplicate MAC error, you should delete the just-created node.
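
As a rough sketch of that flow with python-ironicclient (the variables
token, ironic_url, driver_info and mac are placeholders, and 'pxe_ipmitool'
is just an example driver):

    from ironicclient import client as ir_client

    ironic = ir_client.get_client(1, os_auth_token=token,
                                  ironic_url=ironic_url)

    # Creating the node carries no unique data, so it won't conflict.
    node = ironic.node.create(driver='pxe_ipmitool', driver_info=driver_info)
    try:
        # Uniqueness is enforced on the port (one MAC per port), so this
        # is the call that fails for an already-registered machine.
        ironic.port.create(node_uuid=node.uuid, address=mac)
    except Exception:
        # Any failure here (notably the duplicate-MAC conflict, an HTTP
        # 409 from the API) would leave an orphaned node behind, so
        # delete the node we just created before re-raising.
        ironic.node.delete(node.uuid)
        raise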

>> I'm all for making the API more powerful wrt inspecting the current
setup, but I also like idempotency :)
>
>
> If the master nodes list changes, because say you add a second NIC, and
up the amount of RAM for a few of your nodes, we then want to update those
details in the baremetal service, rather than skipping those nodes since
they are already registered.
>
>

If you want to update information about a node, you must have that node's
UUID, whether cached or retrieved on-demand. You can retrieve this by
searching for a port with a known MAC to determine which node is associated
with that port.

You will have a problem updating the existing NIC(s) if you don't cache the
UUID, as MAC address is currently the only other uniquely identifying data
point for a node.
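
A sketch of that lookup, again assuming an authenticated python-ironicclient
handle and filtering the detailed port list client-side (the helper name and
the example property update are illustrative, not from any existing tool):

    def node_uuid_for_mac(ironic, mac):
        """Return the UUID of the node owning a port with this MAC."""
        for port in ironic.port.list(detail=True):
            if port.address.lower() == mac.lower():
                return port.node_uuid
        return None

    uuid = node_uuid_for_mac(ironic, '52:54:00:12:34:56')
    if uuid:
        # With the UUID in hand, updates are ordinary JSON-patch calls,
        # e.g. recording a RAM upgrade:
        ironic.node.update(uuid, [
            {'op': 'replace', 'path': '/properties/memory_mb',
             'value': 65536},
        ])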

-Devananda


Re: [openstack-dev] [TripleO][Ironic] Unique way to get a registered machine?

2014-09-06 Thread Devananda van der Veen
Woops, meant to respond to this in the email I just sent...

On Aug 21, 2014 11:35 PM, "Steve Kowalik"  wrote:
>
> For other drivers, we think that the pm_address for each machine
> will be unique. Would it be possible to add some advice to that effect to
> Ironic's driver API?
>

pm_address is cruft on the old nova API, and was replaced with driver_info
in ironic.

Node driver_info is presumably also unique, but the format varies between
drivers; uniqueness is not enforced, nor is it searchable in the REST API.
In some cases, there are half a dozen data points contained in driver_info
(eg, for double bridged IPMI in the case of moonshot) which collectively
inform the driver how to connect to and manage that node.
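
Purely for illustration, an ipmitool-driver node's driver_info looks roughly
like the following. The values are made up, and the bridging key names are
my recollection of the Juno-era driver, so check the driver docs before
relying on them:

    # Simple IPMI-managed node:
    driver_info = {
        'ipmi_address': '10.1.2.3',
        'ipmi_username': 'admin',
        'ipmi_password': 'secret',
    }

    # A double-bridged chassis (e.g. Moonshot-style) adds bridging details:
    driver_info.update({
        'ipmi_bridging': 'dual',
        'ipmi_transit_channel': '0',
        'ipmi_transit_address': '0x20',
        'ipmi_target_channel': '7',
        'ipmi_target_address': '0x72',
    })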

-Devananda


Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Devananda van der Veen
On Tue, Sep 9, 2014 at 4:12 PM, Samuel Merritt  wrote:
> On 9/9/14, 12:03 PM, Monty Taylor wrote:
[snip]
>> So which is it? Because it sounds like to me it's a thing that actually
>> does NOT need to diverge in technology in any way, but that I've been
>> told that it needs to diverge because it's delivering a different set of
>> features - and I'm pretty sure if it _is_ the thing that needs to
>> diverge in technology because of its feature set, then it's a thing I
>> don't think we should be implementing in python in OpenStack because it
>> already exists and it's called AMQP.
>
>
> Whether Zaqar is more like AMQP or more like email is a really strange
> metric to use for considering its inclusion.
>

I don't find this strange at all -- I had been judging the technical
merits of Zaqar (ex-Marconi) for the last ~18 months based on the
understanding that it aimed to provide Queueing-as-a-Service, and
found its delivery of that to be lacking on technical grounds. The
implementation did not meet my view of what a queue service should
provide; it is based on some serious antipatterns (storing a queue in
an RDBMS is probably the most obvious); and in fact, it isn't even
queue-like in the access patterns enabled by the REST API (random
access to a set != a queue). That was the basis for a large part of my
objections to the project over time, and a source of frustration for
me as the developers justified many of their positions rather than
accepting feedback and changing course during the incubation period. The
reason for this seems clear now...

As was pointed out in the TC meeting today, Zaqar is (was?) actually
aiming to provide Messaging-as-a-Service -- not queueing as a service!
This is another way of saying "it's more like email and less like
AMQP", which means my but-its-not-a-queue objection to the project's
graduation is irrelevant, and I need to rethink all my previous
assessments of the project.

The questions now before us are:
- should OpenStack include, in the integrated release, a
messaging-as-a-service component?
- is Zaqar a technically sound implementation of such a service?

As an aside, there are still references to Zaqar as a queue in the
wiki [0], the governance repo [1], and on Launchpad [2].

Regards,
Devananda


[0] "Multi-tenant queues based on Keystone project IDs"
  https://wiki.openstack.org/wiki/Zaqar#Key_features

[1] "Queue service" is even the official OpenStack Program name, and
the mission statement starts with "To produce an OpenStack message
queueing API and service."
  
http://git.openstack.org/cgit/openstack/governance/tree/reference/programs.yaml#n315

[2] "Zaqar is a new OpenStack project to create a multi-tenant cloud
queuing service"
  https://launchpad.net/zaqar



Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Devananda van der Veen
On Thu, Sep 4, 2014 at 1:44 PM, Kurt Griffiths  wrote:
[snip]
> Does a Qpid/Rabbit/Kafka provisioning service make sense? Probably. Would
> such a service totally overlap in terms of use-cases with Zaqar? Community
> feedback suggests otherwise. Will there be some other kind of thing that
> comes out of the woodwork? Possibly. (Heck, if something better comes
> along I for one have no qualms in shifting resources to the more elegant
> solution--again, use the best tool for the job.) This process happens all
> the time in the broader open-source world. But this process takes a
> healthy amount of time, plus broad exposure and usage, which is something
> that you simply don’t get as a non-integrated project in the OpenStack
> ecosystem.

While that is de rigueur today, it's actually at the core of the
current problem space. Blessing a project by integrating it is not a
scalable long-term solution. We don't have a model to integrate >1
project for the same space // of the same type, or to bless the
stability of a non-integrated project. You won't see two messaging
services, or two compute services, in the integrated release. In fact,
integration is supposed to occur only *after* the community has sorted
out "a winner" within a given space. In my view, it should also happen
only after the community has proven a project to be stable and
scalable in production.

It should be self-evident that, for a large and healthy ecosystem of
production-quality projects to be created and flourish, we can not
pick a winner and shut down competition by integrating a project
*prior* to that project getting "broad exposure and usage". A practice
of integrating projects merely to get them exposure and contributors
is self-defeating.


> In any case, it’s pretty clear to me that Zaqar graduating should not be
> viewed as making it "the officially blessed messaging service for the
> cloud”

That's exactly what graduation does, though. Your statement in the
previous paragraph - that non-integrated projects don't get adoption -
only furthers this point.

> and nobody is allowed to have any other ideas, ever.

Of course other people can have other ideas -- but we don't have a
precedent for handling it inside the community. Look at Ceilometer -
there are at least two other projects which attempted to fill that
space, but we haven't any means to accept them into OpenStack without
either removing Ceilometer or encouraging those projects to merge into
Ceilometer.

> If that
> happens, it’s only a symptom of a deeper perception/process problem that
> is far from unique to Zaqar. In fact, I think it touches on all
> non-integrated projects, and many integrated ones as well.
>

Yup.

I agree that we shouldn't hold Zaqar hostage while the community sorts
out the small-tent-big-camp questions. But I also feel like we _must_
sort that out soon, because the current system (integrate all the
things!) doesn't appear to be sustainable for much longer.


-Devananda



Re: [openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

2014-09-09 Thread Devananda van der Veen
On Tue, Sep 9, 2014 at 5:31 PM, Boris Pavlovic  wrote:
>
> Devananda,
>>
>>
>> While that is de rigueur today, it's actually at the core of the
>> current problem space. Blessing a project by integrating it is not a
>> scalable long-term solution. We don't have a model to integrate >1
>> project for the same space // of the same type, or to bless the
>> stability of a non-integrated project. You won't see two messaging
>> services, or two compute services, in the integrated release. In fact,
>> integration is supposed to occur only *after* the community has sorted
>> out "a winner" within a given space. In my view, it should also happen
>> only after the community has proven a project to be stable and
>> scalable in production.
>
>
> After looking at such profiles:
> http://boris-42.github.io/ngk.html
> And getting 150 DB requests (without neutron) to create one single VM, I 
> don't believe that set of current integrated OpenStack projects is scalable 
> well. (I mean without customization)

I'm not going to defend the DB performance of Nova or other services.
This thread isn't the place for that discussion.

>
> So I would like to say 2 things:
>
> - Rules should be the same for all projects (including incubated/integrated)

Yup. This is why the TC revisits integrated projects once per cycle now, too.

>
> - Nothing should be incubated/integrated.

This is a blatant straw-man. If you're suggesting we stop all
integration testing, release management, etc -- the very things which
the integrated release process coordinates... well, I don't think
that's what you're saying. Except it is.

> Cause projects have to evolve, to evolve they need competition. In other 
> words, monopoly sux in any moment of time (even after community decided to 
> chose project A and not project B)
>

In order for a project to evolve, a project needs people contributing
to it. More often than not, that is because someone is using the
project, and it doesn't do what they want, so they improve it in some
way. Incubation was intended to be a signal to early adopters to begin
using (and thus, hopefully, contributing to) a project, encouraging
collaboration and reducing NIH friction between corporations within
the ecosystem. It hasn't gone exactly as planned, but it's also worked
fairly well for _this_ purpose, in my opinion.

However, adding more and more projects into the integrated release,
and thus increasing the testing complexity and imposing greater
requirements on operators -- this is an imminent scaling problem, as
Sean has eloquently pointed out before in several long email threads
which I won't recount here.

All of this is to say that Kurt's statement:
  "[You don't get] broad exposure and usage... as a non-integrated
project in the OpenStack ecosystem."
is an accurate representation of one problem facing OpenStack today. I
don't think we solve that problem by following the established norm -
we solve it by creating a mechanism for non-integrated projects to get
the exposure and usage they need _without_ becoming a burden on our
QA, docs, and release teams, and without forcing that project upon
operators.

But as I said earlier, we shouldn't hold Zaqar hostage while we sort
out what that solution looks like...


Anyhow, my apologies for the bike shed. I felt it was worth voicing my
disagreement with Kurt's statement that graduation should not be
viewed as an official blessing of Zaqar as OpenStack's Messaging
Service. Today, I believe that's exactly what it is. With that
blessing comes an additional burden on the community to support it.

Perhaps that will change in the future.

-Devananda



Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

2014-09-10 Thread Devananda van der Veen
On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths  wrote:
> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy neighbor when running the performance tests:
>
> * 1x Load Generator
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8Ghz
>     * 32 GB RAM
>     * 10Gbps NIC
>     * 32GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar-bench
> * 1x Web Head
>   * Hardware
>     * 1x Intel Xeon E5-2680 v2 2.8Ghz
>     * 32 GB RAM
>     * 10Gbps NIC
>     * 32GB SATADOM
>   * Software
>     * Debian Wheezy
>     * Python 2.7.3
>     * zaqar server
>       * storage=mongodb
>       * partitions=4
>       * MongoDB URI configured with w=majority
>     * uWSGI + gevent
>       * config: http://paste.openstack.org/show/100592/
>       * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8Ghz
>     * 128 GB RAM
>     * 10Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * mongod 2.6.4
>       * Default config, except setting replSet and enabling periodic
>         logging of CPU and I/O
>       * Journaling enabled
>       * Profiling on message DBs enabled for requests over 10ms
> * 1x Redis Node
>   * Hardware
>     * 2x Intel Xeon E5-2680 v2 2.8Ghz
>     * 128 GB RAM
>     * 10Gbps NIC
>     * 2x LSI Nytro WarpDrive BLP4-1600[2]
>   * Software
>     * Debian Wheezy
>     * Redis 2.4.14
>       * Default config (snapshotting and AOF enabled)
>       * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (speaking anecdotally, I would expect
> an additional 1-3ms for cached tokens and assuming an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
> * 1 producer process with 5 gevent workers
> * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
> * 5 messages listed per request by the observers
> * Load distributed across 4[6] queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.7 ms/req,  585 req/sec
>   * Observer: 1.5 ms/req, 1254 req/sec
> * Mongo
>   * Producer: 2.2 ms/req,  454 req/sec
>   * Observer: 1.5 ms/req, 1224 req/sec
>
> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that
> the observers are still listing (up to) 5 messages at a time[4], so they
> still outpace the producers, but not as quickly as before.
>
> Options
> * 2 producer processes with 25 gevent workers each
> * 1 message posted per request
> * 2 observer processes with 25 gevent workers each
> * 5 messages listed per request by the observers
> * Load distributed across 4 queues
> * 10-second duration
>
> Results
> * Redis
>   * Producer: 1.4 ms/req, 1374 req/sec
>   * Observer: 1.6 ms/req, 1178 req/sec
> * Mongo
> 

Re: [openstack-dev] [zaqar] Juno Performance Testing (Round 2)

2014-09-11 Thread Devananda van der Veen
On Wed, Sep 10, 2014 at 6:09 PM, Kurt Griffiths  wrote:
> On 9/10/14, 3:58 PM, "Devananda van der Veen"  wrote:
>
>>I'm going to assume that, for these benchmarks, you configured all the
>>services optimally.
>
> Sorry for any confusion; I am not trying to hide anything about the setup.
> I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis
> were configured. I tried to stick to mostly default settings to keep
> things simple, making it easier for others to reproduce/verify the results.
>
> Is there further information about the setup that you were curious about
> that I could provide? Was there a particular optimization that you didn’t
> see that you would recommend?
>

Nope.

>>I'm not going to question why you didn't run tests
>>with tens or hundreds of concurrent clients,
>
> If you review the different tests, you will note that a couple of them
> used at least 100 workers. That being said, I think we ought to try higher
> loads in future rounds of testing.
>

Perhaps I misunderstand what "2 processes with 25 gevent workers"
means - I think this means you have two _processes_ which are using
greenthreads and eventlet, and so each of those two python processes
is swapping between 25 coroutines. From a load generation standpoint,
this is not the same as having 100 concurrent client _processes_.

>>or why you only ran the
>>tests for 10 seconds.
>
> In Round 1 I did mention that i wanted to do a followup with a longer
> duration. However, as I alluded to in the preamble for Round 2, I kept
> things the same for the redis tests to compare with the mongo ones done
> previously.
>
> We’ll increase the duration in the next round of testing.
>

Sure - consistency between tests is good. But I don't believe that a
10-second benchmark is ever enough to suss out service performance.
Lots of things only appear after high load has been applied for a
period of time as eg. caches fill up, though this leads to my next
point below...

>>Instead, I'm actually going to question how it is that, even with
>>relatively beefy dedicated hardware (128 GB RAM in your storage
>>nodes), Zaqar peaked at around 1,200 messages per second.
>
> I went back and ran some of the tests and never saw memory go over ~20M
> (as observed with redis-top) so these same results should be obtainable on
> a box with a lot less RAM.

Whoa. So, that's a *really* important piece of information which was,
afaict, missing from your previous email(s). I hope you can understand
how, with the information you provided ("the Redis server has 128GB
RAM"), I was shocked at the low performance.

> Furthermore, the tests only used 1 CPU on the
> Redis host, so again, similar results should be achievable on a much more
> modest box.

You described fairly beefy hardware but didn't utilize it fully -- I
was expecting your performance test to attempt to stress the various
components of a Zaqar installation and, at least in some way, attempt
to demonstrate what the capacity of a Zaqar deployment might be on the
hardware you have available. Thus my surprise at the low numbers. If
that wasn't your intent (and given the CPU/RAM usage your tests
achieved, it's not what you achieved) then my disappointment in those
performance numbers is unfounded.

But I hope you can understand, if I'm looking at a service benchmark
to gauge how well that service might perform in production, seeing
expensive hardware perform disappointingly slowly is not a good sign.

>
> FWIW, I went back and ran a couple scenarios to get some more data points.
> First, I did one with 50 producers and 50 observers. In that case, the
> single CPU on which the OS scheduled the Redis process peaked at 30%. The
> second test I did was with 50 producers + 5 observers + 50 consumers
> (which claim messages and delete them rather than simply page through
> them). This time, Redis used 78% of its CPU. I suppose this should not be
> surprising because the consumers do a lot more work than the observers.
> Meanwhile, load on the web head was fairly high; around 80% for all 20
> CPUs. This tells me that python and/or uWSGI are working pretty hard to
> serve these requests, and there may be some opportunities to optimize that
> layer. I suspect there are also some opportunities to reduce the number of
> Redis operations and roundtrips required to claim a batch of messages.
>

OK - those resource usages sound better. At least you generated enough
load to saturate the uWSGI process CPU, which is a good point to look
at performance of the system.

At that peak, what was the:
- average msgs/sec
- min/max/avg/stdev time to [post|get|delete] a message

> The other thing to consider is that in these first two rounds I did 

Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Sep 15, 2014 8:20 AM, "James Slagle"  wrote:
>
> On Mon, Sep 15, 2014 at 7:44 AM, Steven Hardy  wrote:
> > All,
> >
> > Starting this thread as a follow-up to a strongly negative reaction by the
> > Ironic PTL to my patches[1] adding initial Heat->Ironic integration, and
> > subsequent very detailed justification and discussion of why they may be
> > useful in this spec[2].
> >
> > Back in Atlanta, I had some discussions with folks interesting in making
> > "ready state"[3] preparation of bare-metal resources possible when
> > deploying bare-metal nodes via TripleO/Heat/Ironic.
>
> After a cursory reading of the references, it seems there's a couple of 
> issues:
> - are the features to move hardware to a "ready-state" even going to
> be in Ironic proper, whether that means in ironic at all or just in
> contrib.
> - assuming some of the features are there, should Heat have any Ironic
> resources given that Ironic's API is admin-only.
>
> >
> > The initial assumption is that there is some discovery step (either
> > automatic or static generation of a manifest of nodes), that can be input
> > to either Ironic or Heat.
>
> I think it makes a lot of sense to use Heat to do the bulk
> registration of nodes via Ironic. I understand the argument that the
> Ironic API should be "admin-only" a little bit for the non-TripleO
> case, but for TripleO, we only have admins interfacing with the
> Undercloud. The user of a TripleO undercloud is the deployer/operator
> and in some scenarios this may not be the undercloud admin. So,
> talking about TripleO, I don't really buy that the Ironic API is
> admin-only.
>

When I say the ironic API is admin only, I'm speaking to the required
permissions for accessing it. One must be authenticated with keystone
with the "admin" privilege. Borrowing from the ops guide:

" An administrative super user, which has full permissions across all
projects and should be used with great care."

I'm not sure how TripleO is dividing operator and admin in the
undercloud - so I just want to be clear on what you mean when you say
"may not be the undercloud admin". Simply put, to use Ironic in the
undercloud, you must have "admin" privileges in the undercloud -- or
you need to disable Ironic's auth entirely.

> Therefore, why not have some declarative Heat resources for things
> like Ironic nodes, that the deployer can make use of in a Heat
> template to do bulk node registration?
>
> The alternative listed in the spec:
>
> "Don’t implement the resources and rely on scripts which directly
> interact with the Ironic API, prior to any orchestration via Heat."
>
> would just be a bit silly IMO. That goes against one of the main
> drivers of TripleO, which is to use OpenStack wherever possible. Why
> go off and write some other thing that is going to parse a
> json/yaml/csv of nodes and orchestrate a bunch of Ironic api calls?
> Why would it be ok for that other thing to use Ironic's "admin-only"
> API yet claim it's not ok for Heat on the undercloud to do so?
>

Heat has a mission. It's not just a hammer with which to parse
json/yaml/etc into a for loop and throw text at an API. From the wiki:

"... to create a human- and machine-accessible service for managing
the entire lifecycle of infrastructure and applications within
OpenStack clouds."

The resources ironic exposes are not "within OpenStack clouds." They
are _underneath_ the cloud. Actually, they're underneath the
undercloud.

Configuring a node in Ironic is akin to configuring the SAN from which
Cinder provisions volumes. That is clearly a thing which an operator
needs to do -- but are you suggesting that, if my SAN has a REST API,
I should use Heat to configure it?

This is the crux of my objection. I would be surprised if your answer
is "yes, heat should be used to configure my SAN". If you're wondering
why I used that example, it's because folks already asked me if they
can use Ironic to deploy their SAN's management software, perform
firmware upgrades on it, and so on. (I said "no, not today" but it's
an interesting scope discussion for Ironic).

> > Following discovery, but before an undercloud deploying OpenStack onto the
> > nodes, there are a few steps which may be desired, to get the hardware into
> > a state where it's ready and fully optimized for the subsequent deployment:
> >

As others have said already, based on discussions during the Juno
cycle, discovery has not landed, and most of us agree that it is out
of scope.

> > - Updating and aligning firmware to meet requirements of qualification or
> >   site policy
> > - Optimization of BIOS configuration to match workloads the node is
> >   expected to run
> > - Management of machine-local storage, e.g configuring local RAID for
> >   optimal resilience or performance.
> >

These steps are desirable not just the first time a node is added to
ironic, but often subsequently, either between every deployment, or
when the operator changes the role/function that hardware fulfills, or
i

Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Mon, Sep 15, 2014 at 9:50 AM, Clint Byrum  wrote:
> Excerpts from Steven Hardy's message of 2014-09-15 04:44:24 -0700:
>>
>>
> First, Ironic is hidden under Nova as far as TripleO is concerned. So
> mucking with the servers underneath Nova during deployment is a difficult
> proposition. Would I look up the Ironic node ID of the nova server,
> and then optimize it for the workload after the workload arrived? Why
> wouldn't I just do that optimization before the deployment?
>

Except, using Ironic to configure a node's hardware probably requires
rebooting that node -- and thus interrupting the workload that was
just deployed onto it, and possibly (if you're rebuilding a RAID)
destroying that instance. Clearly this doesn't make sense.

>> What is required is some tool to take a text definition of the required
>> configuration, turn it into a correctly sequenced series of API calls to
>> Ironic, expose any data associated with those API calls, and declare
>> success or failure on completion.  This is what Heat does.
>>
>
> I'd rather see Ironic define or adopt a narrow scope document format
> that it can consume for bulk loading. Heat is extremely generic, and thus
> carries a ton of complexity for what is probably doable with a CSV file.

Yup. See my previous comments. Heat is not a generic "manipulate this
text file" hammer. That's Perl.

-Devananda



Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Mon, Sep 15, 2014 at 10:51 AM, Jay Faulkner  wrote:
> Steven,
>
> It's important to note that two of the blueprints you reference:
>
> https://blueprints.launchpad.net/ironic/+spec/drac-raid-mgmt
> https://blueprints.launchpad.net/ironic/+spec/drac-hw-discovery
>
> are both very unlikely to land in Ironic -- these are configuration and 
> discovery pieces that best fit inside an operator-deployed CMDB, rather than 
> Ironic trying to extend its scope significantly to include these types of 
> functions. I expect the scoping of Ironic with regards to hardware 
> discovery/interrogation as well as configuration of hardware (like I will 
> outline below) to be hot topics in Ironic design summit sessions at Paris.
>
> A good way of looking at it is that Ironic is responsible for hardware *at 
> provision time*. Registering the nodes in Ironic, as well as hardware 
> settings/maintenance/etc while a workload is provisioned is left to the 
> operators' CMDB.
>
> This means what Ironic *can* do is modify the configuration of a node at 
> provision time based on information passed down the provisioning pipeline. 
> For instance, if you wanted to configure certain firmware pieces at provision 
> time, you could do something like this:
>
> Nova flavor sets capability:vm_hypervisor in the flavor that maps to the 
> Ironic node. This would map to an Ironic driver that exposes vm_hypervisor as 
> a capability, and upon seeing capability:vm_hypervisor has been requested, 
> could then configure the firmware/BIOS of the machine to 'hypervisor 
> friendly' settings, such as VT bit on and Turbo mode off. You could map 
> multiple different combinations of capabilities as different Ironic flavors, 
> and have them all represent different configurations of the same pool of 
> nodes. So, you end up with two categories of abilities: inherent abilities of 
> the node (such as amount of RAM or CPU installed), and configurable abilities 
> (i.e. things that can be turned on/off at provision time on demand) -- or 
> perhaps, in the future, even things like RAM and CPU will be dynamically 
> provisioned into nodes at provision time.
>


Thanks for the explanation, Jay.

Steven, in response to your question "[what would] just do that
optimization before the deployment?" -- see Jay's example above. This
path has grown out of several discussions we've had over the last two
years, and is more closely aligned with what I *thought* TripleO wanted
back when I was more involved in that project.

To paraphrase: Ironic exposes "capabilities" to Nova, and the Nova
scheduler can pick a node based on which capability is requested in
the flavor definition. We don't yet, but are planning to, support
on-demand customization of nodes based on the requested capabilities.
Toggling the VT bit is a canonical example of this -- we should be
able to dynamically update a node's firmware configuration to satisfy
both virtualization and non-virtualization workloads. That's going to
be expressed via Nova flavors and communicated at provision time by
Nova to Ironic. Eventually, I'd like to see everything in that space
handled this way (except perhaps RAID topology, since that actually
takes a lot of time to change).
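
To make the VT-bit example concrete, here is roughly the flow I picture
inside a driver once we support this. It is a hypothetical sketch only:
none of it exists in Ironic today, and every name in it is made up.

    # Hypothetical sketch of capability-driven BIOS configuration at
    # provision time. None of this exists in Ironic today; every name
    # below is made up for illustration.

    # BIOS settings a driver might apply for each configurable capability.
    CAPABILITY_PROFILES = {
        'vm_hypervisor': {'vt_enabled': True, 'turbo_mode': False},
        'baremetal_compute': {'vt_enabled': False, 'turbo_mode': True},
    }


    def set_bios_setting(node_uuid, key, value):
        # stand-in for a vendor-specific management call (iLO, DRAC, ...)
        print('node %s: setting %s=%s' % (node_uuid, key, value))


    def apply_requested_capabilities(node_uuid, requested_capabilities):
        """Configure firmware/BIOS to match the capabilities requested
        via the Nova flavor, before the image is deployed."""
        for capability in requested_capabilities:
            settings = CAPABILITY_PROFILES.get(capability)
            if settings is None:
                # inherent capability (e.g. cpu_arch): nothing to configure
                continue
            for key, value in settings.items():
                set_bios_setting(node_uuid, key, value)


    apply_requested_capabilities('NODE_UUID', ['vm_hypervisor'])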

-Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Mon, Sep 15, 2014 at 9:00 AM, Steven Hardy  wrote:
> For example, today, I've been looking at the steps required for driving
> autodiscovery:
>
> https://etherpad.openstack.org/p/Ironic-PoCDiscovery-Juno
>
> Driving this process looks a lot like application orchestration:
>
> 1. Take some input (IPMI credentials and MAC addresses)
> 2. Maybe build an image and ramdisk(could drop credentials in)
> 3. Interact with the Ironic API to register nodes in maintenance mode
> 4. Boot the nodes, monitor state, wait for a signal back containing some
>data obtained during discovery (same as WaitConditions or
>SoftwareDeployment resources in Heat..)
> 5. Shutdown the nodes and mark them ready for use by nova
>

My apologies if the following sounds snarky -- but I think there are a
few misconceptions that need to be cleared up about how and when one
might use Ironic. I also disagree that 1..5 looks like application
orchestration. Step 4 is a workflow, which I'll go into in a bit, but
this doesn't look at all like describing or launching an application
to me.


Step 1 is just parse a text file.

Step 2 should be a prerequisite to doing -anything- with Ironic. Those
images need to be built and loaded in Glance, and the image UUID(s)
need to be set on each Node in Ironic (or on the Nova flavor, if going
that route) after enrollment. Sure, Heat can express this
declaratively (ironic.node.driver_info must contain key:deploy_kernel
with value:<image uuid>), but are you suggesting that Heat build the images,
or just take the UUIDs as input?

Step 3 is, again, just parse a text file

I'm going to make an assumption here [*], because I think step 4 is
misleading. You shouldn't "boot a node" using Ironic -- you do that
through Nova. And you _dont_ get to specify which node you're booting.
You ask Nova to provision an _instance_ on a _flavor_ and it picks an
available node from the pool of nodes that match the request.

Step 5 is a prerequisite for step 4 -- you can't boot a node that is
in maintenance mode, and if the node is not in maintenance mode, Nova
exposes it to clients. That is in fact how you'd boot it in step 4.
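
For reference, taking a node out of maintenance mode is a single client
call. A sketch with python-ironicclient; the credentials, node UUID, and
the exact argument form are assumptions about the client version in use:

    from ironicclient import client

    # Sketch: clearing maintenance mode so that Nova will expose the node
    # to the scheduler. Credentials, the node UUID, and the 'false'
    # argument form are assumptions about the client version in use.
    ironic = client.get_client(1,
                               os_username='admin',
                               os_password='secret',
                               os_tenant_name='admin',
                               os_auth_url='http://keystone:5000/v2.0')
    ironic.node.set_maintenance('NODE_UUID', 'false')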

[*] I'm assuming you are not planning to re-implement the Nova
"ironic" driver in Heat. Booting a node with Ironic is not just a
matter of making one or two API calls. It's a declarative
transformation involving multiple changes in Ironic's API, and
presumably also some calls to Neutron if you want network access
to/from your node, and polling the resource to see when its state
converges on the requested state. Actually that sounds like exactly
the sort of thing that Heat could drive. But all this is already
implemented in Nova.
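
To illustrate how little the client side needs to care about any of
that, booting onto bare metal looks like any other Nova boot. A sketch
with python-novaclient -- the flavor/image names and credentials are
illustrative assumptions:

    import time

    from novaclient.v1_1 import client as nova_client

    # Sketch: deploying to bare metal is a normal Nova boot against a
    # bare metal flavor. Names and credentials are illustrative.
    nova = nova_client.Client('admin', 'secret', 'admin',
                              'http://keystone:5000/v2.0')

    server = nova.servers.create(
        name='bm-instance',
        image=nova.images.find(name='my-baremetal-image'),
        flavor=nova.flavors.find(name='my-baremetal-flavor'))

    # Nova's scheduler picks an available Ironic node matching the flavor;
    # the caller never chooses a specific node.
    while nova.servers.get(server.id).status not in ('ACTIVE', 'ERROR'):
        time.sleep(15)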


-Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Mon, Sep 15, 2014 at 1:08 PM, Steven Hardy  wrote:
> On Mon, Sep 15, 2014 at 05:51:43PM +, Jay Faulkner wrote:
>> Steven,
>>
>> It's important to note that two of the blueprints you reference:
>>
>> https://blueprints.launchpad.net/ironic/+spec/drac-raid-mgmt
>> https://blueprints.launchpad.net/ironic/+spec/drac-hw-discovery
>>
>> are both very unlikely to land in Ironic -- these are configuration and 
>> discovery pieces that best fit inside an operator-deployed CMDB, rather than 
>> Ironic trying to extend its scope significantly to include these types of 
>> functions. I expect the scoping of Ironic with regards to hardware 
>> discovery/interrogation as well as configuration of hardware (like I will 
>> outline below) to be hot topics in Ironic design summit sessions at Paris.
>
> Hmm, okay - not sure I really get how a CMDB is going to help you configure
> your RAID arrays in an automated way?
>
> Or are you subscribing to the legacy datacentre model where a sysadmin
> configures a bunch of boxes via whatever method, puts their details into
> the CMDB, then feeds those details into Ironic?
>
>> A good way of looking at it is that Ironic is responsible for hardware *at 
>> provision time*. Registering the nodes in Ironic, as well as hardware 
>> settings/maintenance/etc while a workload is provisioned is left to the 
>> operators' CMDB.
>>
>> This means what Ironic *can* do is modify the configuration of a node at 
>> provision time based on information passed down the provisioning pipeline. 
>> For instance, if you wanted to configure certain firmware pieces at 
>> provision time, you could do something like this:
>>
>> Nova flavor sets capability:vm_hypervisor in the flavor that maps to the 
>> Ironic node. This would map to an Ironic driver that exposes vm_hypervisor 
>> as a capability, and upon seeing capability:vm_hypervisor has been 
>> requested, could then configure the firmware/BIOS of the machine to 
>> 'hypervisor friendly' settings, such as VT bit on and Turbo mode off. You 
>> could map multiple different combinations of capabilities as different 
>> Ironic flavors, and have them all represent different configurations of the 
>> same pool of nodes. So, you end up with two categories of abilities: 
>> inherent abilities of the node (such as amount of RAM or CPU installed), and 
>> configurable abilities (i.e. things that can be turned on/off at provision 
>> time on demand) -- or perhaps, in the future, even things like RAM and CPU 
>> will be dynamically provisioned into nodes at provision time.
>
> So you advocate pushing all the vendor-specific stuff down into various
> Ironic drivers,

... and providing a common abstraction / representation for it. Yes.
That is, after all, what OpenStack has done for compute, storage, and
networking, and what Ironic has set out to do for hardware
provisioning from the beginning.

>  is any of what you describe above possible today?

No. We had other priorities in Juno. It's probably one of the things
we'll prioritize in Kilo.

If you can't wait for Ironic to implement a common abstraction layer
for such functionality, then by all means, implement vendor-native
templates in Heat, but keep in mind that our goal is to move any
functionality which multiple vendors provide into the common API over
time. Vendor passthru is there as an early proving ground for vendors
to add their unique capabilities while we work towards cross-vendor
standards.
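
For completeness, the touch point for vendor-native functionality today
is the vendor passthru interface. Roughly like the sketch below -- the
method name and payload are entirely driver-specific assumptions, and
the endpoint details should be treated as a sketch rather than a
reference:

    import json

    import requests

    # Sketch: invoking a driver-specific method through Ironic's vendor
    # passthru interface. The method name ('configure_raid') and payload
    # are driver-specific assumptions; treat the endpoint details as a
    # sketch, not a reference.
    IRONIC_API = 'http://ironic-api:6385'
    HEADERS = {'X-Auth-Token': 'ADMIN_TOKEN',
               'Content-Type': 'application/json'}
    NODE_UUID = '1be26c0b-03f2-4d2e-ae87-c02d7f33c123'

    resp = requests.post(
        '%s/v1/nodes/%s/vendor_passthru?method=configure_raid'
        % (IRONIC_API, NODE_UUID),
        headers=HEADERS,
        data=json.dumps({'raid_level': '1'}))
    resp.raise_for_status()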

-D

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Tue, Sep 16, 2014 at 11:57 AM, Zane Bitter  wrote:
> On 16/09/14 13:54, Devananda van der Veen wrote:
>>
>> On Sep 15, 2014 8:20 AM, "James Slagle"  wrote:
>>>
>>> >
>>> >On Mon, Sep 15, 2014 at 7:44 AM, Steven Hardy  wrote:
>>>>
>>>> > >
>>>> > >The initial assumption is that there is some discovery step (either
>>>> > >automatic or static generation of a manifest of nodes), that can be
>>>> > > input
>>>> > >to either Ironic or Heat.
>>>
>>> >
>>> >I think it makes a lot of sense to use Heat to do the bulk
>>> >registration of nodes via Ironic. I understand the argument that the
>>> >Ironic API should be "admin-only" a little bit for the non-TripleO
>>> >case, but for TripleO, we only have admins interfacing with the
>>> >Undercloud. The user of a TripleO undercloud is the deployer/operator
>>> >and in some scenarios this may not be the undercloud admin. So,
>>> >talking about TripleO, I don't really buy that the Ironic API is
>>> >admin-only.
>>> >
>>
>> When I say the ironic API is admin only, I'm speaking to the required
>> permissions for accessing it. One must be authenticated with keystone
>> with the "admin" privilege. Borrowing from the ops guide:
>
>
> In most contexts "admin only" means that the default policy.json file
> requires the "admin" role in the current project in order to access a
> particular API endpoint. If you are relying on the is_admin flag in the user
> context and not policy.json then it's likely you are Doing Keystone
> Wrong(TM).

http://git.openstack.org/cgit/openstack/ironic/tree/etc/ironic/policy.json

There are no per-endpoint policies implemented in Ironic. This was
intentional when we started the project.

Also, there have been recent requests to begin providing read-only
access to certain resources to less-privileged users, so we may, in
Kilo, begin implementing more tunable policies.

>
>> " An administrative super user, which has full permissions across all
>> projects and should be used with great care."
>>
>> I'm not sure how TripleO is dividing operator and admin in the
>> undercloud - so I just want to be clear on what you mean when you say
>> "may not be the undercloud admin". Simply put, to use Ironic in the
>> undercloud, you must have "admin" privileges in the undercloud -- or
>> you need to disable Ironic's auth entirely.
>
>
> TripleO can presumably deploy any policy.json file they like in the
> undercloud. It's not entirely surprising that some operations that are
> admin-only in an overcloud

Ironic isn't in the overcloud, so I'm not sure how this comparison is
appropriate.

> might need to be available in the undercloud to
> "ordinary" users - who, after all, have permissions to create entire
> overclouds - despite them not being admins of the undercloud itself.

Sure. Such a user, who has access to "create an overcloud" and would
be an admin in the overcloud they deployed, would be a regular user of
the undercloud, and have access to the undercloud Nova to provision
their workload. They would not be an admin in the undercloud, nor
would they have any need to talk directly to Ironic.

-Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Tue, Sep 16, 2014 at 11:44 AM, Zane Bitter  wrote:
> On 16/09/14 13:56, Devananda van der Veen wrote:
>>
>> On Mon, Sep 15, 2014 at 9:00 AM, Steven Hardy  wrote:
>>>
>>> For example, today, I've been looking at the steps required for driving
>>> autodiscovery:
>>>
>>> https://etherpad.openstack.org/p/Ironic-PoCDiscovery-Juno
>>>
>>> Driving this process looks a lot like application orchestration:
>>>
>>> 1. Take some input (IPMI credentials and MAC addresses)
>>> 2. Maybe build an image and ramdisk(could drop credentials in)
>>> 3. Interact with the Ironic API to register nodes in maintenance mode
>>> 4. Boot the nodes, monitor state, wait for a signal back containing some
>>> data obtained during discovery (same as WaitConditions or
>>> SoftwareDeployment resources in Heat..)
>>> 5. Shutdown the nodes and mark them ready for use by nova
>>>
>>
>> My apologies if the following sounds snarky -- but I think there are a
>> few misconceptions that need to be cleared up about how and when one
>> might use Ironic. I also disagree that 1..5 looks like application
>> orchestration. Step 4 is a workflow, which I'll go into in a bit, but
>> this doesn't look at all like describing or launching an application
>> to me.
>
>
> +1 (Although step 3 does sound to me like something that matches Heat's
> scope.)

I think it's a simplistic use case, and Heat supports a lot more
complexity than is necessary to enroll nodes with Ironic.

>
>> Step 1 is just parse a text file.
>>
>> Step 2 should be a prerequisite to doing -anything- with Ironic. Those
>> images need to be built and loaded in Glance, and the image UUID(s)
>> need to be set on each Node in Ironic (or on the Nova flavor, if going
>> that route) after enrollment. Sure, Heat can express this
>> declaratively (ironic.node.driver_info must contain key:deploy_kernel
>> with value:<image uuid>), but are you suggesting that Heat build the images,
>> or just take the UUIDs as input?
>>
>> Step 3 is, again, just parse a text file
>>
>> I'm going to make an assumption here [*], because I think step 4 is
>> misleading. You shouldn't "boot a node" using Ironic -- you do that
>> through Nova. And you _dont_ get to specify which node you're booting.
>> You ask Nova to provision an _instance_ on a _flavor_ and it picks an
>> available node from the pool of nodes that match the request.
>
>
> I think your assumption is incorrect. Steve is well aware that provisioning
> a bare-metal Ironic server is done through the Nova API. What he's
> suggesting here is that the nodes would be booted - not Nova-booted, but
> booted in the sense of having power physically applied - while in
> maintenance mode in order to do autodiscovery of their capabilities,

Except simply applying power doesn't, in itself, accomplish anything
besides causing the machine to power on. Ironic will only prepare the
PXE boot environment when initiating a _deploy_.

> which
> is presumably hard to do automatically when they're turned off.

Vendors often have ways to do this while the power is turned off, eg.
via the OOB management interface.

> He's also
> suggesting that Heat could drive this process, which I happen to disagree
> with because it is a workflow not an end state.

+1

> However the main takeaway
> here is that you guys are talking completely past one another, and have been
> for some time.
>

Perhaps more detail on the expected interactions with Ironic would be
helpful, and would keep me from making (perhaps incorrect) assumptions.

-D

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
Now that I've replied to individual emails, let me try to summarize my
thoughts on why Heat feels like the wrong tool for the task that I
think you're trying to accomplish. This discussion has been really
helpful for me in understanding why that is, and I think, at a really
high level, it is because I do not believe that a description of a
cloud application should contain any direct references to Ironic's
resource classes. Let me explain why.

Heat is a declarative engine, where its inputs are: the stated desired
state of a complex system, and the actual state of that system. The
actual state might be that no resources have been created yet (so Heat
should create them) or a set of existing resources (which Heat
previously created) that need to be mutated to achieve the desired
state. This maps reasonably well onto the orchestration of launching
an application within a cloud. Great.

Nova, Cinder, Neutron, etc, provide APIs to control resources within a
cloud (instances, storage volumes, IPs, etc). These are the building
blocks of an application that Heat will use.

Ironic provides a declarative API to model physical configuration of
hardware. This doesn't actually correlate to an application in the
cloud, though Nova understands how to map its resource types (flavors
and instances) onto an available pool of hardware. It is conceivable
that Ironic might, at some point in the future, also be able to manage
switch and storage hardware configuration, firmware updates, and so on.
We don't do this today, but more than one interested developer/vendor
has already approached us asking about that.

By adding Ironic resource templates to Heat, you are, in my view,
moving it beyond the ken of "cloud application orchestration" and into
"inventory management", which is a space that has a lot of additional
complexity, and a lot of existing players.

For what it's worth, I acknowledge that this is possible (and I
apologize if my prior arguments made it seem like I didn't understand
that). But just because it can be done, doesn't mean it /should/ be
done.

Thanks for bearing with me as I organized my thoughts on this,
Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo][heat][ironic] Heat Ironic resources and "ready state" orchestration

2014-09-16 Thread Devananda van der Veen
On Tue, Sep 16, 2014 at 12:42 PM, Zane Bitter  wrote:
> On 16/09/14 15:24, Devananda van der Veen wrote:
>>
>> On Tue, Sep 16, 2014 at 11:44 AM, Zane Bitter  wrote:
>>>
>>> On 16/09/14 13:56, Devananda van der Veen wrote:
>>>>
>>>>
>>>> On Mon, Sep 15, 2014 at 9:00 AM, Steven Hardy  wrote:
>>>>>
>>>>>
>>>>> For example, today, I've been looking at the steps required for driving
>>>>> autodiscovery:
>>>>>
>>>>> https://etherpad.openstack.org/p/Ironic-PoCDiscovery-Juno
>>>>>
>>>>> Driving this process looks a lot like application orchestration:
>>>>>
>>>>> 1. Take some input (IPMI credentials and MAC addresses)
>>>>> 2. Maybe build an image and ramdisk(could drop credentials in)
>>>>> 3. Interact with the Ironic API to register nodes in maintenance mode
>>>>> 4. Boot the nodes, monitor state, wait for a signal back containing
>>>>> some
>>>>>  data obtained during discovery (same as WaitConditions or
>>>>>  SoftwareDeployment resources in Heat..)
>>>>> 5. Shutdown the nodes and mark them ready for use by nova
>>>>>
>>>>
>>>> My apologies if the following sounds snarky -- but I think there are a
>>>> few misconceptions that need to be cleared up about how and when one
>>>> might use Ironic. I also disagree that 1..5 looks like application
>>>> orchestration. Step 4 is a workflow, which I'll go into in a bit, but
>>>> this doesn't look at all like describing or launching an application
>>>> to me.
>>>
>>>
>>>
>>> +1 (Although step 3 does sound to me like something that matches Heat's
>>> scope.)
>>
>>
>> I think it's a simplistic use case, and Heat supports a lot more
>> complexity than is necessary to enroll nodes with Ironic.
>>
>>>
>>>> Step 1 is just parse a text file.
>>>>
>>>> Step 2 should be a prerequisite to doing -anything- with Ironic. Those
>>>> images need to be built and loaded in Glance, and the image UUID(s)
>>>> need to be set on each Node in Ironic (or on the Nova flavor, if going
>>>> that route) after enrollment. Sure, Heat can express this
>>>> declaratively (ironic.node.driver_info must contain key:deploy_kernel
>>>> with value:<image uuid>), but are you suggesting that Heat build the images,
>>>> or just take the UUIDs as input?
>>>>
>>>> Step 3 is, again, just parse a text file
>>>>
>>>> I'm going to make an assumption here [*], because I think step 4 is
>>>> misleading. You shouldn't "boot a node" using Ironic -- you do that
>>>> through Nova. And you _dont_ get to specify which node you're booting.
>>>> You ask Nova to provision an _instance_ on a _flavor_ and it picks an
>>>> available node from the pool of nodes that match the request.
>>>
>>>
>>>
>>> I think your assumption is incorrect. Steve is well aware that
>>> provisioning
>>> a bare-metal Ironic server is done through the Nova API. What he's
>>> suggesting here is that the nodes would be booted - not Nova-booted, but
>>> booted in the sense of having power physically applied - while in
>>> maintenance mode in order to do autodiscovery of their capabilities,
>>
>>
>> Except simply applying power doesn't, in itself, accomplish anything
>> besides causing the machine to power on. Ironic will only prepare the
>> PXE boot environment when initiating a _deploy_.
>
>
> From what I gather elsewhere in this thread, the autodiscovery stuff is a
> proposal for the future, not something that exists in Ironic now, and that
> may be the source of the confusion.
>
> In any case, the etherpad linked at the top of this email was written by
> someone in the Ironic team and _clearly_ describes PXE booting a "discovery
> image" in maintenance mode in order to obtain hardware information about the
> box.
>

Huh. I should have looked at that earlier in the discussion. It is
referring to out-of-tree code whose spec was not approved during Juno.

Apparently, and unfortunately, throughout much of this discussion,
folks have been referring to potential features Ironic might someday
have, whereas I have been focused on the features we actually support
today. That is probably why it seems we are "talking past each other."

-Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Devananda van der Veen
On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco  wrote:
> On 09/18/2014 04:09 PM, Gordon Sim wrote:
>> On 09/18/2014 12:31 PM, Flavio Percoco wrote:
>>> Zaqar guarantees FIFO. To be more precise, it does that relying on the
>>> storage backend ability to do so as well. Depending on the storage used,
>>> guaranteeing FIFO may have some performance penalties.
>>
>> Would it be accurate to say that at present Zaqar does not use
>> distributed queues, but holds all queue data in a storage mechanism of
>> some form which may internally distribute that data among servers but
>> provides Zaqar with a consistent data model of some form?
>
> I think this is accurate. The queue's distribution depends on the
> storage ability to do so and deployers will be able to choose what
> storage works best for them based on this as well. I'm not sure how
> useful this separation is from a user perspective but I do see the
> relevance when it comes to implementation details and deployments.

Guaranteeing FIFO and not using a distributed queue architecture
*above* the storage backend are both scale-limiting design choices.
That Zaqar's scalability depends on the storage back end is not a
desirable thing in a cloud-scale messaging system in my opinion,
because this will prevent use at scales which can not be accommodated
by a single storage back end.

And based on my experience consulting for companies whose needs grew
beyond the capabilities of a single storage backend, moving to
application-aware sharding required a significant amount of
rearchitecture in most cases.
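
To illustrate what I mean by distributing work above the storage layer,
the heart of it is a routing step that maps each queue onto one of N
independent back ends. A deliberately minimal sketch -- this is not
Zaqar or SQS code:

    import hashlib

    # Deliberately minimal sketch of application-level sharding: route
    # each queue to one of N independent storage back ends. Illustrative
    # only -- this is not Zaqar or SQS code.


    class ShardedQueues(object):
        def __init__(self, backends):
            # each backend is any object exposing push(queue_name, message)
            self.backends = list(backends)

        def _backend_for(self, queue_name):
            digest = hashlib.md5(queue_name.encode('utf-8')).hexdigest()
            return self.backends[int(digest, 16) % len(self.backends)]

        def push(self, queue_name, message):
            # a real system would use consistent hashing or an explicit
            # shard registry so adding back ends doesn't remap everything
            self._backend_for(queue_name).push(queue_name, message)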

-Devananda

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Devananda van der Veen
On Thu, Sep 18, 2014 at 7:55 AM, Flavio Percoco  wrote:
> On 09/18/2014 04:24 PM, Clint Byrum wrote:
>> Great job highlighting what our friends over at Amazon are doing.
>>
>> It's clear from these snippets, and a few other pieces of documentation
>> for SQS I've read, that the Amazon team approached SQS from a _massive_
>> scaling perspective. I think what may be forcing a lot of this frustration
>> with Zaqar is that it was designed with a much smaller scale in mind.
>>
>> I think as long as that is the case, the design will remain in question.
>> I'd be comfortable saying that the use cases I've been thinking about
>> are entirely fine with the limitations SQS has.
>
> I think these are pretty strong comments with not enough arguments to
> defend them.
>

Please see my prior email. I agree with Clint's assertions here.

> Saying that Zaqar was designed with a smaller scale in mind without
> actually saying why you think so is not fair besides not being true. So
> please, do share why you think Zaqar was not designed for big scales and
> provide comments that will help the project to grow and improve.
>
> - Is it because the storage technologies that have been chosen?
> - Is it because of the API?
> - Is it because of the programing language/framework ?

It is not because of the storage technology or because of the
programming language.

> So far, we've just discussed the API semantics and not zaqar's
> scalability, which makes your comments even more surprising.

- guaranteed message order
- not distributing work across a configurable number of back ends

These are scale-limiting design choices which are reflected in the
API's characteristics.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

2014-09-18 Thread Devananda van der Veen
On Thu, Sep 18, 2014 at 8:54 AM, Devananda van der Veen
 wrote:
> On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco  wrote:
>> On 09/18/2014 04:09 PM, Gordon Sim wrote:
>>> On 09/18/2014 12:31 PM, Flavio Percoco wrote:
>>>> Zaqar guarantees FIFO. To be more precise, it does that relying on the
>>>> storage backend ability to do so as well. Depending on the storage used,
>>>> guaranteeing FIFO may have some performance penalties.
>>>
>>> Would it be accurate to say that at present Zaqar does not use
>>> distributed queues, but holds all queue data in a storage mechanism of
>>> some form which may internally distribute that data among servers but
>>> provides Zaqar with a consistent data model of some form?
>>
>> I think this is accurate. The queue's distribution depends on the
>> storage ability to do so and deployers will be able to choose what
>> storage works best for them based on this as well. I'm not sure how
>> useful this separation is from a user perspective but I do see the
>> relevance when it comes to implementation details and deployments.
>
> Guaranteeing FIFO and not using a distributed queue architecture
> *above* the storage backend are both scale-limiting design choices.
> That Zaqar's scalability depends on the storage back end is not a
> desirable thing in a cloud-scale messaging system in my opinion,
> because this will prevent use at scales which can not be accommodated
> by a single storage back end.
>

It may be worth qualifying this a bit more.

While no single instance of any storage back-end is infinitely
scalable, some of them are really darn fast. That may be enough for
the majority of use cases. It's not outside the realm of possibility
that the inflection point [0] where these design choices result in
performance limitations is at the very high end of scale-out, eg.
public cloud providers who have the resources to invest further in
improving zaqar.

As an example of what I mean, let me refer to the 99th percentile
response time graphs in Kurt's benchmarks [1]... increasing the number
of clients with write-heavy workloads was enough to drive latency from
<10ms to >200 ms with a single service. That latency significantly
improved as storage and application instances were added, which is
good, and what I would expect. These benchmarks do not (and were not
intended to) show the maximal performance of a public-cloud-scale
deployment -- but they do show that performance under different
workloads improves as additional services are started.

While I have no basis for comparing the configuration of the
deployment he used in those tests to what a public cloud operator
might choose to deploy (and presumably such an operator would put
significant work into tuning storage and running more instances of
each service, thus shifting that inflection point "to the right"), my
point is that, by depending on a single storage instance, Zaqar has
pushed the *ability* to scale out down into the storage
implementation. Given my experience scaling SQL and NoSQL data stores
(in my past life, before working on OpenStack), I have a knee-jerk
skepticism that this approach will result in a public-cloud-scale
messaging system.

-Devananda

[0] http://en.wikipedia.org/wiki/Inflection_point -- in this context,
I mean the point on the graph of throughput vs latency where the
derivative goes from near-zero (linear growth) to non-zero
(exponential growth)

[1] https://wiki.openstack.org/wiki/Zaqar/Performance/PubSub/Redis

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Ironic] [TripleO] scheduling flow with Ironic?

2013-11-19 Thread Devananda van der Veen
On Wed, Nov 13, 2013 at 10:11 PM, Alex Glikson  wrote:

> Thanks, I understand the Nova scheduler part. One of the gaps there is
> related to the blueprint we have are working on [1]. I was wondering
> regarding the role of Ironic, and the exact interaction between the user,
> Nova and Ironic.
>

The interaction from the point of "nova boot" onwards will be the same --
nova maintains a list of available (host, node) resources, the scheduler
picks one according to the request, dispatches the work to n-cond / n-cpu,
which in turn calls down to various methods in the nova/virt/driver API.
The implementation of the ironic driver is a wrapper around
python-ironicclient library, which will make calls out to the ironic API
service, which in turn performs the necessary work.
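
To give a feel for what "a wrapper around python-ironicclient" means in
practice, a virt driver call boils down to something like the sketch
below. This is simplified and illustrative, not the actual driver code;
the real driver does far more (validation, instance_info handling,
polling, error handling).

    from ironicclient import client

    # Simplified sketch of how a virt driver call maps onto Ironic API
    # calls. The real nova.virt driver does much more (validation,
    # instance_info handling, polling, error handling); names here are
    # illustrative.


    def get_ironic_client():
        # service credentials are illustrative assumptions
        return client.get_client(1,
                                 os_username='ironic',
                                 os_password='secret',
                                 os_tenant_name='service',
                                 os_auth_url='http://keystone:5000/v2.0')


    def reboot(ironic, instance_uuid):
        # look up the Ironic node backing this Nova instance...
        node = ironic.node.get_by_instance_uuid(instance_uuid)
        # ...and ask Ironic to act on the physical machine
        ironic.node.set_power_state(node.uuid, 'reboot')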

Where the interaction is different is around the management of physical
machines; eg, enrolling them with Ironic, temporarily marking a machine as
unavailable while doing maintenance on it, and other sorts of things we
haven't actually written the code for yet.


> In particular, initially I thought that Ironic is going to have its own
> scheduler, resolving some of the issues and complexity within Nova (which
> could focus on VM management, maybe even getting rid of hosts versus nodes,
> etc).


I'm not sure how putting a scheduler in Ironic would solve this problem at
all.

Conversely, I don't think there's any need for the whole (host, node)
thing. Chris Behrens and I talked at the last summit about a possible
rewrite to nova-conductor that would remove the need for this distinction
entirely. I would love to see Nova just track node, and I think this can
work for typical hypervisors (kvm, xen, ...) as well.


> But it seems that Ironic aims to stay at the level of virt driver API.. It
> is a bit unclear to me what is the desired architecture going forward -
> e.g., if the idea is to standardize virt driver APIs but keep the
> scheduling centralized,


AFAIK, the nova.virt.driver API is the standard that all the virt drivers
are written to. Unless you're referring to libvirt's API, in which case, I
don't understand the question.


> maybe we should take the rest of virt drivers into separate projects as
> well, and extend Nova to schedule beyond just compute (if it is already
> doing so for virt + bare-metal).


Why would Nova need to schedule anything besides compute resources? In this
context, Ironic is merely providing a different type of compute resource,
and Nova is still scheduling compute workloads. That this hypervisor driver
has different scheduling characteristics (eg, flavor == node resource;
extra_specs:cpu_arch == node arch; and so on) than other hypervisor drivers
doesn't mean it's not still a compute resource.
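
As a concrete example of "flavor == node resource": registering a bare
metal flavor is ordinary Nova flavor management, with extra_specs
carrying the node characteristics. A sketch with python-novaclient; the
values are illustrative:

    from novaclient.v1_1 import client as nova_client

    # Sketch: a bare metal flavor is an ordinary Nova flavor whose
    # resources mirror one class of physical node; values are
    # illustrative.
    nova = nova_client.Client('admin', 'secret', 'admin',
                              'http://keystone:5000/v2.0')

    # RAM (MB), vcpus, and disk (GB) should match the physical nodes.
    flavor = nova.flavors.create('my-baremetal-flavor',
                                 ram=24576, vcpus=8, disk=500)
    # extra_specs carry node characteristics the scheduler should match.
    flavor.set_keys({'cpu_arch': 'x86_64'})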


> Alternatively, each of them could have its own scheduler (like the
> approach we took when splitting out cinder, for example) - and then someone
> on top (e.g., Heat) would need to do the cross-project logic. Taking
> different architectural approaches in different cases confuses me a bit.
>

Yes, well, Cinder is a different type of resource (block storage).


HTH,
-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic][Ceilometer] get IPMI data for ceilometer

2013-11-19 Thread Devananda van der Veen
On Mon, Nov 18, 2013 at 10:35 AM, Ladislav Smola  wrote:

>  Hello. I have a couple of additional questions.
>
> 1. What about IPMI data that we want to get by polling. E.g. temperatures,
> etc. Will the Ironic be polling these kind of
> data and send them directly to collector(or agent)? Not sure if this
> belongs to Ironic. It would have to support some
> pluggable architecture for vendor specific pollsters like Ceilometer.
>
>
If there is a fixed set of information (eg, temp, fan speed, etc) that
ceilometer will want, let's make a list of that and add a driver interface
within Ironic to abstract the collection of that information from physical
nodes. Then, each driver will be able to implement it as necessary for that
vendor. Eg., an iLO driver may poll its nodes differently than a generic
IPMI driver, but the resulting data exported to Ceilometer should have the
same structure.

I don't think we should, at least right now, support pluggable pollsters on
the Ceilometer->Ironic side. Let's start with a small set of data that
Ironic exposes, make it pluggable internally for different types of
hardware, and iterate if necessary.


> 2. I've seen in the etherpad that the SNMP agent(pollster) will be also
> part of the Ironic(running next to conductor). Is it true?
> Or that will be placed in Ceilometer central agent?
>

An SNMP agent doesn't fit within the scope of Ironic, as far as I see, so
this would need to be implemented by Ceilometer.

As far as where the SNMP agent would need to run, it should be on the same
host(s) as ironic-conductor so that it has access to the management network
(the physically-separate network for hardware management, IPMI, etc). We
should keep the number of applications with direct access to that network
to a minimum, however, so a thin agent that collects and forwards the SNMP
data to the central agent would be preferable, in my opinion.


Regards,
Devananda



>
>
> Thanks for response.
> Ladislav
>
>
>
> On 11/18/2013 06:25 PM, Devananda van der Veen wrote:
>
> Hi Lianhao Lu,
>
>  I briefly summarized my recollection of that session in this blueprint:
>
>  https://blueprints.launchpad.net/ironic/+spec/add-ceilometer-agent
>
>  I've responded to your questions inline as well.
>
>
> On Sun, Nov 17, 2013 at 10:24 PM, Lu, Lianhao wrote:
>
>> Hi stackers,
>>
>> During the summit session Expose hardware sensor (IPMI) data
>> https://etherpad.openstack.org/p/icehouse-summit-ceilometer-hardware-sensors,
>> it was proposed to deploy a ceilometer agent next to the ironic conductor
>> to the get the ipmi data. Here I'd like to ask some questions to figure out
>> what's the current missing pieces in ironic and ceilometer for that
>> proposal.
>>
>> 1. Just double check, ironic won't provide API to get IPMI data, right?
>>
>
>  Correct. This was generally felt to be unnecessary.
>
>>
>> 2. If deploying a ceilometer agent next to the ironic conductor, how does
>> the agent talk to the conductor? Through rpc?
>>
>
>  My understanding is that ironic-conductor will emit messages to the
> ceilimeter agent, and the communication is one-way. These could be
> triggered by a periodic task, or by some other event within Ironic, such as
> a change in the power state of a node.
>
>
>>
>> 3. Does the current ironic conductor have rpc_method to support getting
>> generic ipmi data, i.e. let the rpc_method caller specifying arbitrary
>> netfn/command to get any type of ipmi data?
>>
>
>  No, and as far as I understand, it doesn't need one.
>
>
>>
>> 4. I believe the ironic conductor uses some kind of node_id to associate
>> the bmc with its credentials, right? If so, how can the ceilometer agent
>> get those node_ids to ask the ironic conductor to poll the ipmi data? And
>> how can the ceilometer agent extract meaningful information from that
>> node_id to set those fields in the ceilometer Sample(e.g. recource_id,
>> project_id, user_id, etc.) to identify which physical node the ipmi data is
>> coming from?
>>
>
>  This question perhaps requires a longer answer.
>
>  Ironic references physical machines (nodes) internally with an integer
> node_id and externally with a standard uuid. When a Nova instance is
> created, it will be associated to a node, that node will have a reference
> to the nova instance_uuid which is exposed in our API, and can be passed to
> Ceilometer's agent. I believe that nova instance_uuid will enable
> ceilometer to detect the project, user, etc.
>
>  Should Ironic emit messages regarding nodes which are not provisioned?
> Physical machines that don't have a tenant instance on them are not
> associated to any project, user, tenant, quota, etc, so I suspect that we
> shouldn't notify about them. It would be like tracking the unused disks in
> a SAN.
>
>  Regards,
> Devananda
>
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic][Ceilometer] get IPMI data for ceilometer

2013-11-20 Thread Devananda van der Veen
Responses inline.

On Wed, Nov 20, 2013 at 2:19 AM, Ladislav Smola  wrote:

> Ok, I'll try to summarize what will be done in the near future for
> Undercloud monitoring.
>
> 1. There will be Central agent running on the same host(hosts once the
> central agent horizontal scaling is finished) as Ironic
>

Ironic is meant to be run with >1 conductor service. By i-2 milestone we
should be able to do this, and running at least 2 conductors will be
recommended. When will Ceilometer be able to run with multiple agents?

On a side note, it is a bit confusing to call something a "central agent"
if it is meant to be horizontally scaled. The ironic-conductor service has
been designed to scale out in a similar way to nova-conductor; that is,
there may be many of them in an AZ. I'm not sure that there is a need for
Ceilometer's agent to scale in exactly a 1:1 relationship with
ironic-conductor?


> 2. It will have SNMP pollster, SNMP pollster will be able to get list of
> hosts and their IPs from Nova (last time I
> checked it was in Nova) so it can poll them for stats. Hosts to poll
> can be also defined statically in config file.
>

Assuming all the undercloud images have an SNMP daemon baked in, which they
should, then this is fine. And yes, Nova can give you the IP addresses for
instances provisioned via Ironic.


> 3. It will have IPMI pollster, that will poll Ironic API, getting list of
> hosts and a fixed set of stats (basically everything
> that we can get :-))
>

No -- I thought we just agreed that Ironic will not expose an API for IPMI
data. You can poll Nova to get a list of instances (that are on bare metal)
and you can poll Ironic to get a list of nodes (either nodes that have an
instance associated, or nodes that are unprovisioned) but this will only
give you basic information about the node (such as the MAC addresses of its
network ports, and whether it is on/off, etc).


> 4. Ironic will also emit messages (basically all events regarding the
> hardware) and send them directly to Ceilometer collector
>

Correct. I've updated the BP:

https://blueprints.launchpad.net/ironic/+spec/add-ceilometer-agent

Let me know if that looks like a good description.

-Devananda



> Does it seems to be correct? I think that is the basic we must have to
> have Undercloud monitored. We can then build on that.
>
> Kind regards,
> Ladislav
>
>

> On 11/20/2013 09:22 AM, Julien Danjou wrote:
>
>> On Tue, Nov 19 2013, Devananda van der Veen wrote:
>>
>>  If there is a fixed set of information (eg, temp, fan speed, etc) that
>>> ceilometer will want,
>>>
>> Sure, we want everything.
>>
>>  let's make a list of that and add a driver interface
>>> within Ironic to abstract the collection of that information from
>>> physical
>>> nodes. Then, each driver will be able to implement it as necessary for
>>> that
>>> vendor. Eg., an iLO driver may poll its nodes differently than a generic
>>> IPMI driver, but the resulting data exported to Ceilometer should have
>>> the
>>> same structure.
>>>
>> I like the idea.
>>
>>  An SNMP agent doesn't fit within the scope of Ironic, as far as I see, so
>>> this would need to be implemented by Ceilometer.
>>>
>> We're working on adding pollster for that indeed.
>>
>>  As far as where the SNMP agent would need to run, it should be on the
>>> same host(s) as ironic-conductor so that it has access to the
>>> management network (the physically-separate network for hardware
>>> management, IPMI, etc). We should keep the number of applications with
>>> direct access to that network to a minimum, however, so a thin agent
>>> that collects and forwards the SNMP data to the central agent would be
>>> preferable, in my opinion.
>>>
>> We can keep things simple by having the agent only doing that polling I
>> think. Building a new agent sounds like it will complicate deployment
>> again.
>>
>>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic][Ceilometer] get IPMI data for ceilometer

2013-11-21 Thread Devananda van der Veen
On Thu, Nov 21, 2013 at 12:08 AM, Ladislav Smola  wrote:

>  Responses inline.
>
>
> On 11/20/2013 07:14 PM, Devananda van der Veen wrote:
>
> Responses inline.
>
>  On Wed, Nov 20, 2013 at 2:19 AM, Ladislav Smola wrote:
>
>> Ok, I'll try to summarize what will be done in the near future for
>> Undercloud monitoring.
>>
>> 1. There will be Central agent running on the same host(hosts once the
>> central agent horizontal scaling is finished) as Ironic
>>
>
>  Ironic is meant to be run with >1 conductor service. By i-2 milestone we
> should be able to do this, and running at least 2 conductors will be
> recommended. When will Ceilometer be able to run with multiple agents?
>
>
> Here it is described and tracked:
> https://blueprints.launchpad.net/ceilometer/+spec/central-agent-improvement
>
>
Thanks - I've subscribed to it.


>On a side note, it is a bit confusing to call something a "central
> agent" if it is meant to be horizontally scaled. The ironic-conductor
> service has been designed to scale out in a similar way to nova-conductor;
> that is, there may be many of them in an AZ. I'm not sure that there is a
> need for Ceilometer's agent to scale in exactly a 1:1 relationship with
> ironic-conductor?
>
>
> Yeah we have already talked about that. Maybe some renaming will be in
> place later. :-) I don't think it has to be 1:1 mapping. There was only
> requirement to have "Hardware agent" only on hosts with ironic-conductor,
> so it has access to management network, right?
>
>
Correct.

 2. It will have SNMP pollster, SNMP pollster will be able to get list of
>> hosts and their IPs from Nova (last time I
>> checked it was in Nova) so it can poll them for stats. Hosts to poll
>> can be also defined statically in config file.
>>
>
>  Assuming all the undercloud images have an SNMP daemon baked in, which
> they should, then this is fine. And yes, Nova can give you the IP addresses
> for instances provisioned via Ironic.
>
>
>
> Yes.
>
>
>3. It will have IPMI pollster, that will poll Ironic API, getting list
>> of hosts and a fixed set of stats (basically everything
>> that we can get :-))
>>
>
>  No -- I thought we just agreed that Ironic will not expose an API for
> IPMI data. You can poll Nova to get a list of instances (that are on bare
> metal) and you can poll Ironic to get a list of nodes (either nodes that
> have an instance associated, or nodes that are unprovisioned) but this will
> only give you basic information about the node (such as the MAC addresses
> of its network ports, and whether it is on/off, etc).
>
>
> Ok sorry I have misunderstood the:
> "If there is a fixed set of information (eg, temp, fan speed, etc) that
> ceilometer will want,let's make a list of that and add a driver interface
> within Ironic to abstract the collection of that information from physical
> nodes. Then, each driver will be able to implement it as necessary for that
> vendor. Eg., an iLO driver may poll its nodes differently than a generic
> IPMI driver, but the resulting data exported to Ceilometer should have the
> same structure."
>
> I thought I've read the data will be exposed, but it will be just internal
> Ironic abstraction, that will be polled by Ironic and send directly do
> Ceilometer collector. So same as the point 4., right? Yeah I guess this
> will be easier to implement.
>
>
Yes -- you are correct. I was referring to an internal abstraction around
different hardware drivers.


>
>
>
>> 4. Ironic will also emit messages (basically all events regarding the
>> hardware) and send them directly to Ceilometer collector
>>
>
>  Correct. I've updated the BP:
>
>  https://blueprints.launchpad.net/ironic/+spec/add-ceilometer-agent
>
>  Let me know if that looks like a good description.
>
>
> Yeah, seems great. I would maybe remove the word 'Agent', seems Ironic
> will send it directly to Ceilometer collector, so Ironic acts as agent,
> right?
>

Fair point - I have updated the BP and renamed it to

https://blueprints.launchpad.net/ironic/+spec/send-data-to-ceilometer




>
>
>
> -Devananda
>
>
>
>> Does it seems to be correct? I think that is the basic we must have to
>> have Undercloud monitored. We can then build on that.
>>
>> Kind regards,
>> Ladislav
>>
>>
>
>> On 11/20/2013 09:22 AM, Julien Danjou wrote:
>>
>>> On Tue, Nov 19 2013, Devananda van der Veen wrote:
>>>
>>> If there is a fixed set of information (eg, temp, fan speed, etc) that
>>>> ceilome

Re: [openstack-dev] [Ironic][Cinder] Attaching Cinder volumes to baremetal instance

2013-11-25 Thread Devananda van der Veen
No -- attaching volumes is not implemented in Ironic yet. I think it would be
great if someone wants to work on it. There was some discussion at the
summit about cinder support, in particular getting boot-from-volume to work
with Ironic, but no one has come forward since then with code or blueprints.

-Deva


On Fri, Nov 22, 2013 at 11:14 AM, Rohan Kanade wrote:

>
> Hey guys, just starting out with Ironic, had a silly question.
>
> Can we attach bootable or non bootable plain cinder volumes during either
> provisioning of the baremetal instance or after provisioning the baremetal
> instance?
>
> I have seen a "attach_volume" method in the "LibvirtVolumeDriver" of the
> nova baremetal driver. So got curious.
>
> Thanks,
> Rohan Kanade
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic][Ceilometer] get IPMI data for ceilometer

2013-11-25 Thread Devananda van der Veen
Hi!

Very good questions. I think most of them are directed towards the
Ceilometer team, but I have answered a few bits inline.


On Mon, Nov 25, 2013 at 7:24 AM, wanghaomeng  wrote:

>
> Hello all:
>
> Basically, I understand the solution is - Our Ironic will implement an
> IPMI driver
>

We will need to add a new interface -- for example,
ironic.drivers.base.BaseDriver:sensor and the corresponding
ironic.drivers.base.SensorInterface class, then implement this interface as
ironic.drivers.modules.ipmitool:IPMISensor

We also need to define the methods this interface supports and what the
return data type is for each method. I imagine it may be something like:
- SensorInterface.list_available_sensors(node) returns a list of sensor
names for that node
- SensorInterface.get_measurements(node, list_of_sensor_names) returns a
dict of dicts, eg, { 'sensor_1': {'key': 'value'}, 'sensor_2': ...}
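
Pulling that together, the interface I have in mind would look roughly
like the sketch below. This is for discussion only -- the names,
signatures, and return formats are not settled.

    import abc

    import six


    @six.add_metaclass(abc.ABCMeta)
    class SensorInterface(object):
        """Abstract interface for collecting sensor data from a node.

        For discussion only; names and return formats are not settled.
        """

        @abc.abstractmethod
        def list_available_sensors(self, node):
            """Return a list of sensor names available on this node."""

        @abc.abstractmethod
        def get_measurements(self, node, sensor_names):
            """Return a dict of dicts keyed by sensor name, e.g.
            {'sensor_1': {'key': 'value'}, 'sensor_2': {...}}.
            """


    class IPMISensor(SensorInterface):
        """ipmitool-based implementation; would live alongside the other
        interfaces in ironic.drivers.modules.ipmitool."""

        def list_available_sensors(self, node):
            raise NotImplementedError()

        def get_measurements(self, node, sensor_names):
            raise NotImplementedError()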


> (extendable framework for more drivers) to collect hardware sensor
> data(cpu temp, fan speed, volts, etc) via IPMI protocol from hardware
> server node, and emit the AMQP message to Ceilometer Collector,
> Ceilometer have the framework to handle the valid sample message and save
> to the database for data retrieving by consumer.
>
> Now, how do you think if we should clearly define the *interface & data
> model *specifications between Ironic and Ceilometer to enable IPMI data
> collecting, then our two team can start the coding together?
>

I think this is just a matter of understanding Ceilometer's API so that
Ironic can emit messages in the correct format. You've got many good
questions for the Ceilometer team on this below.


>
> And I still have some concern with our interface and data model as below,
> the spec need to be discussed and finalized:
>
> 1. What is the Ceilometer sample data mandatory attributes, such as
> instance_id/tenant_id/user_id/resource_id, if they are not  optional, where
> are these data populated, from Ironic or Ceilomter side?
>
>   *name/type/unit/volume/timestamp* - basic sample property, can be
> populated from Ironic side as data source
>   *user_id/project_id/resource_id* - Ironic or Ceilometer populate these
> fields??
>   *resource_metadata - this is for Ceilometer metadata query, Ironic know
> nothing for such resource metadata I think*
>   *source *- can we hard-code as 'hardware' as a source identifier?
>
>
Ironic can cache the user_id and project_id of the instance. These will not
be present for unprovisioned nodes.

I'm not sure what "resource_id" is in this context, perhaps the nova
instance_uuid? If so, Ironic has that as well.


> 2. Not sure if our Ceilometer only accept the *signed-message*, if it is
> case, how Ironic get the message trust for Ceilometer, and send the valid
> message which can be accepted by Ceilometer Collector?
>
> 3. What is the Ceilometer sample data structure, and what is the min data
> item set for the IPMI message be emitted to Collector?
>   *name/type/unit/volume/**timestamp/source - is this min data item set?*
>
> 3. If the detailed data model should be defined for our IPMI data now?,
> what is our the first version scope, how many IPMI data type we should
> support? Here is a IPMI data sample list, I think we can support these as a
> min set.
>   *Temperature - System Temp/CPU Temp*
> *  FAN Speed in rpm - FAN 1/2/3/4/A*
> *  Volts - Vcore/3.3VCC/12V/VDIMM/5VCC/-12V/VBAT/VSB/AVCC*
>

I think that's a good starting list. We can add more later.


>
> 4. More specs - such as naming conversions, common constant reference
> definitions ...
>
> These are just a draft, not the spec, correct me if I am wrong
> understanding and add the missing aspects, we can discuss these interface
> and data model clearly I think.
>
>
> --
> *Haomeng*
> *Thanks:)*
>
>
>
Cheers,
Devananda



>
> At 2013-11-21 16:08:00,"Ladislav Smola"  wrote:
>
> Responses inline.
>
> On 11/20/2013 07:14 PM, Devananda van der Veen wrote:
>
> Responses inline.
>
>  On Wed, Nov 20, 2013 at 2:19 AM, Ladislav Smola wrote:
>
>> Ok, I'll try to summarize what will be done in the near future for
>> Undercloud monitoring.
>>
>> 1. There will be Central agent running on the same host(hosts once the
>> central agent horizontal scaling is finished) as Ironic
>>
>
>  Ironic is meant to be run with >1 conductor service. By i-2 milestone we
> should be able to do this, and running at least 2 conductors will be
> recommended. When will Ceilometer be able to run with multiple agents?
>
>
> Here it is described and tracked:
> https://blueprints.launchpad.net/ceilom

Re: [openstack-dev] [Ironic][Ceilometer] get IPMI data for ceilometer

2013-11-28 Thread Devananda van der Veen
On Nov 25, 2013 7:13 PM, "Doug Hellmann" 
wrote:
>
>
>
>
> On Mon, Nov 25, 2013 at 3:56 PM, Devananda van der Veen <
devananda@gmail.com> wrote:
>>
>> Hi!
>>
>> Very good questions. I think most of them are directed towards the
Ceilometer team, but I have answered a few bits inline.
>>
>>
>> On Mon, Nov 25, 2013 at 7:24 AM, wanghaomeng  wrote:
>>>
>>>
>>> Hello all:
>>>
>>> Basically, I understand the solution is - Our Ironic will implement an
IPMI driver
>>
>>
>> We will need to add a new interface -- for example,
ironic.drivers.base.BaseDriver:sensor and the corresponding
ironic.drivers.base.SensorInterface class, then implement this interface as
ironic.drivers.modules.ipmitool:IPMISensor
>>
>> We also need to define the methods this interface supports and what the
return data type is for each method. I imagine it may be something like:
>> - SensorInterface.list_available_sensors(node) returns a list of sensor
names for that node
>> - SensorInterface.get_measurements(node, list_of_sensor_names) returns a
dict of dicts, eg, { 'sensor_1': {'key': 'value'}, 'sensor_2': ...}
>>
>>>
>>> (extendable framework for more drivers) to collect hardware sensor
data(cpu temp, fan speed, volts, etc) via IPMI protocol from hardware
server node, and emit the AMQP message to Ceilometer Collector, Ceilometer
have the framework to handle the valid sample message and save to the
database for data retrieving by consumer.
>>>
>>> Now, how do you think if we should clearly define the interface & data
model specifications between Ironic and Ceilometer to enable IPMI data
collecting, then our two team can start the coding together?
>>
>>
>> I think this is just a matter of understanding Ceilometer's API so that
Ironic can emit messages in the correct format. You've got many good
questions for the Ceilometer team on this below.
>>
>>>
>>>
>>> And I still have some concern with our interface and data model as
below, the spec need to be discussed and finalized:
>>>
>>> 1. What is the Ceilometer sample data mandatory attributes, such as
instance_id/tenant_id/user_id/resource_id, if they are not  optional, where
are these data populated, from Ironic or Ceilomter side?
>>>
>>>
>>>   name/type/unit/volume/timestamp - basic sample property, can be
populated from Ironic side as data source
>>>   user_id/project_id/resource_id - Ironic or Ceilometer populate these
fields??
>
>
> Ceilometer knows nothing about resources unless it is told, so all of the
required fields have to be provided by the sender.
>
>
>>>
>>>   resource_metadata - this is for Ceilometer metadata query, Ironic
know nothing for such resource metadata I think
>
>
> The resource metadata depends on the resource type, but should be all of
the user-visible attributes for that object stored in the database at the
time the measurement is taken. For example, for instances we (try to) get
all of the instance attributes.
>

We could send all of the node.properties. Getting into node.driver_info would
expose passwords and such, so we shouldn't send that.
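As a rough sketch (field names here are illustrative, not a settled format), the metadata attached to each sample could be built like:

def build_resource_metadata(node):
    # Copy the user-visible hardware properties into the sample's
    # resource_metadata, and deliberately leave out node.driver_info,
    # which may contain IPMI passwords and similar secrets.
    metadata = dict(node.properties)
    metadata['node_uuid'] = node.uuid
    return metadata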

>>>
>>>   source - can we hard-code as 'hardware' as a source identifier?
>
>
> No, the source is the source of the user and project ids, not the source
of the measurement (the data source is implied by the meter name). The
default source for user and project is "openstack" to differentiate from an
add-on layer like a PaaS where there are different user or project ids.
>
>
>>>
>>>
>>
>> Ironic can cache the user_id and project_id of the instance. These will
not be present for unprovisioned nodes.
>>
>> I'm not sure what "resource_id" is in this context, perhaps the nova
instance_uuid? If so, Ironic has that as well.
>
>
> Do end-users know about bare metal servers before they are provisioned as
instances? Can a regular user, for example, as for the list of available
servers or find details about one by name or id?
>
>

There is an API service which exposes information about unprovisioned
servers. At the moment, it is admin-only. If you think of an end-user as
someone using tuskar, they will likely want to know about unprovisioned
servers.

>>
>>
>>>
>>> 2. I'm not sure whether Ceilometer only accepts signed messages; if that
is the case, how does Ironic establish trust with Ceilometer and send valid
messages that the Ceilometer Collector will accept?
>
>
> I'm not sure it's appropriate for ironic to be sending messages using
ceilometer's sample for

Re: [openstack-dev] [ironic][qa] How will ironic tests run in tempest?

2013-12-09 Thread Devananda van der Veen
On Fri, Dec 6, 2013 at 2:13 PM, Clark Boylan  wrote:

> On Fri, Dec 6, 2013 at 1:53 PM, David Kranz  wrote:
> > It's great that tempest tests for ironic have been submitted! I was
> > reviewing https://review.openstack.org/#/c/48109/ and noticed that the
> tests
> > do not actually run. They are skipped because baremetal is not enabled.
> This
> > is not terribly surprising but we have had a policy in tempest to only
> merge
> > code that has demonstrated that it works. For services that cannot run in
> > the single-vm environment of the upstream gate we said there could be a
> > system running somewhere that would run them and report a result to
> gerrit.
> > Is there a plan for this, or to make an exception for ironic?
> >
> >  -David
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> There is a change[0] to openstack-infra/config to add experimental
> tempest jobs to test ironic. I think that change is close to being
> ready, but I need to give it time for a proper review. Once in that
> will allow you to test 48109 (in theory, not sure if all the bits will
> just work). I don't think these tests fall under the cannot run in a
> single vm environment umbrella, we should be able to test the
> baremetal code via the pxe booting of VMs within the single VM
> environment.
>
> [0] https://review.openstack.org/#/c/53917/
>
>
> Clark
>
>
We can test the ironic services, database, and the driver interfaces by
using our "fake" driver within a single devstack VM today (I'm not sure the
exercises for all of this have been written yet, but it's practical to test
it). OTOH, I don't believe we can test a PXE deploy within a single VM
today, and need to resume discussions with infra about this.

There are some other aspects of Ironic (IPMI, SOL access, any
vendor-specific drivers) which we'll need real hardware to test because
they can't effectively be virtualized. TripleO should cover some (much?) of
those needs, once they are able to switch to using Ironic instead of
nova-baremetal.

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] cleaning up our core reviewer list

2013-12-10 Thread Devananda van der Veen
Hi all,

It's about time that I look at the ironic review stats to see who should be
added / removed again.

There are several non-core folks doing reviews during the last month [1] --
thanks! These have been very helpful. I am also looking at the folks who
are contributing code [3] to get another view on depth of knowledge of the
project.

I'm not looking just at the numbers. Good review feedback is very
important, as is the ability to spot architectural problems in a patchset.
For contributors, whether a patch is a superficial fix or a meaningful
improvement is more important than the number of patches submitted. Right
now, I'm looking for folks who have both a general understanding of the
code and project architecture, and a deep knowledge in at least one area,
who have time to review at least one patch a day and attend the weekly
meeting.

Our stats for the last 30 days [1] are:

Total reviews: 711 (23.7/day)
Total reviewers: 23 (avg 1.0 reviews/day)
Total reviews by core team: 347 (11.6/day)
Core team size: 6 (avg 1.9 reviews/day)
New patch sets in the last 30 days: 519 (17.3/day)
Changes involved in the last 30 days: 147 (4.9/day)
  New changes in the last 30 days: 124 (4.1/day)
  Changes merged in the last 30 days: 99 (3.3/day)
  Changes abandoned in the last 30 days: 15 (0.5/day)
  Changes left in state WIP in the last 30 days: 4 (0.1/day)
  Queue growth in the last 30 days: 6 (0.2/day)
  Average number of patches per changeset: 3.5


With an average of 4 patches per day, and 4 active core reviewers, we
currently need to maintain a rate of 2 reviews per core member per day to
keep the backlog from growing.

With all that in mind, I don't see anyone who I feel is both an active
reviewer and has a solid grasp on the project (and who isn't already core)
at the moment. I'll be reaching out to a few people who I think are very
close to see if they are interested and able to commit to a few more
reviews, and revisit this mid-january.

Now for the goodbyes. Michael and Sean initially helped a lot with
nova-baremetal reviews and seeded Ironic's review team when the project
started out. However, they haven't been actively reviewing lately [2] and
when I chatted with them at the summit, neither indicated that they would
return to reviewing this code, so I have removed them from the core team.
I'd like to thank them both for the help jump-starting the project!

-Devananda


[1] - http://russellbryant.net/openstack-stats/ironic-reviewers-30.txt
[2] - http://russellbryant.net/openstack-stats/ironic-reviewers-90.txt
[3] -
http://www.stackalytics.com/?release=icehouse&metric=commits&project_type=openstack&module=ironic-group&company=&user_id=
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic][qa] How will ironic tests run in tempest?

2013-12-10 Thread Devananda van der Veen
On Tue, Dec 10, 2013 at 12:43 PM, David Kranz  wrote:

>  On 12/09/2013 01:37 PM, Devananda van der Veen wrote:
>
>  On Fri, Dec 6, 2013 at 2:13 PM, Clark Boylan wrote:
>
>>  On Fri, Dec 6, 2013 at 1:53 PM, David Kranz  wrote:
>> > It's great that tempest tests for ironic have been submitted! I was
>> > reviewing https://review.openstack.org/#/c/48109/ and noticed that the
>> tests
>> > do not actually run. They are skipped because baremetal is not enabled.
>> This
>> > is not terribly surprising but we have had a policy in tempest to only
>> merge
>> > code that has demonstrated that it works. For services that cannot run
>> in
>> > the single-vm environment of the upstream gate we said there could be a
>> > system running somewhere that would run them and report a result to
>> gerrit.
>> > Is there a plan for this, or to make an exception for ironic?
>> >
>> >  -David
>> >
>> > ___
>> > OpenStack-dev mailing list
>> > OpenStack-dev@lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>  There is a change[0] to openstack-infra/config to add experimental
>> tempest jobs to test ironic. I think that change is close to being
>> ready, but I need to give it time for a proper review. Once in that
>> will allow you to test 48109 (in theory, not sure if all the bits will
>> just work). I don't think these tests fall under the cannot run in a
>> single vm environment umbrella, we should be able to test the
>> baremetal code via the pxe booting of VMs within the single VM
>> environment.
>>
>> [0] https://review.openstack.org/#/c/53917/
>>
>>
>> Clark
>>
>>
>  We can test the ironic services, database, and the driver interfaces by
> using our "fake" driver within a single devstack VM today (I'm not sure the
> exercises for all of this have been written yet, but it's practical to test
> it). OTOH, I don't believe we can test a PXE deploy within a single VM
> today, and need to resume discussions with infra about this.
>
>  There are some other aspects of Ironic (IPMI, SOL access, any
> vendor-specific drivers) which we'll need real hardware to test because
> they can't effectively be virtualized. TripleO should cover some (much?) of
> those needs, once they are able to switch to using Ironic instead of
> nova-baremetal.
>
>  -Devananda
>
> So it seems that the code in the submitted tempest tests can run in a
> regular job if devstack is configured to enable ironic, but that this
> cannot be the default. So I propose that we create a regular
> devstack+ironic job that will run in the ironic and tempest gates, and run
> just the ironic tests. When third-party bare-metal results can be reported
> for ironic, tempest can then accept tests that require bare-metal.  Does
> any one have a problem with this approach?
>
>  -David
>
>
As I understand it, the infra/config patch which Clark already linked (
https://review.openstack.org/#/c/53917), which has gone through several
iterations, should be enabling Ironic within devstack -- and thus causing
tempest to run the relevant tests -- within the Ironic and Tempest check
and gate pipelines. This will exercise Ironic's API by performing CRUD
actions on resources. It doesn't do any more than that yet.

David, I'm not sure what you mean by "when third-party bare-metal results
can be reported for ironic" -- I don't see any reason why we couldn't
accept third-party smoke tests right now, except that none of the tempest
tests are written... Am I missing something?

In the longer term, we are planning to enable tempest testing of deployment
by ironic within devstack-gate as all the pieces come together. This will
take a fair bit more work / time, but I'm going to start nudging resources
in this direction very soon. In fact, we just talked about this in #infra
for a bit. Here's an attempt to summarize what came of it w.r.t. Ironic's
testing plans. We will need:

- some changes in devstack-gate to prepare a new environment by...
-- install sshd + firewall it to only allow connections from localhost
-- create a bunch of tiny qemu VMs (of configurable size and number)
- some changes in devstack to...
-- suck up a list of those VM's MAC addresses and enroll them in Ironic (rough sketch below)
-- configure nova to use ironic
-- configure ironic to use the pxe+ssh driver
- a new test job that turns all this on, thus allowing tempest to do all
its usual work against a "virtual baremetal" cloud
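For the enrollment step above, here's a rough sketch of what devstack might do with python-ironicclient -- the driver name, driver_info keys, and credentials shown are assumptions about the eventual pxe+ssh setup, not settled config:

from ironicclient import client

ironic = client.get_client(1,
                           os_username='admin',
                           os_password='secret',
                           os_tenant_name='admin',
                           os_auth_url='http://127.0.0.1:5000/v2.0')

# One entry per tiny qemu VM created by devstack-gate.
poseur_vms = [{'name': 'baremetal-0', 'mac': '52:54:00:12:34:56'},
              {'name': 'baremetal-1', 'mac': '52:54:00:12:34:57'}]

for vm in poseur_vms:
    node = ironic.node.create(
        driver='pxe_ssh',
        driver_info={'ssh_address': '127.0.0.1',
                     'ssh_username': 'stack',
                     'ssh_virt_type': 'virsh'})
    # Register the VM's MAC so the PXE driver can match it at boot time.
    ironic.port.create(node_uuid=node.uuid, address=vm['mac'])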

Also, it's worth mentioning, the above-described plan won't exerc

[openstack-dev] [Ironic] Project status update

2013-12-11 Thread Devananda van der Veen
Hi all!

I realize it's been a while since I've posted an update about the project
-- it's high time I do so! And there are several things to report...

We tagged an Icehouse-1 milestone, though we did not publish a tarball just
yet. That should happen at the Icehouse-2 milestone.
  http://git.openstack.org/cgit/openstack/ironic/tag/?id=2014.1.b1

We've had a functioning python client (library and CLI) for a while now,
and I finally got around to tagging a release and pushing it up to PyPI.
I'll be issuing another release once the deployment API is implemented
(patches are up, but may take a few iterations).
  https://pypi.python.org/pypi/python-ironicclient

Speaking of APIs, we're auto-generating our API docs now. Thanks,
pecan/wsme! Note that our v1 API is not yet stabilized - but at least the
docs are going to stay up-to-date as we hammer out issues and add missing
components.
  http://docs.openstack.org/developer/ironic/webapi/v1.html

We have a patchset up for a Nova "ironic" driver; it is not
feature-complete and still a WIP, but I thought it would be good to list it
here in case anyone is interested in tracking its parity with the baremetal
driver.
  https://review.openstack.org/#/c/51328/

As of late October, Ironic was integrated with devstack. Though it is
currently disabled by default, it is easy to enable in your localrc.

We also have a diskimage-builder element and can use TripleO to deploy an
ironic-based undercloud. Even though it can't deploy an overcloud yet, I
find it very useful for development.


That's all for now,
-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Bug list maintenance

2013-12-11 Thread Devananda van der Veen
So, I've dug into the bug list in the past few days, and want to share what
I've observed.

Over the Havana cycle, we all used the bug list as a way to earmark work we
needed to come back to. Some of those earmarks are stale. Perhaps the
status is incorrect, or we fixed it but didn't close the bug, or the
description no longer reflects the current codebase.

I'd like to ask that, if you have any bugs assigned to you, please take a
few minutes to review them. If you're still working on them, please make
sure the status & priority fields are accurate, and target a reasonable
milestone (i2 is Jan 23, i3 is March 6). Oh, and let me know you've
reviewed your bugs, otherwise I'm going to nag you :)

Also, if you aren't able to work that bug right now, don't sweat it - just
unassign yourself. This is about keeping the bug list accurate, not about
guilting anyone into working more.

Thanks!
-D
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] firmware security

2013-12-12 Thread Devananda van der Veen
On Thu, Dec 12, 2013 at 12:50 AM, Lu, Lianhao  wrote:

> Hi Ironic folks,
>
> I remembered once seeing that ironic was calling for firmware security.
> Can anyone elaborate with a little bit details about what Ironic needs for
> this "firmware security"? I'm wondering if there are some existing
> technologies(e.g. TPM, TXT, etc) that can be used for this purpose.
>
> Best Regards,
> -Lianhao
>

Hi Lianhao,

The topic of firmware support in Ironic has led to very interesting
discussions: questions about scope, multi-vendor support, and, invariably,
questions about how we might validate / ensure the integrity of existing
firmware or the firmware Ironic would be loading onto a machine. A proposal
was put forward at the last summit to add a generic mechanism for flashing
firmware, as part of a generic utility ramdisk. Other work is taking
priority this cycle, but here are the blueprints / discussion.
  https://blueprints.launchpad.net/ironic/+spec/firmware-update
  https://blueprints.launchpad.net/ironic/+spec/utility-ramdisk

To get back to your question about security, UEFI + hardware TPM is, as far
as I know, the commonly-acknowledged best approach today, even though it is
not necessarily available on all hardware. I believe Ironic will need to
support interacting with these both locally (eg, via CPU bus) and remotely
(eg, via vendor's OOB management controllers).

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Ironic] Get power and temperature via IPMI

2013-12-18 Thread Devananda van der Veen
On Tue, Dec 17, 2013 at 10:00 PM, Gao, Fengqian wrote:

>  Hi, all,
>
> I am planning to extend bp
> https://blueprints.launchpad.net/nova/+spec/utilization-aware-scheduling with
> power and temperature. In other words, power and temperature can be
> collected and used for nova-scheduler just as CPU utilization.
>
> I have a question here. As you know, IPMI is used to get power and
> temperature and baremetal implements IPMI functions in Nova. But baremetal
> driver is being split out of nova, so if I want to change something to the
> IPMI, which part should I choose now? Nova or Ironic?
>
>
>

Hi!

A few thoughts... Firstly, new features should be geared towards Ironic,
not the nova baremetal driver as it will be deprecated soon (
https://blueprints.launchpad.net/nova/+spec/deprecate-baremetal-driver).
That being said, I actually don't think you want to use IPMI for what
you're describing at all, but maybe I'm wrong.

When scheduling VMs with Nova, in many cases there is already an agent
running locally, eg. nova-compute, and this agent is already supplying
information to the scheduler. I think this is where the facilities for
gathering power/temperature/etc (eg, via lm-sensors) should be placed, and
it can reported back to the scheduler along with other usage statistics.

If you think there's a compelling reason to use Ironic for this instead of
lm-sensors, please clarify.

Cheers,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Ironic] Get power and temperature via IPMI

2013-12-19 Thread Devananda van der Veen
On Wed, Dec 18, 2013 at 7:16 PM, Gao, Fengqian wrote:

>  Hi, Devananda,
>
> I agree with you that new features should be towards Ironic.
>
> As you asked why use Ironic instead of lm-sensors, actually I just want to
> use IPMI instead of lm-sensors. I think it is reasonable to put the IPMI
> part into Ironic and we already did :).
>
>
>
> To get the sensors’ information, I think IPMI is much more powerful than
> lm-sensors.
>
> Firstly, IPMI is flexible.  Generally speaking, it provides two kinds of
> connections, in-band and out-of-band.
>
> Out-of-band connection allows us to get sensors’ status even without OS
> and CPU.
>
> In-band connection is quite similar to lm-sensors, It needs the OS kernel
> to get sensor data.
>
> Secondly, IPMI can gather more sensor information than lm-sensors, and
> it is easy to use. From my own experience, using IPMI can get all the
> sensor information that lm-sensors could get, such as
> temperature/voltage/fan. Besides that, IPMI can get power data and some OEM
> specific sensor data.
>
> Thirdly, I think IPMI is a common spec for most of OEMs.  And most of
> servers are integrated with IPMI interface.
>
>
>
> As you said, nova-compute is already supplying information to the
> scheduler and power/temperature should be gathered locally.  IPMI can be
> used locally, the in-band connection. And there is a lot of open source
> library, such as OpenIPMI, FreeIPMI, which provide the interfaces to OS,
> just like lm-sensors.
>
> So, I prefer to use IPMI than lm-sensors. Please leave your comments if
> you disagree :).
>
>
>
I see nothing wrong with nova-compute gathering such information locally.
Whether you use lm-sensors or in-band IPMI is an implementation detail of
how nova-compute would gather the information.
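For the sake of illustration, local (in-band) collection could be as simple as shelling out to ipmitool -- assuming the local IPMI kernel modules are loaded; the sensor names and output format will vary by hardware:

import subprocess


def read_local_sensors():
    # 'ipmitool sdr' prints lines such as "CPU Temp | 42 degrees C | ok".
    output = subprocess.check_output(['ipmitool', 'sdr']).decode()
    readings = {}
    for line in output.splitlines():
        parts = [p.strip() for p in line.split('|')]
        if len(parts) == 3:
            name, value, status = parts
            readings[name] = {'value': value, 'status': status}
    return readings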

However, I don't see how this has anything to do with Ironic or the
nova-baremetal driver. These would gather information remotely (using
out-of-band IPMI) for hardware controlled and deployed by these services.
In most cases, nova-compute is not deployed by nova-compute (exception: if
you're running TripleO).

Hope that helps,
-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all project] log files and logstash

2014-01-02 Thread Devananda van der Veen
Thanks for the pointer. Since Ironic is not yet enabled by default in
devstack-gate, it's OK that it is also missing, but now I know where to add
it.

Cheers,
-Deva


On Mon, Dec 30, 2013 at 11:11 AM, David Kranz  wrote:

> In case any one other than me didn't know this, the log files that are
> indexed and searchable in logstash are not the same as the set of files
> that you see in the logs directory in jenkins, but only those that have an
> entry in http://git.openstack.org/cgit/openstack-infra/config/tree/
> modules/openstack_project/files/logstash/jenkins-log-client.yaml. Here
> are the log files that are not mentioned in that yaml file in case any
> omissions are not intentional:
>
> cinder:
> screen-c-bak.txt
>
> horizon:
> screen-horizon.txt
>
> neutron:
> screen-q-lbaas.txt  (but there *is* an entry for screen-q-l3.txt)
> screen-q-vpn.txt
>
> ceilometer:
> No log files are indexed by logstash
>
>   -David
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] [QA] some notes on ironic functional testing

2014-01-14 Thread Devananda van der Veen
Hi Alexander, Vladimir,

First of all, thanks for working on testing Ironic! Functional testing is
critical to our project, and I'm very happy you're looking into it.

On Tue, Jan 14, 2014 at 6:28 AM, Alexander Gordeev wrote:

> We (agordeev and vkozhukalov) had a discussion about functional testing
> flow described here https://etherpad.openstack.org/p/IronicDevstackTesting.
> From our perspective it has some non-critical, but still notable flaws. If
> you ask me, it looks a bit strange to create testing networks and VMs
> during devstack run using virsh and shell. Maybe more suitable to use
> libvirt python API for this purpose and create test environment right
> before launching testing scenario (aka setUp stage). We know that according
> to tempest development requirements it is not allowed to hit hypervisors
> directly ( http://docs.openstack.org/developer/tempest/overview.html ),
> but using libvirt is not hitting hypervisor directly. We described
> alternative testing flow at the same document
> https://etherpad.openstack.org/p/IronicDevstackTesting.
>

I've read your proposed approach and I think you raise some very valid
points, particularly about separating the "start devstack" from "prepare a
tempest test run". I agree that the enrollment of VMs (simulated bare metal
nodes) is clearly a property of the latter case, and a failure during
enrollment is a failure of that test, not of devstack. However, I think the
creation of VMs is a bit of a grey area...

Also, you noted that "having baremetal poseurs in devstack is convenient
for development." While it may be useful to add some automation to devstack
allowing for the creation and enrollment of VMs for Ironic development, I'm
much more eager to see good functional testing. Chris and I are working to
simplify the use of tripleo-incubator for easily creating an ironic
development environment, so let's focus this work on adding functional
testing for the gate :)


>
> There are some reasons why it is more correct to create VMs inside tempest
> itself, not inside devstack.
>
>- Firstly, we possibly want to have several functional testing
>scenarios and every scenario needs to have clean testing environment, so we
>can just add setUp and tearDown stages for creating and destroying testing
>envs.
>
> Makes a lot of sense to me.

>
>- Secondly, virsh has some performance issues if you deal with >30 VMs
>(it is not our case for now but who knows).
>
> This is a reason why you want to use python libvirt api instead of virsh
CLI, correct? I don't see a problem, but I will defer to the tempest devs
on whether that's OK.


>- Besides, as we found out ssh power driver also uses virsh for VM
>power management and does it inefficiently, looking over all VMs (virsh
>dumpxml for every VM and grep MAC). It is possible to significantly improve
>testing power management performance if we substitute ssh power driver with
>driver which uses libvirt python API and lookups VM by UUID, not by MAC.
>
> So, this is a slightly separate discussion about the Ironic SSH power
driver. I'm aware of the inefficient approach, but it was necessary to
provide support for systems that are not supported (well) by libvirt. If
the performance becomes a serious problem for testing, let's look at how we
can improve the SSH power driver rather than adding another one.
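For comparison, the UUID-based lookup being suggested would look roughly like this with the libvirt python bindings (connection URI and error handling are simplified here):

import libvirt


def set_power_state(vm_uuid, power_on):
    # Look the domain up directly by UUID instead of iterating over
    # every domain and grepping its XML for a MAC address.
    conn = libvirt.open('qemu:///system')
    try:
        domain = conn.lookupByUUIDString(vm_uuid)
        if power_on and not domain.isActive():
            domain.create()    # power on
        elif not power_on and domain.isActive():
            domain.destroy()   # hard power off
    finally:
        conn.close()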


>
>- Third point here is that adding nodes into ironic is also a part of
>ironic functionality and it is supposed to be tested as well as
>others. Besides, if something goes wrong during adding nodes into
>ironic, we'll get testing scenario failed, not devstack failed. It is
>what we usually expect from testing procedure.
>
> Agreed. Another valid point.

>
>- And finally, using tempest with python libvirt API we can create
>widely customizable testing envs, it is not possible if we use devstack for
>that purpose.
>
> I'm not sure I agree with this reasoning. Devstack is very customizable...

As I said, you've raised some good points, and I think putting the
setUp/tearDown to create+enroll and delete+undefine VMs in tempest makes
sense. I think the network bridge creation still belongs in devstack --
that's part of the basic environment, not variable between tests.

Of course, I'd like to see what the Tempest/QA/Infra folks think. I've
added the [QA] tag to this email to get their attention.

Cheers,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Let's move to Alembic

2014-01-15 Thread Devananda van der Veen
Hi all,

Some months back, there was discussion of moving Ironic's db migrations to
Alembic instead of sqlalchemy-migrate. At that time, I was much more interested in getting
the framework together than I was in restructuring our database migrations,
and what we had was sufficient to get us off the ground.

Now that the plumbing is coming together, and we're looking hopefully at
doing a release this cycle, I'd like to see if anyone wants to pick up the
torch and switch our db migrations to use alembic. Ideally, let's do this
between the I2 and I3 milestones.

I am aware of the work adding a transition-to-alembic to Oslo:
https://review.openstack.org/#/c/59433/

I feel like we don't necessarily need to wait for that to land. There's a
lot less history in our migrations than in, say, Nova; we don't yet support
down-migrations anyway; and there aren't any prior releases of the project
which folks could upgrade from.

Thoughts?

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] Disk Eraser

2014-01-17 Thread Devananda van der Veen
On Fri, Jan 17, 2014 at 12:35 PM, Alan Kavanagh
wrote:

> Hi Rob
>
> Then apart from the disk eraser and reinstalling the blade from scratch
> every time it is returned from lease, and ensuring network isolation, what are
> the other many concerns you are worried about for sharing the bare metal
> then? Would really like to know what the other "major issues are" that you
> see?
>
> /Alan
>
>
Alan,

Disk erasure is, in my opinion, more suitable to policy compliance, for
instance wiping HIPAA / protected information from a machine before
returning it to the pool of available machines within a trusted
organization. It's not just about security. We discussed it briefly at the
HKG summit, and it fits within the long-tail of this blueprint:

https://blueprints.launchpad.net/ironic/+spec/utility-ramdisk

To answer your other question, the security implications of putting
untrusted tenants on bare metal today are numerous. The really big attack
vector which, AFAIK, no one has completely solved is firmware. Even though
we can use UEFI (in hardware which supports it) to validate the main
firmware and the OS's chain of trust, there are still many microcontrollers,
PCI devices, storage controllers, etc, whose firmware can't be
validated out-of-band and thus can not be trusted. The risk is that a prior
tenant maliciously flashed a new firmware which will lie about its status
and remain a persistent infection even if you attempt to re-flash said
device. There are other issues which are easier to solve (eg, network
isolation during boot, IPMI security, a race condition if the data center
power cycles and the node boots before the control plane is online, etc)
but these are, ultimately, not enough as long as the firmware attack vector
still exists.

tl;dr, We should not be recycling bare metal nodes between untrusted
tenants at this time. There's a broader discussion about firmware security
going on, which, I think, will take a while for the hardware vendors to
really address. Fixing the other security issues around it, while good,
isn't a high priority for Ironic at this time.

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ironic] Disk Eraser

2014-01-17 Thread Devananda van der Veen
On Fri, Jan 17, 2014 at 3:21 PM, Chris Friesen
wrote:

> On 01/17/2014 04:20 PM, Devananda van der Veen wrote:
>
>  tl;dr, We should not be recycling bare metal nodes between untrusted
>> tenants at this time. There's a broader discussion about firmware
>> security going on, which, I think, will take a while for the hardware
>> vendors to really address.
>>
>
> What can the hardware vendors do?  Has anyone proposed a meaningful
> solution for the firmware issue?
>
> Given the number of devices (NIC, GPU, storage controllers, etc.) that
> could potentially have firmware update capabilities it's not clear to me
> how this could be reliably solved.
>
> Chris
>
>
Precisely.

That's what I mean by "there's a broader discussion." We can encourage
hardware vendors to take firmware security more seriously and add
out-of-band validation mechanisms to their devices. From my perspective,
the industry is moving in that direction already, though raising awareness
directly with your preferred vendors can't hurt ;)

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Third-party drivers and testing

2014-01-19 Thread Devananda van der Veen
Hi all,

I've been thinking about how we should treat third-party drivers in Ironic
for a while, and had several discussions at the Hong Kong summit and last
week at LCA. Cinder, Nova, Neutron, and TripleO are all having similar
discussions, too. What follows is a summary of my thoughts and a proposal
for our project's guidelines to vendors.

At Ironic's core is a set of reference drivers that use PXE and IPMI, which
are broadly supported by most hardware, to provide the minimum necessary
set of features for provisioning hardware. Vendors naturally want to
improve upon those features, whether by adding new features or making their
hardware support more robust or better performing. This is good and worth
encouraging. However, I do not want Ironic to get into a position where
third-party drivers are in trunk but not tested, and thus potentially of
lower code quality, inadvertently broken by changes to the common code, and
a burden upon the core contributors. I want to encourage vendors'
contributions while guarding the shared codebase to ensure that it works
well for everyone, and without increasing the burden unfairly on core
contributors and reviewers. We're in the unique position at the moment, as
a new project, of not having any vendor drivers in trunk today; we have the
opportunity to set the bar without having to do a lot of clean up first.

To this end, I believe we need third-party functional testing on real
hardware for each third-party driver, and I believe we need this for every
commit to Ironic. This is the bar that we're going to hold the PXE driver
to, and I would like our users to have the same level of confidence in
third-party drivers.

Before requiring that degree of testing, I would like to be able to direct
vendors at a working test suite which they can copy. I expect us to have
functional testing for the PXE and SSH drivers within Tempest and devstack
/ devstack-gate either late in this cycle or early next cycle. Around the
same time, I believe TripleO will switch to using Ironic in their test
suite, so we'll have coverage of the IPMI driver on real hardware as well
(this may be periodic coverage rather than per-test feedback initially).

With that timeline in mind, it would not be fair to require vendors to
complete their own third-party testing this cycle with no example to work
from. On the other hand, I don't want to block the various vendors who have
already expressed a strong interest in contributing their drivers to the
Icehouse release.

I am proposing that we provisionally allow vendor drivers in this cycle
with the following requirements, and that we draw the line at the J3
milestone to give everyone ample time to get testing up on real hardware --
without blocking innovation now. At that time, we may kick out third-party
drivers if these criteria haven't been met.

1. each driver must adhere to the existing driver interfaces.
2. each driver must have comprehensive unit test coverage and sufficient
inline documentation.
3. vendors are responsible for fixing bugs in their driver in a timely
fashion.
4. vendors commit to have third-party testing on a supported hardware
platform implemented by the J3 milestone.
5. vendors contribute a portion of at least one developer's time to
upstream participation.

Items 1 and 2 are criteria for any code landing in trunk; these are not
specific to this discussion, but I feel they're worth calling out
explicitly.

Items 3 and 4 are crucial to maintaining code quality in the third-party
drivers. See http://ci.openstack.org/third_party.html for a general
description of third-party testing. My goal is that by J3, we should have
smoke testing (vote +1/-1 but not gate blocking) for every commit to Ironic
on supported hardware for each third-party driver.

Item 5 is meant to avoid what I call the "throwing code over the wall"
problem. I believe this will ensure that there is ongoing vendor
participation in the Ironic developer community. To clarify, broadly
speaking, I mean that this person should:
- be a developer from the internal team responsible for the contributed
driver,
- subscribe to the openstack-dev mailing list and pay attention to the
[Ironic] discussions,
- attend and participate in the weekly IRC meeting,
- participate in Ironic code reviews (1 per day would be sufficient),
- occasionally make code contributions that are not directly related to the
vendor driver (eg, help with general bug fixes),
- and be present in #openstack-ironic IRC channel on Freenode during their
working hours so that other developers can reach them easily when necessary.

This doesn't have to consume much of a developer's time, roughly 5 - 10
hrs/week should be enough.

I recognize that the costs to do this (both human and hardware) are
non-trivial, but I believe they are not unreasonable and are crucial to
ensuring a consistent quality across all the drivers in Ironic.


Inviting comments and feedback...

Regards,
Devananda
___

Re: [openstack-dev] [ironic] Disk Eraser

2014-01-20 Thread Devananda van der Veen
On Sun, Jan 19, 2014 at 9:30 PM, Robert Collins
wrote:

> On 20 January 2014 18:10, Alan Kavanagh 
> wrote:
> > +1, that is another point Rob. When I started this thread my main
> interest was disk and then firmware. It is clear we really need to have a
> clear discussion on this, as imho I would not be supportive of leasing
> baremetal to tenants if I cannot guarantee the service; otherwise the cost
> of exposing tenants to adverse attacks and data screening is far greater
> than the revenue generated from the service. When it comes to the tenants
> in our DC we consider all tenants need to be provided a guarantee of the
> baremetal service on the disk, loaders etc etc, otherwise its difficult to
> assure your customer.
>
> I think LXC/openVZ/Docker make pretty good compromises in this space
> BTW - low overhead, bare metal performance, no root access to the
> hardware.
>
>
++

Eg, when sized for single-instance-per-host, you'll get very similar
performance without the disk/firmware/etc security issues.

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] Third-party drivers and testing

2014-01-21 Thread Devananda van der Veen
On Sun, Jan 19, 2014 at 7:47 PM, Christopher Yeoh  wrote:

>
> On Mon, Jan 20, 2014 at 6:17 AM, Devananda van der Veen <
> devananda@gmail.com> wrote:
>
>> 1. each driver must adhere to the existing driver interfaces.
>> 2. each driver must have comprehensive unit test coverage and sufficient
>> inline documentation.
>> 3. vendors are responsible for fixing bugs in their driver in a timely
>> fashion.
>> 4. vendors commit to have third-party testing on a supported hardware
>> platform implemented by the J3 milestone.
>>
>
> Perhaps it would be better to set the deadline to the J2 milestone? There
> is already a lot of pressure on CI at the -3 milestone and I think it would
> be good to avoid adding the inevitable last minute bringup of CI systems to
> the same time period. Also gives you the opportunity to give vendors a bit
> of leeway if they're a bit late and still have time to decide if any
> drivers should be removed during the J cycle rather than waiting until I.
>
>
That's a fair point, Chris. Thanks for bringing it up. I'm OK with that as
it is still ~6 months out, and I want to ensure vendors have enough lead
time.

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] Third-party drivers and testing

2014-01-21 Thread Devananda van der Veen
On Mon, Jan 20, 2014 at 8:29 AM, Jay Pipes  wrote:

> On Sun, Jan 19, 2014 at 2:47 PM, Devananda van der Veen <
> devananda@gmail.com> wrote:
>
>> Hi all,
>>
>> I've been thinking about how we should treat third-party drivers in
>> Ironic for a while, and had several discussions at the Hong Kong summit and
>> last week at LCA. Cinder, Nova, Neutron, and TripleO are all having similar
>> discussions, too. What follows is a summary of my thoughts and a proposal
>> for our project's guidelines to vendors.
>>
>
> I applaud the effort. I'm actually currently in the process of writing up
> instructions for Cinder and Neutron vendors interested in constructing a
> 3rd party testing platform that uses the openstack-infra tooling as much as
> possible. (Yes, I know there is existing documentation on ci.openstack.org,
> but based on discussions this past week with the Neutron vendors
> implementing these test platforms, there's a number of areas that are
> poorly understood and some more detail is clearly needed).
>
> I would hope the docs I'm putting together for Cinder and Neutron will
> require little, if any, changes for similar instructions for Ironic 3rd
> party testers.
>

Awesome! I would hope so, too. I would love to be included in that process
(eg, tag me on the review if there is one, or what ever).

The Infra team is also working on some broad guidelines too:
https://review.openstack.org/#/c/63478/



>
>
>> Before requiring that degree of testing, I would like to be able to
>> direct vendors at a working test suite which they can copy. I expect us to
>> have functional testing for the PXE and SSH drivers within Tempest and
>> devstack / devstack-gate either late in this cycle or early next cycle.
>> Around the same time, I believe TripleO will switch to using Ironic in
>> their test suite, so we'll have coverage of the IPMI driver on real
>> hardware as well (this may be periodic coverage rather than per-test
>> feedback initially).
>>
>
> I think using Tempest as that working test suite would be the best way to
> go. Cinder 3rd party testing is going in this direction (the cinder_cert/
> directory in devstack simply sets up Tempest, sets the appropriate Cinder
> driver properly in the cinder.conf and then runs the Tempest Volume API
> tests. A similar approach would work for Ironic, I believe, once the Ironic
> API tests are complete for Tempest.
>

++

Right now, we've got Ironic API tests in Tempest for CRUD operations using
our "fake" driver. These will get extended to support the pxe_ssh driver,
with devstack spinning up some small VMs and feeding their credentials to
tempest, so we can mock bare metal within devstack-gate and our local test
environments. Tempest will then perform functional tests on the driver
interfaces (eg, power on/off, deploy/undeploy, etc).

Using the same approach, but on real hardware, 3rd party testers could feed
in a .csv of their hardware credentials to tempest and do the same tests.
At least, that's my plan :)


>
>> I am proposing that we provisionally allow in vendor drivers this cycle
>> with the following requirements, and that we draw the line at the J3
>> milestone to give everyone ample time to get testing up on real hardware --
>> without blocking innovation now. At that time, we may kick out third-party
>> drivers if these criteria haven't been met.
>>
>> 1. each driver must adhere to the existing driver interfaces.
>> 2. each driver must have comprehensive unit test coverage and sufficient
>> inline documentation.
>> 3. vendors are responsible for fixing bugs in their driver in a timely
>> fashion.
>> 4. vendors commit to have third-party testing on a supported hardware
>> platform implemented by the J3 milestone.
>> 5. vendors contribute a portion of at least one developer's time to
>> upstream participation.
>>
>
> All good things. However, specificity is critical here. What does
> "sufficient inline documentation" entail? Who is the arbiter? What does
> "comprehensive unit test coverage" mean? 90%? 100%? What does "timely
> fashion" mean? Within 2 days? By X milestone?
>

Point 2 should be getting enforced by the existing Ironic-core review team
when the driver code is submitted. It's ultimately up to our reviewers to
determine whether the inline documentation is, at a minimum, up to the
standard that we hold our own code to, and the same is true for the unit
tests. I think we're doing a pretty good job as a team of that today, and
my point in calling it out here is to let vendors know the expectation --
they don't just get to submit a driver and have

[openstack-dev] [Ironic] Node groups and multi-node operations

2014-01-22 Thread Devananda van der Veen
So, a conversation came up again today around whether or not Ironic will,
in the future, support operations on groups of nodes. Some folks have
expressed a desire for Ironic to expose operations on groups of nodes;
others want Ironic to host the hardware-grouping data so that eg. Heat and
Tuskar can make more intelligent group-aware decisions or represent the
groups in a UI. Neither of these have an implementation in Ironic today...
and we still need to implement a host of other things before we start on
this. FWIW, this discussion is meant to stimulate thinking ahead to things
we might address in Juno, and aligning development along the way.

There's also some refactoring / code cleanup which is going on and worth
mentioning because it touches the part of the code which this discussion
impacts. For our developers, here is additional context:
* our TaskManager class supports locking >1 node atomically, but both the
driver API and our REST API only support operating on one node at a time.
AFAIK, nowhere in the code do we actually pass a group of nodes.
* for historical reasons, our driver API requires both a TaskManager and a
Node object be passed to all methods. However, the TaskManager object
contains a reference to the Node(s) which it has acquired, so the node
parameter is redundant.
* we've discussed cleaning this up, but I'd like to avoid refactoring the
same interfaces again when we go to add group-awareness.


I'll try to summarize the different axis-of-concern around which the
discussion of node groups seem to converge...

1: physical vs. logical grouping
- Some hardware is logically, but not strictly physically, grouped. Eg, 1U
servers in the same rack. There is some grouping, such as failure domain,
but operations on discrete nodes are independent of one another. This grouping should be
modeled somewhere, and sometimes a user may wish to perform an operation
on that group. Is a higher layer (tuskar, heat, etc) sufficient? I think so.
- Some hardware _is_ physically grouped. Eg, high-density cartridges which
share firmware state or a single management end-point, but are otherwise
discrete computing devices. This grouping must be modeled somewhere, and
certain operations can not be performed on one member without affecting all
members. Things will break if each node is treated independently.

2: performance optimization
- Some operations may be optimized if there is an awareness of concurrent
identical operations. Eg, deploy the same image to lots of nodes using
multicast or bittorrent. If Heat were to inform Ironic that this deploy is
part of a group, the optimization would be deterministic. If Heat does not
inform Ironic of this grouping, but Ironic infers it (eg, from timing of
requests for similar actions) then optimization is possible but
non-deterministic, and may be much harder to reason about or debug.

3: APIs
- Higher layers of OpenStack (eg, Heat) are expected to orchestrate
discrete resource units into a larger group operation. This is where the
grouping happens today, but already results in inefficiencies when
performing identical operations at scale. Ironic may be able to get around
this by coalescing adjacent requests for the same operation, but this would
be non-deterministic.
- Moving group-awareness or group-operations into the lower layers (eg,
Ironic) looks like it will require non-trivial changes to Heat and Nova,
and, in my opinion, violates a layer-constraint that I would like to
maintain. On the other hand, we could avoid the challenges around
coalescing. This might be necessary to support physically-grouped hardware
anyway, too.


If Ironic coalesces requests, it could be done in either the
ConductorManager layer or in the drivers themselves. The difference would
be whether our internal driver API accepts one node or a set of nodes for
each operation. It'll also impact our locking model. Both of these are
implementation details that wouldn't affect other projects, but would
affect our driver developers.
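To illustrate the difference for driver authors, the two shapes of the internal API would look roughly like this (class and method names are placeholders, not a proposal):

# Option A: coalescing lives in the ConductorManager; drivers keep a
# single-node interface and never know that a batch existed.
class DeployInterface(object):
    def deploy(self, task):
        """Deploy to the single node this task has locked."""


# Option B: coalescing is pushed into the drivers; the interface accepts
# every node the task has locked, so a driver could, for example,
# multicast one image to all of them at once.
class BatchDeployInterface(object):
    def deploy(self, task, nodes):
        """Deploy to all of the nodes locked by this task."""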

Also, until Ironic models physically-grouped hardware relationships in some
internal way, we're going to have difficulty supporting that class of
hardware. Is that OK? What is the impact of not supporting such hardware?
It seems, at least today, to be pretty minimal.


Discussion is welcome.

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Ironic 2014.1.b2 (Icehouse-2) developer milestone available

2014-01-23 Thread Devananda van der Veen
Hi all,

The Icehouse-2 milestone of Ironic is now available, and a new version
(0.1.1) of our python client library has just been pushed to pypi.

Here is a list of what was completed during the Icehouse-2 development
cycle:
https://launchpad.net/ironic/+milestone/icehouse-2

Tarballs are available at:
http://tarballs.openstack.org/ironic/ironic-2014.1.b2.tar.gz
https://pypi.python.org/packages/source/p/python-ironicclient/python-ironicclient-0.1.1.tar.gz

Note that Ironic is not ready to be used as a replacement for
nova-baremetal at this time. We are targeting this to the next milestone,
scheduled for March 6th.

Regards,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] File Injection (and the lack thereof)

2014-01-24 Thread Devananda van der Veen
In going through the bug list, I spotted this one and would like to discuss
it:

"can't disable file injection for bare metal"
https://bugs.launchpad.net/ironic/+bug/1178103

There's a #TODO in Ironic's PXE driver to *add* support for file injection,
but I don't think we should do that. For the various reasons that Robert
raised a while ago (
http://lists.openstack.org/pipermail/openstack-dev/2013-May/008728.html),
file injection for Ironic instances is neither scalable nor secure. I'd
just as soon leave support for it completely out.

However, Michael raised an interesting counter-point (
http://lists.openstack.org/pipermail/openstack-dev/2013-May/008735.html)
that some deployments may not be able to use cloud-init due to their
security policy.

We don't have support for config drives in Ironic yet, and we won't
until there is a way to control either virtual media or network volumes on
ironic nodes. So, I'd like to ask -- do folks still feel that we need to
support file injection?


-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] File Injection (and the lack thereof)

2014-01-24 Thread Devananda van der Veen
Awesome! But, Ironic will still need a way to inject the SSL cert into the
instance, eg. config-drive over virtual media, or something.

-D
 On Jan 24, 2014 2:32 PM, "Clint Byrum"  wrote:

> Excerpts from Joshua Harlow's message of 2014-01-24 14:17:38 -0800:
> > Cloud-init 0.7.5 (not yet released) will have the ability to read from an
> > ec2-metadata server using SSL.
> >
> > In a recent change I did we now use requests which correctly does SSL for
> > the ec2-metadata/ec2-userdata reading.
> >
> > -
> http://bazaar.launchpad.net/~cloud-init-dev/cloud-init/trunk/revision/910
> >
> > For ssl-certs that it will use by default (if not provided) will be
> looked
> > for in the following locations.
> >
> > - /var/lib/cloud/data/ssl
> >- cert.pem
> >- key
> > - /var/lib/cloud/instance/data/ssl
> >- cert.pem
> >- key
> > - ... Other custom paths (typically datasource dependent)
> >
> > So I think in 0.7.5 for cloud-init this support will be improved and as
> > long as there is a supporting ssl ec2 metadata endpoint then this should
> > all work out fine...
>
> \o/ my heroes! ;)
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [TripleO] [Ironic] mid-cycle meetup?

2014-01-24 Thread Devananda van der Veen
On Fri, Jan 24, 2014 at 2:03 PM, Robert Collins
wrote:

> This was meant to go to -dev, not -operators. Doh.
>
>
> -- Forwarded message --
> From: Robert Collins 
> Date: 24 January 2014 08:47
> Subject: [TripleO] mid-cycle meetup?
> To: "openstack-operat...@lists.openstack.org"
> 
>
>
> Hi, sorry for proposing this at *cough* the mid-way point [christmas
> shutdown got in the way of internal acks...], but who would come if
> there was a mid-cycle meetup? I'm thinking the HP sunnyvale office as
> a venue.
>
> -Rob
>


Hi!

I'd like to co-locate the Ironic midcycle meetup, as there's a lot of
overlap between our teams' needs, and facilitating that collaboration will
be good. I've added the [Ironic] tag to the subject to pull in folks who
may be filtering on this project specifically. Please keep us in the loop!

Sunnyvale is easy for me, so I'll definitely be there.

Cheers,
Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] Node groups and multi-node operations

2014-01-26 Thread Devananda van der Veen
On Sat, Jan 25, 2014 at 7:11 AM, Clint Byrum  wrote:
>
> Excerpts from Robert Collins's message of 2014-01-25 02:47:42 -0800:
> > On 25 January 2014 19:42, Clint Byrum  wrote:
> > > Excerpts from Robert Collins's message of 2014-01-24 18:48:41 -0800:
> >
> > >> > However, in looking at how Ironic works and interacts with Nova, it
> > >> > doesn't seem like there is any distinction of data per-compute-node
> > >> > inside Ironic.  So for this to work, I'd have to run a whole bunch of
> > >> > ironic instances, one per compute node. That seems like something we
> > >> > don't want to do.
> > >>
> > >> Huh?
> > >>
> > >
> > > I can't find anything in Ironic that lets you group nodes by anything
> > > except chassis. It was not a serious discussion of how the problem would
> > > be solved, just a point that without some way to tie ironic nodes to
> > > compute-nodes I'd have to run multiple ironics.
> >
> > I don't understand the point. There is no tie between ironic nodes and
> > compute nodes. Why do you want one?
> >
>
> Because sans Ironic, compute-nodes still have physical characteristics
> that make grouping on them attractive for things like anti-affinity. I
> don't really want my HA instances "not on the same compute node", I want
> them "not in the same failure domain". It becomes a way for all
> OpenStack workloads to have more granularity than "availability zone".

Yes, and with Ironic, these same characteristics are desirable but are
no longer properties of a nova-compute node; they're properties of the
hardware which Ironic manages.

In principle, the same (hypothetical) failure-domain-aware scheduling
could be done if Ironic is exposing the same sort of group awareness,
as long as the nova 'ironic' driver is passing that information up to
the scheduler in a sane way. In which case, Ironic would need to be
representing such information, even if it's not acting on it, which I
think is trivial for us to do.

>
> So if we have all of that modeled in compute-nodes, then when adding
> physical hardware to Ironic one just needs to have something to model
> the same relationship for each physical hardware node. We don't have to
> do it by linking hardware nodes to compute-nodes, but that would be
> doable for a first cut without much change to Ironic.
>

By binding hardware to nova-compute, you're trading away fault-tolerance
in your control plane for failure-domain awareness. Ironic is designed
explicitly to decouple the instances of Ironic (and Nova) within the
control plane from the hardware it's managing. This is one of the main
shortcomings of nova baremetal, and it doesn't seem like a worthy
trade, even for a first approximation.

> > >> The changes to Nova would be massive and invasive as they would be
> > >> redefining the driver apiand all the logic around it.
> > >>
> > >
> > > I'm not sure I follow you at all. I'm suggesting that the scheduler have
> > > a new thing to filter on, and that compute nodes push their unique ID
> > > down into the Ironic driver so that while setting up nodes in Ironic one
> > > can assign them to a compute node. That doesn't sound massive and
> > > invasive.

This is already being done *within* Ironic as nodes are mapped
dynamically to ironic-conductor instances; the coordination for
failover/takeover needs to be improved, but that's incremental at this
point. Moving this mapping outside of Ironic is going to be messy and
complicated, and breaks the abstraction layer. The API change may seem
small, but it will massively overcomplicate Nova by duplicating all
the functionality of Ironic-conductor in another layer of the stack.

> >
> > I think we're perhaps talking about different things - in the section
> > you were answering, I thought he was talking about whether the API
> > should offer operations on arbitrary sets of nodes at once, or whether
> > each operation should be a separate API call vs what I now think you
> > were talking about which was whether operations should be able to
> > describe logical relations to other instances/nodes. Perhaps if we use
> > the term 'batch' rather than 'group' to talk about the
> > multiple-things-at-once aspect, and grouping to talk about the
> > primarily scheduler related problems of affinity / anti affinity etc,
> > we can avoid future confusion.
> >
>
> Yes, thats a good point. I was talking about modeling failure domains
> only.  Batching API requests seems like an entirely different thing.
>

I was conflating these terms in that I was talking about "grouping
actions" (batching) and "groups of nodes" (groups). That said, there
are really three distinct topics here. Let's break groups down
further: "logical group" for failure domains, and "hardware group" for
hardware which is physically interdependent in such a way that changes
to one node affect other node(s).


Regards,
Deva

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [Ironic][Ceilometer]bp:send-data-to-ceilometer

2014-01-29 Thread Devananda van der Veen
On Wed, Jan 29, 2014 at 7:22 AM, Gordon Chung  wrote:

> > Meter Names:
> > fanspeed, fanspeed.min, fanspeed.max, fanspeed.status
> > voltage, voltage.min, voltage.max, voltage.status
> > temperature, temperature.min, temperature.max, temperature.status
> >
> > 'FAN 1': {
> > 'current_value': '4652',
> > 'min_value': '4200',
> > 'max_value': '4693',
> > 'status': 'ok'
> > }
> > 'FAN 2': {
> > 'current_value': '4322',
> > 'min_value': '4210',
> > 'max_value': '4593',
> > 'status': 'ok'
> > },
> > 'voltage': {
> > 'Vcore': {
> > 'current_value': '0.81',
> > 'min_value': '0.80',
> > 'max_value': '0.85',
> > 'status': 'ok'
> > },
> > '3.3VCC': {
> > 'current_value': '3.36',
> > 'min_value': '3.20',
> > 'max_value': '3.56',
> > 'status': 'ok'
> > },
> > ...
> > }
> > }
>
>
> are FAN 1, FAN 2, Vcore, etc... variable names or values that would
> consistently show up? if the former, would it make sense to have the meters
> be similar to fanspeed:<trait>, where trait is FAN1, FAN2, etc...? if the
> meter is just fanspeed, what would the volume be? FAN 1's current_value?
>

Different hardware will expose a different number of each of these things. In
Haomeng's first proposal, all hardware would expose a "fanspeed" and a
"voltage" category, but with a variable number of meters in each category.
In the second proposal, it looks like there are no categories and hardware
exposes a variable number of meters whose names adhere to some consistent
structure (eg, "FAN ?" and "V???").

It looks to me like the question is whether or not to use categories to
group similar meters.
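
To make the difference concrete, here's a rough sketch (sample values only)
of flattening the same payload into meter names with and without a category
level:

    # Sample sensor payload; values are illustrative only.
    sensors = {
        'fanspeed': {'FAN 1': {'current_value': '4652'},
                     'FAN 2': {'current_value': '4322'}},
        'voltage': {'Vcore': {'current_value': '0.81'},
                    '3.3VCC': {'current_value': '3.36'}},
    }

    # With categories: meter names like "fanspeed.FAN 1", "voltage.Vcore".
    with_categories = dict(
        ('%s.%s' % (category, name), float(data['current_value']))
        for category, readings in sensors.items()
        for name, data in readings.items())

    # Without categories: bare names like "FAN 1", "Vcore" -- harder to
    # group similar meters later without relying on naming conventions.
    without_categories = dict(
        (name, float(data['current_value']))
        for readings in sensors.values()
        for name, data in readings.items())

    print(with_categories)
    print(without_categories)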

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] [Ironic] Roadmap towards heterogenous hardware support

2014-01-30 Thread Devananda van der Veen
As far as nova-scheduler and Ironic go, I believe this is a solved problem.
Steps are:
- enroll hardware with proper specs (CPU, RAM, disk, etc)
- create flavors based on hardware specs
- scheduler filter matches requests exactly
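
To be clear about what "matches exactly" means in that last step, here is a
toy illustration -- the property names are just examples, not the actual
filter code:

    # Toy illustration of exact-match scheduling: a node is only eligible
    # for a flavor when its enrolled specs are identical to the flavor's.
    KEYS = ('cpus', 'memory_mb', 'local_gb', 'cpu_arch')

    def exact_match(node_properties, flavor_specs):
        return all(node_properties.get(k) == flavor_specs.get(k)
                   for k in KEYS)

    flavor = {'cpus': 8, 'memory_mb': 16384, 'local_gb': 1000,
              'cpu_arch': 'x86_64'}
    node_a = {'cpus': 8, 'memory_mb': 16384, 'local_gb': 1000,
              'cpu_arch': 'x86_64'}
    node_b = {'cpus': 8, 'memory_mb': 24576, 'local_gb': 1500,
              'cpu_arch': 'x86_64'}

    print(exact_match(node_a, flavor))  # True  -- deterministic placement
    print(exact_match(node_b, flavor))  # False -- needs its own flavor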

There are, I suspect, three areas where this would fall short today:
- exposing to the user when certain flavors shouldn't be picked, because
there is no more hardware available which could match it
- ensuring that hardware is enrolled with the proper specs //
trouble-shooting when it is not
- a UI that does these well

If I understand your proposal correctly, you're suggesting that we
introduce non-deterministic behavior. If the scheduler filter falls back to
>$flavor when $flavor is not available, even if the search is in ascending
order and upper-bounded by some percentage, the user is still likely to get
something other than what they requested. From a utilization and
inventory-management standpoint, this would be a headache, and from a user
standpoint, it would be awkward. Also, your proposal is only addressing the
case where hardware variance is small; it doesn't include a solution for
deployments with substantially different hardware.

I don't think introducing a non-deterministic hack when the underlying
services already work, just to provide a temporary UI solution, is
appropriate. But that's just my opinion.

Here's an alternate proposal to support same-arch but different
cpu/ram/disk hardware environments:
- keep the scheduler filter doing an exact match
- have the UI only allow the user to define one flavor, and have that be
the lowest common denominator of available hardware
- assign that flavor's properties to all nodes -- basically lie about the
hardware specs when enrolling them
- inform the user that, if they have heterogeneous hardware, they will get
randomly chosen nodes from their pool, and that scheduling on heterogeneous
hardware will be added in a future UI release
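
For the second and third bullets above, the "lowest common denominator" step
amounts to something like this rough sketch (made-up numbers):

    # Derive the lowest-common-denominator flavor from the real specs of
    # the enrolled hardware.
    hardware = [
        {'cpus': 8,  'memory_mb': 16384, 'local_gb': 1000},
        {'cpus': 12, 'memory_mb': 24576, 'local_gb': 1500},
        {'cpus': 8,  'memory_mb': 32768, 'local_gb': 2000},
    ]

    lcd_flavor = dict(
        (key, min(node[key] for node in hardware))
        for key in ('cpus', 'memory_mb', 'local_gb'))

    # Smallest value of each spec: 8 cpus, 16384 MB RAM, 1000 GB disk.
    print(lcd_flavor)

    # Every node would then be enrolled with lcd_flavor's values, so the
    # single flavor defined in the UI matches all of them exactly.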

This will allow folks who are using TripleO at the commandline to take
advantage of their heterogeneous hardware, instead of crippling
already-existing functionality, while also allowing users who have slightly
(or wildly) different hardware specs to still use the UI.


Regards,
Devananda



On Thu, Jan 30, 2014 at 7:14 AM, Tomas Sedovic  wrote:

> On 30/01/14 15:53, Matt Wagner wrote:
>
>> On 1/30/14, 5:26 AM, Tomas Sedovic wrote:
>>
>>> Hi all,
>>>
>>> I've seen some confusion regarding the homogenous hardware support as
>>> the first step for the tripleo UI. I think it's time to make sure we're
>>> all on the same page.
>>>
>>> Here's what I think is not controversial:
>>>
>>> 1. Build the UI and everything underneath to work with homogenous
>>> hardware in the Icehouse timeframe
>>> 2. Figure out how to support heterogenous hardware and do that (may or
>>> may not happen within Icehouse)
>>>
>>> The first option implies having a single nova flavour that will match
>>> all the boxes we want to work with. It may or may not be surfaced in the
>>> UI (I think that depends on our undercloud installation story).
>>>
>>> Now, someone (I don't honestly know who or when) proposed a slight step
>>> up from point #1 that would allow people to try the UI even if their
>>> hardware varies slightly:
>>>
>>> 1.1 Treat similar hardware configuration as equal
>>>
>>> The way I understand it is this: we use a scheduler filter that wouldn't
>>> do a strict match on the hardware in Ironic. E.g. if our baremetal
>>> flavour said 16GB ram and 1TB disk, it would also match a node with 24GB
>>> ram or 1.5TB disk.
>>>
>>> The UI would still assume homogenous hardware and treat it as such. It's
>>> just that we would allow for small differences.
>>>
>>> This *isn't* proposing we match ARM to x64 or offer a box with 24GB RAM
>>> when the flavour says 32. We would treat the flavour as a lowest common
>>> denominator.
>>>
>>
>> Does Nova already handle this? Or is it built on exact matches?
>>
>
> It's doing an exact match as far as I know. This would likely involve
> writing a custom filter for nova scheduler and updating nova.conf
> accordingly.
>
>
>
>> I guess my question is -- what is the benefit of doing this? Is it just
>> so people can play around with it? Or is there a lasting benefit
>> long-term? I can see one -- match to the closest, but be willing to give
>> me more than I asked for if that's all that's available. Is there any
>> downside to this being permanent behavior?
>>
>
> Absolutely not a long term thing. This is just to let people play around
> with the MVP until we have the proper support for heterogenous hardware in.
>
> It's just an idea that would increase the usefulness of the first version
> and should be trivial to implement and take out.
>
> If neither is the case or if we will in fact manage to have a proper
> heterogenous hardware support early (in Icehouse), it doesn't make any
> sense to do this.
>
>
>> I think the lowest-common-denominator match will be familiar to
>> sysadmins, too. Wa

Re: [openstack-dev] [TripleO] [Ironic] Roadmap towards heterogenous hardware support

2014-01-30 Thread Devananda van der Veen
I was responding based on "Treat similar hardware configuration as equal". When
there is a very minor difference in hardware (eg, 1TB vs 1.1TB disks),
enrolling them with the same spec (1TB disk) is sufficient to solve all
these issues and mask the need for multiple flavors, and the hardware
wouldn't need to be re-enrolled. My suggestion does not address the desire
to support significant variation in hardware specs, such as 8GB RAM vs 64GB
RAM, in which case, there is no situation in which I think those
differences should be glossed over, even as a short-term hack in Icehouse.

"if our baremetal flavour said 16GB ram and 1TB disk, it would also match a
node with 24GB ram or 1.5TB disk."

I think this will lead to a lot of confusion, and difficulty with inventory
/ resource management. I don't think it's suitable even as a
first-approximation.

Put another way, I dislike the prospect of removing currently-available
functionality (an exact-match scheduler and support for multiple flavors)
to enable ease-of-use in a UI. Not that I dislike UIs or anything... it
just feels like two steps backwards. If the UI is limited to homogeneous
hardware, accept that; don't take away heterogeneous hardware support from
the rest of the stack.


Anyway, it sounds like Robert has a solution in mind, so this is all moot :)

Cheers,
Devananda



On Thu, Jan 30, 2014 at 1:30 PM, Jay Dobies  wrote:

>  Wouldn't lying about the hardware specs when registering nodes be
>> problematic for upgrades?  Users would have
>> to re-register their nodes.
>>
>
> This was my first impression too, the line "basically lie about the
> hardware specs when enrolling them". It feels more wrong to have the user
> provide false data than it does to ignore that data for Icehouse. I'd
> rather have the data correct now and ignore it than tell users when they
> upgrade to Juno they have to re-enter all of their node data.
>
> It's not heterogenous v. homogeneous support. It's whether or not we use
> the data. We can capture it now and not provide the user the ability to
> differentiate what something is deployed on. That's a heterogeneous
> enrivonment, but just a lack of fine-grained control over where the
> instances fall.
>
> And all of this is simply for the time constraints of Icehouse's first
> pass. A known, temporary limitation.
>
>
>> One reason why a custom filter feels attractive is that it provides us
>> with a clear upgrade path:
>>
>> Icehouse
>>* nodes are registered with correct attributes
>>* create a custom scheduler filter that allows any node to match
>>* users are informed that for this release, Tuskar will not
>> differentiate between heterogeneous hardware
>>
>> J-Release
>>* implement the proper use of flavors within Tuskar, allowing Tuskar
>> to work with heterogeneous hardware
>>* work with nova regarding scheduler filters (if needed)
>>* remove the custom scheduler filter
>>
>>
>> Mainn
>>
>> 
>>
>>
>> As far as nova-scheduler and Ironic go, I believe this is a solved
>> problem. Steps are:
>> - enroll hardware with proper specs (CPU, RAM, disk, etc)
>> - create flavors based on hardware specs
>> - scheduler filter matches requests exactly
>>
>> There are, I suspect, three areas where this would fall short today:
>> - exposing to the user when certain flavors shouldn't be picked,
>> because there is no more hardware available which could match it
>> - ensuring that hardware is enrolled with the proper specs //
>> trouble-shooting when it is not
>> - a UI that does these well
>>
>> If I understand your proposal correctly, you're suggesting that we
>> introduce non-deterministic behavior. If the scheduler filter falls
>> back to >$flavor when $flavor is not available, even if the search
>> is in ascending order and upper-bounded by some percentage, the user
>> is still likely to get something other than what they requested.
>>  From a utilization and inventory-management standpoint, this would
>> be a headache, and from a user standpoint, it would be awkward.
>> Also, your proposal is only addressing the case where hardware
>> variance is small; it doesn't include a solution for deployments
>> with substantially different hardware.
>>
>> I don't think introducing a non-deterministic hack when the
>> underlying services already work, just to provide a temporary UI
>> solution, is appropriate. But that's just my opinion.
>>
>> Here's an alternate proposal to support same-arch but different
>> cpu/ram/disk hardware environments:
>> - keep the scheduler filter doing an exact match
>> - have the UI only allow the user to define one flavor, and have
>> that be the lowest common denominator of available hardware
>> - assign that flavor's properties to all nodes -- basically lie
>> about the hardware specs when enrolling

Re: [openstack-dev] [Ironic] PXE driver deploy issues

2014-01-31 Thread Devananda van der Veen
I think your driver should implement a wrapper around both VendorPassthru
interfaces and call each appropriately, depending on the request. This
keeps each VendorPassthru driver separate, and encapsulates the logic about
when to call each of them in the driver layer.
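
Something along these lines -- a rough sketch with stand-in classes and a
simplified method signature, not the actual driver interface:

    # Stand-ins for the two vendor interfaces being wrapped.
    class PXEVendorPassthru(object):
        def vendor_passthru(self, task, node, **kwargs):
            return 'pxe handled %s' % kwargs.get('method')

    class SeaMicroVendorPassthru(object):
        def vendor_passthru(self, task, node, **kwargs):
            return 'seamicro handled %s' % kwargs.get('method')

    class MultiplexedVendorPassthru(object):
        """Route deploy-related calls to PXE, everything else to SeaMicro."""

        PXE_METHODS = ('pass_deploy_info',)

        def __init__(self):
            self.pxe = PXEVendorPassthru()
            self.seamicro = SeaMicroVendorPassthru()

        def vendor_passthru(self, task, node, **kwargs):
            if kwargs.get('method') in self.PXE_METHODS:
                return self.pxe.vendor_passthru(task, node, **kwargs)
            return self.seamicro.vendor_passthru(task, node, **kwargs)

    mux = MultiplexedVendorPassthru()
    print(mux.vendor_passthru(None, None, method='pass_deploy_info'))
    print(mux.vendor_passthru(None, None, method='set_power_profile'))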

As an aside, this is a code path (multiplexed VendorPassthru interfaces)
that we haven't exercised yet, but has come up in other discussions
recently too, so if you run into something else that looks awkward, please
jump into IRC and we'll help hash it out.

Cheers,
Devananda


On Fri, Jan 31, 2014 at 1:34 AM, Rohan Kanade wrote:

> > The deploy ramdisk should ping back the Ironic API and call the
> > vendor_passthru/pass_deploy_info with the iSCSI information, etc... So,
> > make sure you've built your deploy ramdisk after this patch landed on
>
> Any strategies on how to verify if the ramdisk has been deployed on the
> server?
>
> Also, I am using different Power and VendorPassthru interfaces (unreleased
> SeaMicro), And i am using PXE only for Deploy interface. How can the
> pass_deploy_info be called by the ramdisk since it is not implemented by
> SeaMicro VendorPassthru?
>
> Regards,
> Rohan Kanade
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO] [Ironic] Roadmap towards heterogenous hardware support

2014-01-31 Thread Devananda van der Veen
On Fri, Jan 31, 2014 at 1:03 PM, Tzu-Mainn Chen  wrote:

> So after reading the replies on this thread, it seems like I (and others
> advocating
> a custom scheduler) may have overthought things a bit.  The reason this
> route was
> suggested was because of conflicting goals for Icehouse:
>
> a) homogeneous nodes (to simplify requirements)
> b) support diverse hardware sets (to allow as many users as possible to
> try Tuskar)
>
> Option b) requires either a custom scheduler or forcing nodes to have the
> same attributes,
> and the answer to that question is where much of the debate lies.
>
> However, taking a step back, maybe the real answer is:
>
> a) homogeneous nodes
> b) document. . .
>- **unsupported** means of "demoing" Tuskar (set node attributes to
> match flavors, hack
>  the scheduler, etc)
>- our goals of supporting heterogeneous nodes for the J-release.
>
> Does this seem reasonable to everyone?
>

+1

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Functional testing, dependencies, etc

2014-02-01 Thread Devananda van der Veen
Hi all,

I've got a few updates to share on the status of functional testing of
Ironic.

Firstly, early last week, Ironic's tempest tests were added to our check
and gate pipeline, and non-voting checks were added to devstack and tempest
pipelines as well. These tests start the Ironic services in devstack, and
then exercise CRUD actions via our python client. The current tempest tests
exercise our integration with Keystone, and the integration of our internal
components (ir-api, ir-cond, mysql/pgsql, rabbit).

Since the project plans include integration with Glance, Keystone, Nova,
and Neutron, we initially enabled all of those in our devstack-gate
environment. However, due to the unpredictable nature of Neutron's test
suite, our gate was blocked as soon as it was enabled, and on Tuesday I
disabled Neutron in our devstack-gate runs.

This is not ideal. Ironic's PXE deployment driver depends on Neutron to
control the DHCP BOOT [4] option for nodes, so to do automated functional
testing of a PXE deployment, we will need to re-enable Neutron in
devstack-gate. We still have work to do before we are ready for end-to-end
deploy testing, so I'm hoping Neutron becomes a bit more stable by then.
I'm not thrilled about the prospects if it is not.
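
For context, the dependency is narrow but hard: the PXE driver needs to set
DHCP boot options on the node's port via Neutron's extra-dhcp-opt extension
[4]. Roughly like this -- placeholder values, not the driver's actual code:

    # Rough sketch of the request body Ironic's PXE driver needs Neutron
    # to accept.
    port_update_body = {
        'port': {
            'extra_dhcp_opts': [
                {'opt_name': 'bootfile-name', 'opt_value': 'pxelinux.0'},
                {'opt_name': 'tftp-server', 'opt_value': '192.0.2.10'},
                {'opt_name': 'server-ip-address', 'opt_value': '192.0.2.10'},
            ]
        }
    }

    # This is sent as a port update (PUT /v2.0/ports/<port-id>), which is
    # why end-to-end PXE deploy testing can't happen without Neutron.
    print(port_update_body)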

Our Nova driver [1] hasn't landed yet, and probably needs further
refinement before the Nova folks will be ready to land it, but it *is*
functional. Late in the week, Lucas and Chris each did an end-to-end
deployment with it!

So, today, we're not functionally testing Nova with an "ironic" virt driver
[2] -- even though Nova is enabled and tested by devstack-gate in Ironic's
pipeline. This was an oversight in my review of our devstack-gate tests:
we're currently gating on Nova using the libvirt driver. It's unrelated to
Ironic and I don't believe it should be exercised in Ironic's test suite.
Furthermore, we tripped a bug in the libvirt driver by doing file injection
with libguestfs. This has, once again, broken Ironic's gate.

I've proposed a temporary solution [3] that will cause libvirt to be tested
using configdrive in our pipe, as it is in all other projects except
Neutron. A better solution will be to not gate Ironic on the libvirt driver
at all.

The path forward that I see is:
- changes to land in devstack and tempest to create a suitable environment
for functional testing (eg, creating VMs and enrolling them with Ironic),
- the Nova "ironic" driver to be landed, with adequate unit tests, but no
integration tests,
- we set up an experimental devstack-gate pipe to load that Nova driver and
do integration tests between Ironic and Nova, Glance, and Neutron,
- iteratively fix bugs in devstack, ironic, our nova driver, and if
necessary, neutron, until this can become part of our gate.

In the meantime, I don't see a point in those services being enabled and
tested in our check or gate pipelines.


Regards,
Devananda


[1] https://review.openstack.org/5132
[2] https://review.openstack.org/70348
[3] https://review.openstack.org/70544
[4]
http://docs.openstack.org/api/openstack-network/2.0/content/extra-dhc-opt-ext-update.html
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] PXE driver deploy issues

2014-02-03 Thread Devananda van der Veen
On Fri, Jan 31, 2014 at 12:13 PM, Devananda van der Veen <
devananda@gmail.com> wrote:

> I think your driver should implement a wrapper around both VendorPassthru
> interfaces and call each appropriately, depending on the request. This
> keeps each VendorPassthru driver separate, and encapsulates the logic about
> when to call each of them in the driver layer.
>
>
I've posted an example of this here:

  https://review.openstack.org/#/c/70863/

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] January review redux

2014-02-04 Thread Devananda van der Veen
The last month and a half had most of our team out for holiday leave at
some point, and the review stats reflect that. I had hoped our review queue
would come down once we all got back from the holidays, but that hasn't
happened. In fact, our review queue has grown significantly. Perhaps
it's a combination of the usual nearing-end-of-cycle rush and our gate
breaking twice in the last 10 days.

Here are the stats for the last month [1]

Total reviews: 569 (19.0/day)
 Total reviewers: 28 (avg 0.7 reviews/day)
Total reviews by core team: 211 (7.0/day)
 Core team size: 6 (avg 1.2 reviews/day)
New patch sets in the last 30 days: 347 (11.6/day)
 Changes involved in the last 30 days: 119 (4.0/day)
New changes in the last 30 days: 93 (3.1/day)
 Changes merged in the last 30 days: 56 (1.9/day)
Changes abandoned in the last 30 days: 13 (0.4/day)
 Changes left in state WIP in the last 30 days: 4 (0.1/day)
Queue growth in the last 30 days: 20 (0.7/day)
 Average number of patches per changeset: 2.9


And here are the current / average stats [2]

 Total Open Reviews: 48
 Waiting on Submitter: 16
 Waiting on Reviewer: 32
 Stats since the latest revision:
  Average wait time: 6 days, 21 hours, 8 minutes
  1st quartile wait time: 3 days, 15 hours, 59 minutes
  Median wait time: 5 days, 13 hours, 4 minutes
  3rd quartile wait time: 10 days, 6 hours, 2 minutes
  Number waiting more than 7 days: 14

I would very much like to add a few people to our core review team. We need
to increase the pace of reviews to keep up with development, particularly
as we approach our most aggressive milestone and prepare for our first
release. I'd also like to improve our non-US-timezone coverage.

So, I'd like to nominate the following two additions to the ironic-core
team:

Max Lobur
https://review.openstack.org/#/q/reviewer:mlobur%2540mirantis.com+project:openstack/ironic,n,z

Roman Prykhodchenko
https://review.openstack.org/#/q/reviewer:rprikhodchenko%2540mirantis.com+project:openstack/ironic,n,z

I believe that the review feedback that I've seen from both of them shows a
good understanding of the project architecture and the direction that I'd
like Ironic to go.

Max has been consistently reviewing patches for the last few months. His
input has been very valuable in spotting issues early on, and clearly shows
a good grasp of the project's architecture. He is frequently engaged with
the existing team during discussions in IRC and in the weekly meetings.

Roman was involved in Ironic early on, then spent a few months focusing on
our devstack and tempest patches [3]. He continues to help with the
ironic-related work in those projects, is engaged in discussions both in
channel and during meetings, and has resumed doing reviews on a regular
basis.

With this, we would have one core in NZ, two in US, and three in EU time
zones.


Regards,
Devananda


[1] - http://russellbryant.net/openstack-stats/ironic-reviewers-30.txt
[2] - http://russellbryant.net/openstack-stats/ironic-openreviews.html
[3] -
https://review.openstack.org/#/q/owner:rprikhodchenko%2540mirantis.com,n,z
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][baremetal] Partition layout of image used in nova bare-metal

2014-02-07 Thread Devananda van der Veen
On Fri, Feb 7, 2014 at 2:00 AM, Taurus Cheung  wrote:

> Hi,
>
>
>
> I am working on deploying images to bare-metal machines using nova
> bare-metal. In current design, nova bare-metal would first write a
> partition layout of root partition and swap partition, then write the image
> to root partition. It seems that the logic assumes there's no partition
> table inside the image.
>
>
>
> Without code change, does nova bare-metal support writing image with
> partition table embedded in it?
>
>
>
No. This isn't currently supported by the baremetal driver.

It is in the plans for Ironic as part of supporting Windows images, though
it could be used for other OS's as well.
  https://blueprints.launchpad.net/ironic/+spec/windows-disk-image-support

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][baremetal] Support configurable inject items in nova Bare-metal

2014-02-07 Thread Devananda van der Veen
On Fri, Feb 7, 2014 at 2:01 AM, Taurus Cheung  wrote:

> Hi,
>
>
>
> I am working on deploying images to bare-metal machines using nova
> bare-metal. In current design, some files like hostname, network config
> file and meta.json are injected into the image before writing to bare-metal
> machines. Can we control which items to be injected into the image?
>
>
>
We removed this functionality a while back. Instance parameterization
should be happening via cloud-init, not via file injection, if you're using
code from Havana or trunk.

If you're using the baremetal driver from Grizzly, yes, it's doing file
injection. That early driver didn't support DHCP for the instances, so it
had to inject hostname and network config with static IPs.

-Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] [TripleO] Goal setting // progress towards integration

2014-02-12 Thread Devananda van der Veen
Hello ironic developers, tripleo folks, and interested third-parties!

The Icehouse graduation deadline is fast approaching and I think we all
need to take a good look at where Ironic is, compare that to the
requirements that have been laid out by the TC [1] for all Integrated
projects, and prioritize our work aggressively.


We have four areas inside of Ironic that need significant movement in order
to graduate this cycle:

* Critical & High priority bugs
   There are enough of these that I'm not comfortable releasing with the
current state of things. I think we can address all of these bugs in time
if we focus on getting to a minimum-viable-product, rather than on
engineering a perfect solution. (Yes, I'm talking about myself too...) We
have patches in flight for many of them, but the average turn-around time
for our review queue is, in my opinion, higher than it should be [3] given
the size of our team. If we don't collectively speed up the review queue,
it's going to be very difficult to land everything we need to.

* Missing functionality
  We need to match the feature set of Nova baremetal. We're pretty good on
that front, but I'm calling out two blueprints and I'll explain why.

  https://blueprints.launchpad.net/ironic/+spec/serial-console-access
  I believe some folks are using this, even though the TripleO team is not,
and I personally never used the feature in Nova baremetal.
  IBM contributed code to implement this in Ironic, but the patch #64100
has been abandoned for some time. We need to revive this and continue the
work.
  https://blueprints.launchpad.net/ironic/+spec/preserve-ephemeral
  The TripleO team added this feature to Nova baremetal during the current
cycle;  even though it wasn't on our plans at the start of Icehouse, we
need to do it to keep feature parity.

* Testing & QA
  Ironic has API tests in tempest, and these run in Ironic's gate. However,
there are no functional tests today (iow, nothing testing that a PXE deploy
actually works). These are a graduation requirement. Aleksandr has been
working on this, but it's a long way from done. Anyone want to jump in?

* Documentation
  We have CLI and API docs, but we have no installation or deployer docs.
These are also a graduation requirement. No one has stepped up to own this
effort yet.


Additionally, the TC has laid out some (draft) graduation requirements [2]
for projects that duplicate functionality in a pre-existing project -- that
means us, since we're supplanting the old Nova "baremetal" driver. For
reference, here's the pertinent snippet from the draft:

"[T]he new project must reach a level of functionality and maturity such
> that we are ready to deprecate the old code ... including details for how
> users will be able to migrate from the old to the new."



We have two blueprints up that pertain specifically to this requirement:

https://blueprints.launchpad.net/nova/+spec/deprecate-baremetal-driver

The nova.virt.ironic driver is especially important as it represents Nova's
ability to perform the same functionality that is available today with the
baremetal driver. The Project meeting yesterday [4] made it clear that
getting the nova.virt.ironic driver landed is a pre-condition of Ironic's
graduation. This driver is well underway, but still has much work to be
done. We've started to see a few reviews from the Nova team, and have begun
splitting up the patch into smaller, more reviewable chunks.

The blueprint is set "Low" priority. I know the Nova core team is swamped
with review work, but we'll need to start getting regular feedback on this.
Russell suggested that this is suitable for an FFE so we could continue
working on it after I3, which is great - we'll need the extra time. Even
so, getting a little early feedback from nova-core would be very helpful in
case there is major work that they think we need to do.

https://blueprints.launchpad.net/ironic/+spec/migration-from-nova

We will need to provide a data migration tool for existing Nova baremetal
deployments, along with usage documentation. Roman is working on this, but
there still isn't any code up, and I'm getting a bit nervous


That's it for the graduation-critical tasks, but that's not all the work
currently in our review queue or targeted to I3 ...

We also have third-party drivers coming in. Both SeaMicro and HP have
blueprints up for a vendor driver with power and deploy interfaces.
SeaMicro has code already up; HP has promised code very soon. There's also
a blueprint to enable PXE booting of windows images. None of these are
required for graduation, and even though I want to encourage vendors -- and
I think the functionality these blueprints describe is very valuable to the
project -- I question whether we'll have the time and bandwidth to review
them and ensure they're documented. I am not going to set a "code proposal
deadline" as some other projects have done, in part because I expect a lot
of development to happen at the TripleO sprint (March 3 

[openstack-dev] [Ironic] review days

2014-02-12 Thread Devananda van der Veen
Hi again!

I promise this will be a much shorter email than my last one ... :)

I'd like to propose that we find a regular day/time to have a recurring code
jam. Here's what it looks like in my head:
- we get at least three core reviewers together
- as many non-core folks as show up are welcome, too
- we tune out all distractions for a few hours
- we pick a patch and all review it

If the author is present, we iterate with the author, and review each
revision they submit while we're all together. If the author is not present
and there are only minor issues, we fix them up in a follow-on patch and
land both at once. If neither of those are possible, we -1 it and move on.

I think we could make very quick progress in our review queue this way. In
particular, I want us to plow through the bug fixes that have been in Fix
Proposed status for a while ...

What do y'all think of this idea? Useful or doomed to fail?

What time would work for you? How about Thursdays at 8am PST?


Cheers,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] review days

2014-02-12 Thread Devananda van der Veen
On Wed, Feb 12, 2014 at 2:18 PM, Maksym Lobur  wrote:

> Also, I think 3 hours might be too much
>
I'm happy to start with 2 hours and see how it goes.


On Wed, Feb 12, 2014 at 2:35 PM, Roman Prykhodchenko <
rprikhodche...@mirantis.com> wrote:

> Since there are two core-subteams located in different time zones I
> propose making two sessions:
> - One in the morning in Europe. Let's say at 10 GMT
> - One in the morning in the US at a convenient time.
>
>
We don't exactly have two teams split by timezone... we have core members
in GMT -8 +0 +2 +12, +/- 1hr for DST

Regardless of that, I would rather not split this up. Without some overlap
between US and EU, it may be hard to land much during these code jams. If
only two cores are present, neither one of them can land a new patch
without +2'ing their own work, which we shouldn't do. The point of this is
to be able to rapidly iterate on fixing bugs, and a lot of the important
bug fixes have been proposed by core team members already, so we need
minimum of 3 cores present.

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] first review jam session redux

2014-02-13 Thread Devananda van der Veen
Just a quick follow-up to our first "review jam session". We got 5 patches
landed in the server, 3 in the client, and zuul is merging another 5 right
now.

We started an etherpad part-way through
  https://etherpad.openstack.org/p/IronicReviewDay
Let's continue to use that to track work that spins out of these sessions.

I think this was great. We got a lot accomplished in very little time --
let's plan to do this again next Thursday, 8am PST (16:00 GMT).

Let's also have a shorter review session at the same time on Monday
morning, before the meeting.

Cheers!
Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] first review jam session redux

2014-02-13 Thread Devananda van der Veen
On Thu, Feb 13, 2014 at 11:06 AM, Chris K  wrote:

> *I think this was great. We got a lot accomplished in very little time --
> let's plan to do this again **next Thursday*, *8am* *PST*
> Totally +1 from me
>
> *Let's also have a shorter review session at the same time on
> Monday morning, before the meeting.*
> Would this be another review session or recap for the meeting?
>
>
I'm thinking another review session. Besides landing all the things, we'll
also see if there are important things we're blocked on that need to be
brought up, and possibly solve them before the meeting :)

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] [TripleO] Goal setting // progress towards integration

2014-02-13 Thread Devananda van der Veen
On Thu, Feb 13, 2014 at 12:16 PM, Dan Smith  wrote:

> > I would also like to see CI (either third party or in the gate) for
> > the nova driver before merging it. There's a chicken and egg problem
> > here if its in the gate, but I'd like to see it at least proposed as a
> > review.
>
> Yeah, I think that the existing nova-baremetal driver is kinda frozen in
> a pre-deprecation state right now, which gives it a special pass on the
> CI requirement. To me, I think it makes sense to avoid ripping it out
> since it's already on ice.
>

Except it's not actually frozen - at least one blueprint adding new
functionality landed during Icehouse, which we still need to finish porting.
  https://blueprints.launchpad.net/nova/+spec/baremetal-preserve-ephemeral

However, for the Ironic driver, I would definitely rather see real CI up
> _and_ working before we merge it. I think that probably means it will be
> a post-icehouse thing at this point, unless that effort is farther along
> than I think.
>
> At the Nova meetup this week, we had a serious discussion about ripping
> out major drivers that might not make the deadline. I don't think it
> makes sense to rip those out and merge another without meeting the
> requirement.
>

From Nova's perspective, I agree. Ironic is not as far along with CI as I
had hoped we would be by this point. Now, it's possible that in the next
month or so, we'll make a lot of headway there -- we're certainly going to
try.

AIUI, even if Ironic meets all the other criteria, if we don't have the
Nova driver landed and fully CI'd in time, we won't graduate. Is that
correct?

Since it's hard to tell tone from text, I'm not upset about this -- I knew
from the start that we would need real CI for Ironic, it makes sense from a
perspective of "protect the core", and I've been following the discussions
around third-party testing. I just want to be clear about expectations so
that we can allocate development resources appropriately. We might also
want to consider what it means for baremetal if Ironic doesn't graduate...

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] py26 test failures since Thursday

2014-02-17 Thread Devananda van der Veen
Hi all,

Last Thursday, a series of patches introduced sporadic failures to our
python unit tests. It only showed up on the py26 tests (most of us test
with py27 locally), and it only happened ~ 40% of the time, so we didn't
notice it right away. The issue persisted on Friday and into the weekend as
we tried to reproduce and fix it. If, during this time, Jenkins -1'd your
patch, it may or may not have been valid.

Chris and I landed a fix on Sunday [1] and it looks like everything is back
in order today. Most patches have already had a "recheck bug 1279992" run
on them.

Regards,
Devananda


[1] 7aa2ab905
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Starting to postpone work to Juno

2014-02-24 Thread Devananda van der Veen
Hi all,

For the last few meetings, we've been discussing how to prioritize the work
that we need to get done as we approach the close of Icehouse development.
There's still some distance between where we are and where we need to be --
integration with other projects (eg. Nova), CI testing of that integration
(eg. via devstack), and fixing bugs that we continue to find.

As core reviewers need to focus their time during the last week of I-3,
we've discussed postponing cosmetic changes, particularly patches that just
refactor code without any performance or feature benefit, to the start of
Juno. [1] So, later today I am going to block patches that do not have
important functional changes and are non-trivial in scope (eg, take more
than a minute to read), are related to low-priority or wishlist items, or
are not targeted to Icehouse.

Near the end of the week, I will retarget incomplete blueprints to the Juno
release.

Next week is the TripleO developer sprint, which coincides with the close
of I-3. Many Ironic developers and more than half of our core review team
will also be there. This will give us a good opportunity to hammer out
testing and integration issues and work on bug fixes.

Over the next month, I would like us to stabilize what we have, add further
integration and functional testing to our gate, and write deployer/usage
documentation.

Regards,
Devananda


[1]

We actually voted on this last week, I didn't follow through, and Chris
reminded me during the meeting today...

http://eavesdrop.openstack.org/meetings/ironic/2014/ironic.2014-02-17-19.00.html
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] review days

2014-02-26 Thread Devananda van der Veen
I wanted to take a moment first to thank all the reviewers who have been
getting up early // staying late to join in these review sessions -- I
believe they've been tremendously helpful in unblocking a lot of work over
the last two weeks. Thanks!!

I also want to remind folks that we have another one scheduled at the usual
time tomorrow [*] and that this will be the last review jam before the
sprint next week and the close of Icehouse-3.

Cheers,
Devananda

* http://www.timeanddate.com/worldclock/fixedtime.html?iso=20140227T1600
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] WSME / Pecan and only supporting JSON?

2014-02-27 Thread Devananda van der Veen
On Thu, Feb 27, 2014 at 5:28 AM, Sean Dague  wrote:

> On 02/27/2014 08:13 AM, Doug Hellmann wrote:
> >
> >
> >
> > On Thu, Feb 27, 2014 at 12:48 AM, Michael Davies wrote:
> >
> > Hi everyone,
> >
> > Over in "Ironic Land" we're looking at removing XML support from
> > ironic-api (i.e. https://bugs.launchpad.net/ironic/+bug/1271317)
> >
> > I've been looking, but I can't seem to find an easy way to modify
> > the accepted content_types.
> >
> > Are there any wsgi / WSME / Pecan experts out there who can point me
> > in the right direction?
> >
> >
> > There's no support for turning off a protocol in WSME right now, but we
> > could add that if we really need it.
> >
> > Why would we turn it off, though? The point of dropping XML support in
> > some of the other projects is that they use toolkits that require extra
> > work to support it (either coding or maintenance of code we've written
> > elsewhere for OpenStack). WSME supports both protocols without the API
> > developer having to do any extra work.
>
> Because if an interface is exported to the user, then it needs to be
> both Documented and Tested. So that's double the cost on the validation
> front, and the documentation front.
>
> Exporting an API isn't set and forget. Especially with the semantic
> differences between JSON and XML. And if someone doesn't feel the XML
> created by WSME is semantically useful enough to expose to their users,
> they shouldn't be forced to by the interface.
>
>
Aside from our lack of doc and test coverage for XML support and the desire
to hide an untested and undocumented API (which I think are valid reasons
to disable it), there is an actual technical problem.

Ironic's API relies on HTTP PATCH to modify resources, which Pecan/WSME
does not abstract for us. We're using the python jsonpatch library to parse
these requests. I'm not aware of a similar python library for XML support.
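
For reference, the PATCH handling boils down to something like this
(simplified; the API layer does validation and error handling around it):

    # Apply an RFC 6902 JSON Patch document to a resource dict using the
    # python jsonpatch library.
    import jsonpatch

    node = {'driver': 'pxe_ipmitool', 'properties': {'memory_mb': 16384}}

    # Body of an HTTP PATCH request.
    patch = [
        {'op': 'replace', 'path': '/properties/memory_mb', 'value': 24576},
        {'op': 'add', 'path': '/properties/cpu_arch', 'value': 'x86_64'},
    ]

    updated = jsonpatch.apply_patch(node, patch)
    print(updated)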

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] heads up, set -o errexit on devstack - things will fail earlier now

2014-02-27 Thread Devananda van der Veen
On Thu, Feb 27, 2014 at 9:34 AM, Ben Nemec  wrote:

> On 2014-02-27 09:23, Daniel P. Berrange wrote:
>
>> On Thu, Feb 27, 2014 at 08:38:22AM -0500, Sean Dague wrote:
>>
>>> This patch is coming through the gate this morning -
>>> https://review.openstack.org/#/c/71996/
>>>
>>> The point being to actually make devstack stop when it hits an error,
>>> instead of only once these compound to the point where there is no
>>> moving forward and some service call fails. This should *dramatically*
>>> improve the experience of figuring out a failure in the gate, because
>>> where it fails should be the issue. (It also made us figure out some
>>> wonkiness with stdout buffering, that was making debug difficult).
>>>
>>> This works on all the content that devstack gates against. However,
>>> there are a ton of other paths in devstack, including vendor plugins,
>>> which I'm sure aren't clean enough to run under -o errexit. So if all of
>>> a sudden things start failing, this may be why. Fortunately you'll be
>>> pointed at the exact point of the fail.
>>>
>>
>> This is awesome!
>>
>
> +1!  Thanks Sean and everyone else who was involved with this.
>

Another big +1 for this! I've wished for it every time I tried to add
something to devstack and struggled with debugging it.

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] asymmetric gating and stable vs unstable tests

2014-02-27 Thread Devananda van der Veen
Hi all,

I'd like to point out how asymmetric gating is challenging for incubated
projects, and propose that there may be a way to make it less so.

For reference, incubated projects aren't allowed to have symmetric gating
with integrated projects. This is why our devstack and tempest tests are
"*-check-nv" in devstack and tempest, but "*-check" and "*-gate" in our
pipeline. So, these jobs are stable from Ironic's point of view because
we've been gating on them for the last month.

Cut forward to this morning. A devstack patch [1] was merged and broke
Ironic's gate because of a one-line issue in devstack/lib/ironic which I've
since proposed a fix for [2]. This issue was visible in the non-voting
check results before the patch was approved -- but those non-voting checks
got ignored because of an assumption of instability (they must be
non-voting for a reason, right?).

I'm not suggesting we gate integrated projects on incubated projects, but I
would like to point out that not all non-voting checks are non-voting
*because they're unstable*. It would be great if there were a way to
indicate that certain tests are voting for someone else and a failure
actually matters to them.

Thanks for listening,
-Deva


[1] https://review.openstack.org/#/c/71996/

[2] https://review.openstack.org/#/c/76943/ -- It's been approved already,
just waiting in the merge queue ...
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] maintaining backwards compatibility within a cycle

2014-11-20 Thread Devananda van der Veen
Let's get concrete for a moment, because it makes a difference which API
we're talking about.

We have to guarantee a fairly high degree of backwards compatibility within
the REST API. Adding new capabilities, and exposing them in a discoverable
way, is fine; a backwards-incompatible breaking change to the REST API is
definitely not OK without a version bump. We should (and do) make a strong
effort not to land any REST API change without appreciable thought and
testing of its impact. Changes here have an immediate effect on anyone
following trunk.

The RPC API is another area of compatibility, and perhaps the one most
clearly versioned today. We must continue supporting running disparate
versions of the RPC client and server (that is, rpcapi.py and manager.py)
so that operators can upgrade the API and Conductor services
asymmetrically. Changes to the RPC API are done in such a way that each
service can be upgraded independently of other services.
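
The pattern, in a toy sketch (invented method name and version numbers; in
practice this is the kind of check oslo.messaging's can_send_version() is
for, and the real code lives in rpcapi.py):

    # Toy sketch: the client only uses a newer call signature if the
    # (possibly not-yet-upgraded) server side can accept it.
    def can_send_version(server_version, wanted):
        return server_version >= wanted

    def update_node(server_version, node, maintenance_reason=None):
        if maintenance_reason is not None and can_send_version(
                server_version, (1, 20)):
            # Newer-style call, only understood by servers >= 1.20.
            return ('update_node v1.20', node, maintenance_reason)
        # Fall back to the older call so an older conductor still works;
        # the new argument is simply not sent.
        return ('update_node v1.15', node)

    print(update_node((1, 20), 'node-1', maintenance_reason='bad PSU'))
    print(update_node((1, 15), 'node-1', maintenance_reason='bad PSU'))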

The driver API is the only purely-python API we support -- and we know that
there are downstream consumers of that API. OnMetal is one such; many other
spoke up at the recent summit. While the impact of a breaking change here
is less than in the REST API, it is not to be overlooked. There is a cost
associated with maintaining an out-of-tree driver and we should make a
best-effort to minimize that cost for folks who (for whatever reason) are
in that boat.
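
To pick the concrete case that started this thread: if we do rename a
decorator that out-of-tree drivers import, the cheap way to avoid breaking
them mid-cycle is roughly this (names invented for the example):

    # Keep the old name as a deprecated alias for (at least) one cycle.
    import functools
    import warnings

    def passthru(func):
        """The new, preferred decorator name (invented for this example)."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        wrapper._is_passthru = True
        return wrapper

    def vendor_route(func):
        """Deprecated alias; warns but keeps old drivers working."""
        warnings.warn('vendor_route is deprecated; use passthru instead',
                      DeprecationWarning, stacklevel=2)
        return passthru(func)

    @vendor_route
    def old_style_method():
        return 'still works'

    print(old_style_method())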

-Devananda


On Thu Nov 20 2014 at 8:28:56 AM Lucas Alvares Gomes 
wrote:

> Hi Ruby,
>
> Thank you for putting this up.
>
> I'm one of the ones who think we should try hard (even really hard) to
> maintain the compatibility on every commit. I understand that it may
> sound naive because I'm sure that sometimes we will break things, but
> that doesn't means we shouldn't try.
>
> There may be people running Ironic in a continuous deployment
> environment, those are the users of the project and therefor the most
> important part of Ironic. Doesn't matter how well written Ironic code
> may be if nobody is using it. If we break that user workflow and he's
> unhappy that's the ultimate failure.
>
> I also understand that in the project POV we want to have fast
> interactions and shiny new features as quick as possible and trying to
> be backwards compatible all the time - on every commit - might slow
> that down. But in the user POV I believe that he doesn't care much
> about all the new features, he would mostly care about the things that
> used to work to continue to work for him.
>
> Also the backwards approach between releases and not commits might
> work fine in the non-opensource world where the code is kept indoors
> until the software is release, but in the opensource world where the
> code is out to people to use it all the time it doesn't seem to work
> that well.
>
> That's my 2 cents.
>
> Lucas
>
> On Thu, Nov 20, 2014 at 3:38 PM, Ruby Loo  wrote:
> > Hi, we had an interesting discussion on IRC about whether or not we
> should
> > be maintaining backwards compatibility within a release cycle. In this
> > particular case, we introduced a new decorator in this kilo cycle, and
> were
> > discussing the renaming of it, and whether it needed to be backwards
> > compatible to not break any out-of-tree driver using master.
> >
> > Some of us (ok, me or I) think it doesn't make sense to make sure that
> > everything we do is backwards compatible. Others disagree and think we
> > should, or at least strive for 'must be' backwards compatible with the
> > caveat that there will be cases where this isn't
> feasible/possible/whatever.
> > (I hope I captured that correctly.)
> >
> > Although I can see the merit (well, sort of) of trying our best, trying
> > doesn't mean 'must', and if it is 'must', who decides what can be
> exempted
> > from this, and how will we communicate what is exempted, etc?
> >
> > Thoughts?
> >
> > --ruby
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] New meeting room and time!

2014-11-25 Thread Devananda van der Veen
As discussed on this list previously [0] and agreed to in the last meeting
[1], our weekly IRC meeting time is changing to better accommodate the many
active contributors we have who are not well served by the current meeting
time.

The new meeting time will alternate between 1700 UTC on Mondays and 0500
UTC on Tuesdays. The next meeting, on Monday Dec 1st, will be at 1700 UTC

http://www.timeanddate.com/worldclock/fixedtime.html?iso=20141201T1700


*** NOTE ***
This change of time also requires us to change rooms; we will now be
meeting in the *#openstack-meeting-3* room. I'll remind folks by announcing
this in our main channel beforehand as well.
*** NOTE ***

All relevant wiki pages have been updated, but the change has not
propagated to the iCal feed just yet.

Regards,
Devananda

[0]
http://lists.openstack.org/pipermail/openstack-dev/2014-November/050838.html

[1]
http://eavesdrop.openstack.org/meetings/ironic/2014/ironic.2014-11-24-19.01.html
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [ironic] how to remove check-tempest-dsvm-ironic-pxe_ssh on Nova check

2014-11-25 Thread Devananda van der Veen
On Tue Nov 25 2014 at 7:20:00 AM Sean Dague  wrote:

> On 11/25/2014 10:07 AM, Jim Rollenhagen wrote:
> > On Tue, Nov 25, 2014 at 08:02:56AM -0500, Sean Dague wrote:
> >> When at Summit I discovered that check-tempest-dsvm-ironic-pxe_ssh is
> >> now voting on Nova check queue. The reasons given is that the Nova team
> >> ignored the interface contract that was being provided to Ironic, broke
> >> them, so the Ironic team pushed for co-gating (which basically means the
> >> interface contract is now enforced by a 3rd party outside of Nova /
> Ironic).
> >>
> >> However, this was all in vague term, and I think is exactly the kind of
> >> thing we don't want to do. Which is use the gate as a proxy fight over
> >> teams breaking contracts with other teams.
> >>
> >> So I'd like to dive into what changes happened and what actually broke,
> >> so that we can get back to doing this smarter.
> >>
> >> Because if we are going to continue to grow as a community, we have to
> >> co-gate less. It has become a crutch to not think about interfaces and
> >> implications of changes, and is something we need to be doing a lot
> less of.
>

Definitely -- and we're already co-gating less than other projects :)

You might notice that ironic jobs aren't running in the check queues for
any other Integrated project, even though Ironic depends on interacting
directly with Glance, Neutron, Keystone, and (for some drivers) Swift.


> >
> > Completely agree here; I would love to not gate Nova on Ironic.
> >
> > The major problem is that Nova's driver API is explicitly not stable.[0]
> > If the driver API becomes stable and properly versioned, Nova should be
> > able to change the driver API without breaking Ironic.
> >
> > Now, this won't fix all cases where Nova could break Ironic. The
> > resource tracker or scheduler could change in a breaking way. However, I
> > think the driver API has been the most common cause of Ironic breakage,
> > so I think it's a great first step.
> >
> > // jim
> >
> > [0] http://docs.openstack.org/developer/nova/devref/
> policies.html#public-contractual-apis
>
> So we actually had a test in tree for that part of the contract...
>
>
We did. It was added here:
  https://review.openstack.org/#/c/98201

.. and removed here:
  https://review.openstack.org/#/c/111425/

.. because there is now unit test coverage for the internal API usage by
in-tree virt drivers via assertPublicAPISignatures() here:
  https://git.openstack.org/cgit/openstack/nova/tree/nova/test.py#n370


We don't need the surface to be public contractual, we just need to know
> what things Ironic is depending on and realize that can't be changed
> without some compatibility code put in place.


There is the virt driver API, which is the largest and most obvious one.
Then there's the HostManager, ResourceTracker, and SchedulerFilter classes
(or subclasses thereof). None of these are "public contractual APIs" as
defined by Nova. Without any guarantee of stability in those interfaces, I
believe co-gating is a reasonable thing to do between the projects.

Also, there have been discussions [2] at and leading up to the Paris summit
(some ongoing for many cycles now) regarding changing every one of those
interfaces. Until those interfaces are refactored / split out / otherwise
deemed stable, I would like to continue running Ironic's functional tests
on all Nova changes. If you think we don't need to co-gate while that work
is underway, I'd like to understand why, and what that would look like.

Again... without knowing exactly what happened (I was on leave) it's
> hard to come up with a solution. However, I think the co-gate was an
> elephant gun that we don't actually want.
>

Apologies, but I don't recall exactly what time period you were on leave
for, so you may have already seen some or all of these.

I have looked up several cases of asymmetrical breaks that happened (due
to changes in multiple projects) during Juno (before Ironic was voting in
Nova's check queue). At least one of these was the result of a change in
Nova after the Ironic driver merged. Links at the bottom for reference [1].

Here is a specific example where a patch introduced subtle behavior changes
within the resource tracker that were not caught by any of Nova's tests,
and would not have been caught by the API contract test, even if that had
been in place at the time, nor any other API contract test, since it did
not, in fact, change the API. It changed a behavior of the resource tracker
which, it turns out, libvirt does not use (at least within the
devstack-gate environment).

https://review.openstack.org/#/c/71557/33/nova/compute/resource_tracker.py

The problem is at line 385, where the supplied 'stats' are overwritten.
None of the libvirt-based tests touched this code path, though.

> def _write_ext_resources(self, resources):
>     resources['stats'] = {}    ## right here
>     resources['stats'].update(self.stats)
>     self.ext_resources_handler.write_resources(resources)

Re: [openstack-dev] [TripleO] [Ironic] Do we want to remove Nova-bm support?

2014-12-04 Thread Devananda van der Veen
On Thu Dec 04 2014 at 11:05:53 AM Clint Byrum  wrote:

> Excerpts from Steve Kowalik's message of 2014-12-03 20:47:19 -0800:
> > Hi all,
> >
> > I'm becoming increasingly concerned about all of the code paths
> > in tripleo-incubator that check $USE_IRONIC -eq 0 -- that is, use
> > nova-baremetal rather than Ironic. We do not check nova-bm support in
> > CI, haven't for at least a month, and I'm concerned that parts of it
> > may be slowly bit-rotting.
> >
> > I think our documentation is fairly clear that nova-baremetal is
> > deprecated and Ironic is the way forward, and I know it flies in the
> > face of backwards-compatibility, but do we want to bite the bullet and
> > remove nova-bm support?
>
> Has Ironic settled on a migration path/tool from nova-bm? If yes, then
> we should remove nova-bm support and point people at the migration
> documentation.
>

Such a tool was created and has been provided for the Juno release as a
"sideways migration". That is, an in-place migration from Juno Nova
Baremetal to Juno Ironic is supported. Such is documented here:

https://wiki.openstack.org/wiki/Ironic/NovaBaremetalIronicMigration

That is all that will be provided, as Baremetal has been removed from Nova
at the start of the Kilo cycle.

-Deva


> If Ironic decided not to provide one, then we should just remove support
> as well.
>
> If Ironic just isn't done yet, then removing nova-bm in TripleO is
> premature and we should wait for them to finish.
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-dev] [Ironic] Ironic-conductor fails to start - "AttributeError '_keepalive_evt'"

2014-12-05 Thread Devananda van der Veen
Hi Lohit,

In the future, please do not cross-post or copy-and-paste usage questions
on the development list. Since you posted this question on the general list
(*) -- which is exactly where you should post it -- I will respond there.

Regards,
Devananda

(*) http://lists.openstack.org/pipermail/openstack/2014-December/010698.html



On Fri Dec 05 2014 at 1:15:44 PM Lohit Valleru 
wrote:

> Hello All,
>
> I am trying to deploy bare-metal nodes using openstack-ironic. It is a 2 -
> node architecture with controller/keystone/mysql on a virtual machine, and
> cinder/compute/nova network on a physical machine on a CentOS 7 environment.
>
> openstack-ironic-common-2014.2-2.el7.centos.noarch
> openstack-ironic-api-2014.2-2.el7.centos.noarch
> openstack-ironic-conductor-2014.2-2.el7.centos.noarch
>
> I have followed this document,
>
> http://docs.openstack.org/developer/ironic/deploy/install-guide.html#ipmi-support
>
> and installed ironic. But when i start ironic-conductor, i get the below
> error :
>
> ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service
>  ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 ERROR
> ironic.common.service [-] Service error occurred when cleaning up the RPC
> manager. Error: 'ConductorManager' object has no attribute '_keepalive_evt'
>  ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service Traceback (most recent call last):
>  ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service   File
> "/usr/lib/python2.7/site-packages/ironic/common/service.py", line 91, in
> stop
>  ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service self.manager.del_host()
> ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service   File
> "/usr/lib/python2.7/site-packages/ironic/conductor/manager.py", line 235,
> in del_host
>  ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service self._keepalive_evt.set()
>  hc004 ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service AttributeError: 'ConductorManager' object has no
> attribute '_keepalive_evt'
>  hc004 ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 TRACE
> ironic.common.service
>  hc004 ironic-conductor[15997]: 2014-12-05 15:38:12.457 15997 INFO
> ironic.common.service [-] Stopped RPC server for service
> ironic.conductor_manager on host hc004.
>
> A look at the source code, tells me that it is something related to RPC
> service being started/stopped.
>
> Also, I cannot debug this more as - I do not see any logs being created
> with respect to ironic.
> Do i have to explicitly enable the logging properties in ironic.conf, or
> are they expected to be working by default?
>
> Here is the configuration from ironic.conf
>
> #
>
> [DEFAULT]
> verbose=true
> rabbit_host=172.18.246.104
> auth_strategy=keystone
> debug=true
>
> [keystone_authtoken]
> auth_host=172.18.246.104
> auth_uri=http://172.18.246.104:5000/v2.0
> admin_user=ironic
> admin_password=
> admin_tenant_name=service
>
> [database]
> connection = mysql://ironic:x@172.18.246.104/ironic?charset=utf8
>
> [glance]
> glance_host=172.18.246.104
>
> #
>
> I understand that i did not give neutron URL as required by the
> documentation. The reason : that i have architecture limitations to install
> neutron networking and would like to experiment if nova-network and dhcp
> pxe server will serve the purpose although i highly doubt that.
>
> However, i wish to know if the above issue is anyway related to
> non-existent neutron network, or if it is related to something else.
>
> Please do let me know.
>
> Thank you,
>
> Lohit
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] reminder: alternate meeting time

2014-12-05 Thread Devananda van der Veen
This is a friendly reminder that our weekly IRC meetings have begun
alternating times every week to try to accommodate more of our contributors.

Next week's meeting will be at 0500 UTC Tuesday (9pm PST Monday) in the
#openstack-meeting-3 channel. Details, as always, are on the wiki [0].

Regards,
Devananda

[0] https://wiki.openstack.org/wiki/Meetings/Ironic
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Fuel agent proposal

2014-12-08 Thread Devananda van der Veen
I'd like to raise this topic for a wider discussion outside of the hallway
track and code reviews, where it has thus far mostly remained.

In previous discussions, my understanding has been that the Fuel team
sought to use Ironic to manage "pets" rather than "cattle" - and doing so
required extending the API and the project's functionality in ways that no
one else on the core team agreed with. Perhaps that understanding was wrong
(or perhaps not), but in any case, there is now a proposal to add a
FuelAgent driver to Ironic. The proposal claims this would meet that team's
needs without requiring changes to the core of Ironic.

https://review.openstack.org/#/c/138115/

The Problem Description section calls out four things, which have all been
discussed previously (some are here [0]). I would like to address each one,
invite discussion on whether or not these are, in fact, problems facing
Ironic (not whether they are problems for someone, somewhere), and then ask
why these necessitate a new driver be added to the project.


They are, for reference:

1. limited partition support

2. no software RAID support

3. no LVM support

4. no support for hardware that lacks a BMC

#1.

When deploying a partition image (eg, QCOW format), Ironic's PXE deploy
driver performs only the minimal partitioning necessary to fulfill its
mission as an OpenStack service: respect the user's request for root, swap,
and ephemeral partition sizes. When deploying a whole-disk image, Ironic
does not perform any partitioning -- such is left up to the operator who
created the disk image.
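
(For reference, those three sizes come straight from the flavor the user
requests -- a quick sketch, with a made-up name and numbers:

    # name and sizes are examples only; swap is in MB, ephemeral in GB
    nova flavor-create bm.example auto 8192 40 4 --swap 2048 --ephemeral 20

Nothing beyond root, swap, and ephemeral is expressible there, by design.)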

Support for arbitrarily complex partition layouts is not required by, nor
does it facilitate, the goal of provisioning physical servers via a common
cloud API. Additionally, as with #3 below, nothing prevents a user from
creating more partitions in unallocated disk space once they have access to
their instance. Therefore, I don't see how Ironic's minimal support for
partitioning is a problem for the project.

#2.

There is no support for defining a RAID in Ironic today, at all, whether
software or hardware. Several proposals were floated last cycle; one is
under review right now for DRAC support [1], and there are multiple call
outs for RAID building in the state machine mega-spec [2]. Any such support
for hardware RAID will necessarily be abstract enough to support multiple
hardware vendor's driver implementations and both in-band creation (via
IPA) and out-of-band creation (via vendor tools).

Given the above, it may become possible to add software RAID support to IPA
in the future, under the same abstraction. This would closely tie the
deploy agent to the images it deploys (the latter image's kernel would be
dependent upon a software RAID built by the former), but this would
necessarily be true for the proposed FuelAgent as well.

I don't see this as a compelling reason to add a new driver to the project.
Instead, we should (plan to) add support for software RAID to the deploy
agent which is already part of the project.
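
(To be concrete about what "software RAID support" in the agent would wrap,
it is roughly this, with the device names assumed:

    # assemble a RAID-1 from two disks, in-band, before laying down the image
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda /dev/sdb

The hard part is not running mdadm; it is agreeing on the abstraction that
describes the desired layout to whichever driver builds it.)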

#3.

LVM volumes can easily be added by a user (after provisioning) within
unallocated disk space for non-root partitions. I have not yet seen a
compelling argument for doing this within the provisioning phase.
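
(For example, from inside the instance, something along these lines -- the
device name and sizes are placeholders:

    # turn the unallocated space into an LVM-backed volume
    pvcreate /dev/sda4
    vgcreate data /dev/sda4
    lvcreate -L 100G -n vol0 data
    mkfs.ext4 /dev/data/vol0

No cooperation from the deploy driver is needed for any of that.)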

#4.

There are already in-tree drivers [3] [4] [5] which do not require a BMC.
One of these uses SSH to connect and run pre-determined commands. Like the
spec proposal, which states at line 122, "Control via SSH access feature
intended only for experiments in non-production environment," the current
SSHPowerDriver is only meant for testing environments. We could probably
extend this driver to do what the FuelAgent spec proposes, as far as remote
power control for cheap always-on hardware in testing environments with a
pre-shared key.
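
(For anyone unfamiliar with that driver, enrolling a node with it looks
roughly like this -- the address, user, and key path are placeholders:

    ironic node-create -d pxe_ssh \
        -i ssh_address=192.168.1.10 \
        -i ssh_username=stack \
        -i ssh_key_filename=/home/stack/.ssh/id_rsa \
        -i ssh_virt_type=virsh

Note that ssh_virt_type assumes the "hardware" is actually a VM, which is
exactly why this driver is test-only today.)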

(And if anyone wonders about a use case for Ironic without external power
control ... I can only think of one situation where I would rationally ever
want to have a control-plane agent running inside a user-instance: I am
both the operator and the only user of the cloud.)




In summary, as far as I can tell, all of the problem statements upon which
the FuelAgent proposal is based are solvable through incremental changes
in existing drivers, or out of scope for the project entirely. As another
software-based deploy agent, FuelAgent would duplicate the majority of the
functionality which ironic-python-agent has today.

Ironic's driver ecosystem benefits from a diversity of hardware-enablement
drivers. Today, we have two divergent software deployment drivers which
approach image deployment differently: "agent" drivers use a local agent to
prepare a system and download the image; "pxe" drivers use a remote agent
and copy the image over iSCSI. I don't understand how a second driver which
duplicates the functionality we already have, and shares the same goals as
the drivers we already have, is beneficial to the project.

Doing the same thing twice just increases the burden on the team; w

Re: [openstack-dev] [Ironic] Fuel agent proposal

2014-12-09 Thread Devananda van der Veen
Thank you for explaining in detail what Fuel's use case is. I was lacking
this information, and taking the FuelAgent proposal in isolation. Allow me
to respond to several points inline...

On Tue Dec 09 2014 at 4:08:45 AM Vladimir Kozhukalov <
vkozhuka...@mirantis.com> wrote:

> Just a short explanation of Fuel use case.
>
> Fuel use case is not a cloud.
>

This is a fairly key point, and thank you for bringing it up. Ironic's
primary aim is to better OpenStack, and as such, to be part of an "Open
Source Cloud Computing platform." [0]

Meeting a non-cloud use case has not been a priority for the project as a
whole. It is from that perspective that my initial email was written, and I
stand by what I said there -- FuelAgent does not appear to be significantly
different from IPA when used within a "cloudy" use case. But, as you've
pointed out, that's not your use case :)

Enabling use outside of OpenStack has been generally accepted by the team,
though I don't believe anyone on the core team has put a lot of effort into
developing that yet. As I read this thread, I'm pleased to see more details
about Fuel's architecture and goals -- I think there is a potential fit for
Ironic here, though several points need further discussion.


> Fuel is a deployment tool. We install OS on bare metal servers and on VMs
> and then configure this OS using Puppet. We have been using Cobbler as our
> OS provisioning tool since the beginning of Fuel.
> However, Cobbler assumes using native OS installers (Anaconda and
> Debian-installer). For some reasons we decided to
> switch to image based approach for installing OS.
>
> One of Fuel features is the ability to provide advanced partitioning
> schemes (including software RAIDs, LVM).
> Native installers are quite difficult to customize in the field of
> partitioning
> (that was one of the reasons to switch to image based approach). Moreover,
> we'd like to implement even more
> flexible user experience.
>

The degree of customization and flexibility which you describe is very
understandable within traditional IT shops. Don't get me wrong -- there's
nothing inherently bad about wanting to give such flexibility to your
users. However, infinite flexibility is counter-productive to two of the
primary benefits of cloud computing: repeatability, and consistency.

[snip]

> As for Fuel itself, our immediate plan is to get rid of Cobbler because
> in the case of image based approach it is huge overhead. The question is
> which tool we can use instead of Cobbler. We need power management,
> we need TFTP management, we need DHCP management. That is
> exactly what Ironic is able to do.
>

You're only partly correct here. Ironic provides a vendor-neutral
abstraction for power management and image deployment, but Ironic does not
implement any DHCP management - Neutron is responsible for that, and Ironic
calls out to Neutron's API only to adjust DHCP boot parameters. At no point
is Ironic responsible for IP or DNS assignment.
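
(For illustration, the sum total of that call-out is roughly a port update
like the following -- the port id and boot file name are placeholders:

    neutron port-update $PORT_UUID \
        --extra-dhcp-opt opt_name=bootfile-name,opt_value=pxelinux.0

Lease management and IP assignment remain entirely Neutron's business.)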

This same view is echoed in the spec [1] which I have left comments on:

> Cobbler manages DHCP, DNS, TFTP services ...
> OpenStack has Ironic in its core which is capable to do the same ...
> Ironic can manage DHCP and it is planned to implement dnsmasq plugin.

To reiterate, Ironic does not manage DHCP or DNS, it never has, and such is
not on the roadmap for Kilo [2]. Two specs related to this were proposed
last month [3] -- but a spec proposal does not equal project plans. One of
the specs has been abandoned, and I am still waiting for the author to
rewrite the other one. Neither are approved nor targeted to Kilo.


In summary, if I understand correctly, it seems as though you're trying to
fit Ironic into Cobbler's way of doing things, rather than recognize that
Ironic approaches provisioning in a fundamentally different way.

Your use case:
* is not cloud-like
* does not include Nova or Neutron, but will duplicate functionality of
both (you need a scheduler and all the logic within nova.virt.ironic, and
something to manage DHCP and DNS assignment)
* would use Ironic to manage diverse hardware, which naturally requires
some operator-driven customization, but still exposes the messy
configuration bits^D^Dchoices to users at deploy time
* duplicates some of the functionality already available in other drivers

There are certain aspects of the proposal which I like, though:
* using SSH rather than HTTP for remote access to the deploy agent
* support for putting the root partition on a software RAID
* integration with another provisioning system, without any API changes

Regards,
-Devananda


[0] https://wiki.openstack.org/wiki/Main_Page

[1]
https://review.openstack.org/#/c/138301/8/specs/6.1/substitution-cobbler-with-openstack-ironic.rst

[2] https://launchpad.net/ironic/kilo

[3] https://review.openstack.org/#/c/132511/ and
https://review.openstack.org/#/c/132744/
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org

Re: [openstack-dev] [Ironic] Fuel agent proposal

2014-12-09 Thread Devananda van der Veen
On Tue Dec 09 2014 at 7:49:32 AM Yuriy Zveryanskyy <
yzveryans...@mirantis.com> wrote:

> On 12/09/2014 05:00 PM, Jim Rollenhagen wrote:
> > On Tue, Dec 09, 2014 at 04:01:07PM +0400, Vladimir Kozhukalov wrote:
>
> >> Many many various cases are possible. If you ask why we'd like to
> support
> >> all those cases, the answer is simple:
> >> because our users want us to support all those cases.
> >> Obviously, many of those cases can not be implemented as image
> internals,
> >> some cases can not be also implemented on
> >> configuration stage (placing root fs on lvm device).
> >>
> >> As far as those use cases were rejected to be implemented in term of
> IPA,
> >> we implemented so called Fuel Agent.
> > This is *precisely* why I disagree with adding this driver.
> >
> > Nearly every feature that is listed here has been talked about before,
> > within the Ironic community. Software RAID, LVM, user choosing the
> > partition layout. These were rejected from IPA because they do not fit in
> > *Ironic*, not because they don't fit in IPA.
>
> Yes, they do not fit in Ironic *core* but this is a *driver*.
> There is iLO driver for example. Good or bad is iLO management technology?
> I don't know. But it is an existing vendor's solution. I should buy or rent
> HP server for tests or experiments with iLO driver. Fuel is widely used
> solution for deployment, and it is open-source. I think to have Fuel Agent
> driver in Ironic will be better than driver for rare hardware XYZ for
> example.
>
>
This argument is completely hollow. Fuel is not a vendor-specific
hardware-enablement driver. It *is* an open-source deployment driver
providing much the same functionality as another open-source deployment
driver which is already integrated with the project.

To make my point another way, could I use Fuel with HP iLO driver? (the
answer should be "yes" because they fill different roles within Ironic).
But, on the other hand, could I use Fuel with the IPA driver? (nope -
definitely not - they do the same thing.)

-Deva
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] Fuel agent proposal

2014-12-09 Thread Devananda van der Veen
On Tue Dec 09 2014 at 10:13:52 AM Vladimir Kozhukalov <
vkozhuka...@mirantis.com> wrote:

> Kevin,
>
> Just to make sure everyone understands what Fuel Agent is about. Fuel
> Agent is agnostic to image format. There are 3 possibilities for image
> format
> 1) DISK IMAGE contains GPT/MBR table and all partitions and metadata in
> case of md or lvm. That is just something like what you get when run 'dd
> if=/dev/sda of=disk_image.raw'
>

This is what IPA driver does today.


> 2) FS IMAGE contains fs. Disk contains some partitions which then could be
> used to create md device or volume group contains logical volumes. We then
> can put a file system over plain partition or md device or logical volume.
> This type of image is what you get when run 'dd if=/dev/sdaN
> of=fs_image.raw'
>

This is what PXE driver does today, but it does so over a remote iSCSI
connection.

Work is being done to add support for this to IPA [0]


> 3) TAR IMAGE contains files. It is when you run 'tar cf tar_image.tar /'
>
> Currently in Fuel we use FS images. Fuel Agent creates partitions, md and
> lvm devices and then downloads FS images and put them on partition devices
> (/dev/sdaN) or on lvm device (/dev/mapper/vgname/lvname) or md device
> (/dev/md0)
>
>
I believe the IPA team would welcome contributions that add support for
software RAID for the root partition.


> Fuel Agent is also able to install and configure grub.
>

Again, I think this would be welcomed by the IPA team...

If this is what FuelAgent is about, why is there so much resistance to
contributing that functionality to the component which is already
integrated with Ironic? Why complicate matters for both users and
developers by adding *another* deploy agent that does (or will soon do) the
same things?

-Deva

[0]
https://blueprints.launchpad.net/ironic/+spec/partition-image-support-for-agent-driver
https://review.openstack.org/137363
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] Fuel agent proposal

2014-12-09 Thread Devananda van der Veen
On Tue Dec 09 2014 at 9:45:51 AM Fox, Kevin M  wrote:

> We've been interested in Ironic as a replacement for Cobbler for some of
> our systems and have been kicking the tires a bit recently.
>
> While initially I thought this thread was probably another "Fuel not
> playing well with the community" kind of thing, I'm not thinking that any
> more. Its deeper then that.
>

There are aspects to both conversations here, and you raise many valid
points.

> Cloud provisioning is great. I really REALLY like it. But one of the things
> that makes it great is the nice, pretty, cute, uniform, standard "hardware"
> the vm gives the user. Ideally, the physical hardware would behave the
> same. But,
> “No Battle Plan Survives Contact With the Enemy”.  The sad reality is,
> most hardware is different from each other. Different drivers, different
> firmware, different different different.
>

Indeed, hardware is different. And no matter how homogeneous you *think* it
is, at some point, some hardware is going to fail^D^D^Dbehave differently
than some other piece of hardware.

One of the primary goals of Ironic is to provide a common *abstraction* to
all the vendor differences, driver differences, and hardware differences.
There's no magic in that -- underneath the covers, each driver is going to
have to deal with the unpleasant realities of actual hardware that is
actually different.


> One way the cloud enables this isolation is by forcing the cloud admin's
> to install things and deal with the grungy hardware to make the interface
> nice and clean for the user. For example, if you want greater mean time
> between failures of nova compute nodes, you probably use a raid 1. Sure,
> its kind of a pet kind of thing todo, but its up to the cloud admin to
> decide what's "better", buying more hardware, or paying for more admin/user
> time. Extra hard drives are dirt cheep...
>
> So, in reality Ironic is playing in a space somewhere between "I want to
> use cloud tools to deploy hardware, yay!" and "ewww.., physical hardware's
> nasty. you have to know all these extra things and do all these extra
> things that you don't have to do with a vm"... I believe Ironic's going to
> need to be able to deal with this messiness in as clean a way as possible.


If by "clean" you mean, expose a common abstraction on top of all those
messy differences -- then we're on the same page. I would welcome any
feedback as to where that abstraction leaks today, and on both spec and
code reviews that would degrade or violate that abstraction layer. I think
it is one of, if not *the*, defining characteristic of the project.


> But that's my opinion. If the team feels its not a valid use case, then
> we'll just have to use something else for our needs. I really really want
> to be able to use heat to deploy whole physical distributed systems though.
>
> Today, we're using software raid over two disks to deploy our nova
> compute. Why? We have some very old disks we recovered for one of our
> clouds and they fail often. nova-compute is pet enough to benefit somewhat
> from being able to swap out a disk without much effort. If we were to use
> Ironic to provision the compute nodes, we need to support a way to do the
> same.
>

I have made the (apparently incorrect) assumption that anyone running
anything sensitive to disk failures in production would naturally have a
hardware RAID, and that, therefore, Ironic should be capable of setting up
that RAID in accordance with a description in the Nova flavor metadata --
but did not need to be concerned with software RAIDs.
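
(Something like a flavor extra spec is what I have in mind there -- purely
hypothetical, no such key is defined today:

    # hypothetical capability key; the flavor name is made up as well
    nova flavor-key bm.example set capabilities:raid_level=1

...with the matching driver building that RAID before the image is written.)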

Clearly, there are several folks who have the same use-case in mind, but do
not have hardware RAID cards in their servers, so my initial assumption was
incorrect :)

I'm fairly sure that the IPA team would welcome contributions to this
effect.

> We're looking into ways of building an image that has a software raid
> presetup, and expand it on boot.


Awesome! I hope that work will make its way into diskimage-builder ;)

(As an aside, I suggested this to the Fuel team back in Atlanta...)


> This requires each image to be customized for this case though. I can see
> Fuel not wanting to provide two different sets of images, "hardware raid"
> and "software raid", that have the same contents in them, with just
> different partitioning layouts... If we want users to not have to care
> about partition layout, this is also not ideal...
>

End-users are probably not generating their own images for bare metal
(unless user == operator, in which case, it should be fine).


> Assuming Ironic can be convinced that these features really would be
> needed, perhaps the solution is a middle ground between the pxe driver and
> the agent?
>

I've been rallying for a convergence between the feature sets of these
drivers -- specifically, that the agent should support partition-based
images, and also support copy-over-iscsi as a deployment model. In
parallel, Lucas had started working on splitting the deploy interfa

[openstack-dev] [Ironic] 0.3.2 client release

2014-12-10 Thread Devananda van der Veen
Hi folks,

Just a quick announcement that I've tagged an incremental release of our
client library to catch up with the changes so far in Kilo in preparation
for the k-1 milestone next week. Here are the release notes:

- Add keystone v3 CLI support
- Add tty password entry to CLI
- Add node-set-maintenance command to CLI
- Include maintenance_reason in CLI output of node-show
- Add option to specify node uuid in node-create subcommand
- Add GET support for vendor_passthru to the library
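
For those curious, the new commands look roughly like this (the uuid, driver,
and reason below are placeholders):

    ironic node-set-maintenance $NODE_UUID on --reason "swapping a failed PSU"
    ironic node-create -d pxe_ipmitool -u $NODE_UUID

As always, 'ironic help <subcommand>' has the authoritative usage.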

It should be winding its way through the build pipeline right now, and
available on pypi later today.

Regards,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Ironic] deadline for specs

2014-12-13 Thread Devananda van der Veen
Hi, Tan,

No, ironic is not having an early feature proposal freeze for this cycle.
Dec 18 is the kilo-1 milestone, and that is all.

Please see the release schedule here:

https://wiki.openstack.org/wiki/Kilo_Release_Schedule

That being said, the earlier you can propose a spec, the better your
chances for it landing in any given cycle.

Regards,
Devananda




On Sat, Dec 13, 2014, 10:10 PM Tan, Lin  wrote:

Hi,

A quick question,
do we have a SpecProposalDeadline for Ironic, 18th Dec or ?

Thanks

Best Regards,

Tan

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] maintaining our stable branches

2014-12-19 Thread Devananda van der Veen
Hi folks!

We now have control over our own stable branch maintenance, which is good,
because the stable maintenance team is not responsible for non-integrated
releases of projects (eg, both of our previous releases).

Also, to note, with the Big Tent changes starting to occur, such
responsibilities will be more distributed in the future. [0] For the
remainder of this cycle, we'll reuse the ironic-milestone group for stable
branch maintenance [1]; when Kilo is released, we'll need to create an
ironic-stable-maint gerrit group and move to that, or generally do what
ever the process looks like at that point.

In any case, for now I've seeded the group with Adam Gandelman and myself
(since we were already tracking Ironic's stable branches). If any other
cores would like to help with this, please ping me, I'm happy to add folks
but don't want to assume that all cores want the responsibility.

We should also decide and document, explicitly, what support we're giving
to the Icehouse and Juno releases. My sense is that we should drop support
for Icehouse, but I'll put that on the weekly meeting agenda for after the
holidays.

-Devananda

[0]
http://lists.openstack.org/pipermail/openstack-dev/2014-November/050390.html

[1] https://review.openstack.org/#/c/143112/
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Cancelling next week's meeting

2014-12-23 Thread Devananda van der Veen
With the winter break coming up (or already here, for some folks) I am
cancelling next week's meeting on Dec 29.

I had not cancelled last night's meeting ahead of time, but very few people
attended, and with so few core reviewers present there wasn't much we could
get done. We did not have a formal meeting, and just hung out in channel
for about 15 minutes.

This means our next meeting will be Jan 6th at 0500 UTC (Jan 5th at 9pm US
west coast).

See you all again after the break!

Best,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] mid-cycle details

2014-12-29 Thread Devananda van der Veen
Hi folks!

tldr; If you will be attending the midcycle sprint in Grenoble the week of
Feb 3rd, please sign up HERE.

Long version...

Before the holidays, I was behind in gathering and sharing information
about our midcycle sprint, which makes me even further behind now, so
I've finally got those details to share with y'all! Also, I have some
thoughts / concerns, which I've shared in a separate email -- please go
read it.

Dates: Feb 3 - 5 (Tue - Thu) with a half day for those sticking around on
Friday, Feb 6th.

Location:
  Hewlett Packard Centre de Compétences
  5 Avenue Raymond Chanas
  38320 Eybens
  Grenoble, France

Grenoble is flat and fairly easy to get around, both by tram and by car.
The easiest airport to travel through is Lyon, and it is about an hour's
drive by car from the airport to Grenoble. (Also, it's a beautiful drive in
the countryside - I recommend it!)

I have previously stayed at the Mercure Centre Alpotel [1], and while not
the closest hotel (it's about 10 minutes by car or 25 minutes by tram to
HP's campus) it is within walking distance to the city center. I'll be
staying there again. There are also hotels around the Expo center, which is
just a few blocks from the HP campus, such as [2]. I have not arranged any
group rates at these hotels, but the city has plenty of availability and
this isn't peak travel season so rates are quite reasonable.

The weather forecast [3] will probably be chilly (around 45F or 7C during
the day), likely overcast with some rain, but probably not snowing in the
city. We'll be within easy driving distance of the Alps, so if you plan to
go exploring outside the city (ski trip, anyone?) dress for snow.

Regards,
Devananda



[1]
Hotel Mercure Grenoble Centre Alpotel
12 Boulevard Maréchal Joffre
38000 Grenoble
France

[2]
Park & Suites Elegance Grenoble Alpexpo
1 Avenue d'Innsbruck
38100 Grenoble
France

[3]
https://weatherspark.com/averages/32103/2/Grenoble-Rhone-Alpes-France
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] thoughts on the midcycle

2014-12-29 Thread Devananda van der Veen
I'm sending the details of the midcycle in a separate email. Before you
reply that you won't be able to make it, I'd like to share some thoughts /
concerns.

In the last few weeks, several people who I previously thought would attend
told me that they can't. By my informal count, it looks like we will have
at most 5 of our 10 core reviewers in attendance. I don't think we should
cancel based on that, but it does mean that we need to set our expectations
accordingly.

Assuming that we will be lacking about half the core team, I think it will
be more practical as a focused sprint, rather than a planning & design
meeting. While that's a break from precedent, planning should be happening
via the spec review process *anyway*. Also, we already have a larger
backlog of specs and work than we had this time last cycle, but with the same
size review team. Rather than adding to our backlog, I would like us to use
this gathering to burn through some specs and land some code.

That being said, I'd also like to put forth this idea: if we had a second
gathering (with the same focus on writing code) the following week (let's
say, Feb 11 - 13) in the SF Bay area -- who would attend? Would we be able
to get the "other half" of the core team together and get more work done?
Is this a good idea?

OK. That's enough of my musing for now...

Once again, if you will be attending the midcycle sprint in Grenoble the
week of Feb 3rd, please sign up HERE.

Regards,
Devananda
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Ironic] Bug team meeting tomorrow 1700 UTC

2015-01-12 Thread Devananda van der Veen
Following up from our IRC meeting to let folks know that we'll be having a
"bug day" tomorrow (Tuesday) to clean up our bug list. Folks who want to
join Dmitry and I are welcome - we'll start at the same time as today's
meeting (1700 UTC // 9am PST), but in our usual channel (#openstack-ironic)
rather than the meeting room.

The goal for tomorrow isn't to fix all the bugs, but to make sure the
status is correct. We have a fair number of stale bugs that should be
closed or re-prioritized, and we have a growing list of new bugs that need
to be triaged.

-Devananda
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


  1   2   3   4   >