> I guess our architecture is pretty unique in a way but I wonder if
> other people are also a little scared about the whole "all DB servers
> need to be up to serve API requests" thing?
When we started down this path, we acknowledged that this would create a
different access pattern which would require ops
> I tested a code change that essentially reverts
> https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py
>
> In other words, with this change metadata tables are not fetched by
> default in API requests. If I understand correctly, metadata is
> fetched in separate queries as the inst
> Do you guys see an easy fix here?
>
> Should I open a bug report?
Definitely open a bug. IMHO, we should just make the single-instance
load work like the multi ones, where we load the metadata separately if
requested. We might be able to get away without sysmeta these days, but
we needed it for
> We haven't been doing this (intentionally) for quite some time, as we
> query and fill metadata linearly:
>
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244
>
> and have since 2013 (Havana):
>
> https://review.openstack.org/#/c/26136/
>
> So unless there has been a
> Of course this is only a problem when instances have a lot of metadata
> records. An instance with 50 records in "instance_metadata" and 50
> records in "instance_system_metadata" will fetch 50 x 50 = 2,500 rows
> from the database. It's not difficult to see how this can escalate
> quickly. This
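To make the arithmetic concrete, here is a minimal, self-contained sketch
(plain SQLAlchemy with made-up models, not the actual nova schema) of how
eager-loading both collections in a single query multiplies the rows the
database returns:

    from sqlalchemy import Column, ForeignKey, Integer, create_engine
    from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

    Base = declarative_base()

    class Instance(Base):
        __tablename__ = 'instances'
        id = Column(Integer, primary_key=True)
        meta = relationship('Meta')
        sys_meta = relationship('SysMeta')

    class Meta(Base):
        __tablename__ = 'instance_metadata'
        id = Column(Integer, primary_key=True)
        instance_id = Column(Integer, ForeignKey('instances.id'))

    class SysMeta(Base):
        __tablename__ = 'instance_system_metadata'
        id = Column(Integer, primary_key=True)
        instance_id = Column(Integer, ForeignKey('instances.id'))

    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add(Instance(id=1,
                             meta=[Meta() for _ in range(50)],
                             sys_meta=[SysMeta() for _ in range(50)]))
        session.commit()
        # Joining both one-to-many collections into one SELECT makes the
        # database return 50 x 50 = 2,500 rows for this single instance;
        # loading each collection in its own query returns 50 + 50 rows.
        instance = (session.query(Instance)
                    .options(joinedload(Instance.meta),
                             joinedload(Instance.sys_meta))
                    .one())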
>> I disagree on this. I'd rather just do a simple check for >1
>> provider in the allocations on the source and if True, fail hard.
>>
>> The reverse (going from a non-nested source to a nested destination)
>> will hard fail anyway on the destination because the POST
>> /allocations won't work due
>> I still want to use something like "Is capable of RAID5" and/or "Has
>> RAID5 already configured" as part of a scheduling and placement
>> decision. Being able to have the GET /a_c response filtered down to
>> providers with those, ahem, traits is the exact purpose of that operation.
>
> And yep
> It sounds like you might be saying, "I would rather not see encoded
> trait names OR a new key/value primitive; but if the alternative is
> ending up with 'a much larger mess', I would accept..." ...which?
>
> Or is it, "We should not implement a key/value primitive, nor should we
> implement res
I was out when much of this conversation happened, so I'm going to
summarize my opinion here.
> So from a code perspective _placement_ is completely agnostic to
> whether a trait is "PCI_ADDRESS_01_AB_23_CD", "STORAGE_DISK_SSD", or
> "JAY_LIKES_CRUNCHIE_BARS".
>
> However, things which are using t
> I'm just a bit worried about limiting that role to the elected TC members. If
> we say "it's the role of the TC to do cross-project PM in OpenStack"
> then we artificially limit the number of people who would sign up to do
> that kind of work. You mention Ildiko and Lance: they did that line of
> work
> How do people feel about this? It seems pretty straight-forward to
> me. If people are generally in favor of this, then the question is
> what would be sane defaults - or should we not assume a default and
> force operators to opt into this?
I dunno, adding something to nova.conf that is only us
> The other obvious thing is the database. The placement repo code as-is
> today still has the check for whether or not it should use the
> placement database but falls back to using the nova_api database
> [5]. So technically you could point the extracted placement at the
> same nova_api database
> I think there was a period in time where the nova_api database was created
> where entries would try to get pulled out from the original nova database and
> then checking nova_api if it doesn't exist afterwards (or vice versa). One
> of the cases that this was done to deal with was for things li
>> Yes, we should definitely trim the placement DB migrations to only
>> things relevant to placement. And we can use this opportunity to get
>> rid of cruft too and squash all of the placement migrations together
>> to start at migration 1 for the placement repo. If anyone can think
>> of a proble
> If we're going to do the extraction in Stein, which we said we'd do in
> Dublin, we need to start that as early as possible to iron out any
> deployment bugs in the switch. We can't wait until the 2nd or 3rd
> milestone, it would be too risky.
I agree that the current extraction plan is highly r
> Grenade already has its own "resources db" right? So we can shove
> things in there before we upgrade and then verify they are still there
> after the upgrade?
Yep, I'm working on something right now. We create an instance that
survives the upgrade and validate it on the other side. I'll just d
> Grenade uses devstack so once we have devstack on master installing
> (and configuring) placement from the new repo and disable installing
> and configuring it from the nova repo, that's the majority of the
> change I'd think.
>
> Grenade will likely need a from-rocky script to move any config th
>> 2. We have a stack of changes to zuul jobs that show nova working but
>> deploying placement in devstack from the new repo instead of nova's
>> repo. This includes the grenade job, ensuring that upgrade works.
>
> I'm guessing there would need to be changes to Devstack itself, outside
> of the
> The compromise, using the patch as currently written [1], would entail
> adding one line at the top of each test file:
>
> uuids = uuidsentinel.UUIDSentinels()
>
> ...as seen (more or less) at [2]. The subtle difference being that this
> `uuids` wouldn't share a namespace across the whole proces
> Do you mean an actual fixture, that would be used like:
>
> class MyTestCase(testtools.TestCase):
>     def setUp(self):
>         self.uuids = self.useFixture(oslofx.UUIDSentinelFixture()).uuids
>
>     def test_foo(self):
>         do_a_thing_with(self.uuids.foo)
>
> ?
>
> That's... okay I
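For anyone not familiar with the sentinel pattern being discussed, a rough
sketch of the idea (an illustration only, not the oslo or nova
implementation) is that each attribute name lazily maps to one stable UUID
string within the helper's namespace:

    import uuid

    class UUIDSentinels(object):
        """Illustrative sketch: one stable, lazily generated UUID per name."""

        def __init__(self):
            self._uuids = {}

        def __getattr__(self, name):
            if name.startswith('_'):
                raise AttributeError(name)
            return self._uuids.setdefault(name, str(uuid.uuid4()))

    uuids = UUIDSentinels()
    assert uuids.instance1 == uuids.instance1   # stable within this namespace
    assert uuids.instance1 != uuids.instance2   # distinct per attribute name

Whether that namespace is per test file, per test case (via a fixture), or
process-global is exactly the trade-off being debated above.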
> I think Nova should never have to rely on Cinder's hosts/backends
> information to do migrations or any other operation.
>
> In this case even if Nova had that info, it wouldn't be the solution.
> Cinder would reject migrations if there's an incompatibility on the
> Volume Type (AZ, Referenced ba
>> So my hope is that (in no particular order) Jay Pipes, Eric Fried,
>> Takashi Natsume, Tetsuro Nakamura, Matt Riedemann, Andrey Volkov,
>> Alex Xu, Balazs Gibizer, Ed Leafe, and any other contributor to
>> placement whom I'm forgetting [1] would express their preference on
>> what they'd like to
> The subject of using placement in Cinder has come up, and since then
> I've had a few conversations with people in and outside of that team.
> I really think until placement is its own project outside of the nova
> team, there will be resistance from some to adopt it.
I know politics wi
> We have tried out the patch:
> https://review.openstack.org/#/c/592698/
> we also applied https://review.openstack.org/#/c/592285/
>
> it turns out that we are able to halve the overall time consumption, we
> did try with different sort key and dirs, the results are similar, we
> didn't try out pa
> yes, the DB query was in serial, after some investigation, it seems that we
> are unable to perform eventlet.monkey_patch in uWSGI mode, so
> Yikun made this fix:
>
> https://review.openstack.org/#/c/592285/
Cool, good catch :)
>
> After making this change, we test again, and we got this k
> I thought we were leaning toward the option where nova itself doesn't
> impose a limit, but lets the virt driver decide.
>
> I would really like NOT to see logic like this in any nova code:
>
>> if kvm|qemu:
>>     return 256
>> elif POWER:
>>     return 4000
>> elif:
>>
> While I have tried to review a few of the runway-slotted efforts, I
> have gotten burned out on a number of them. Other runway-slotted
> efforts, I simply don't care enough about or once I've seen some of
> the code, simply can't bring myself to review it (sorry, just being
> honest).
I have the
> Some ideas that have been discussed so far include:
FYI, these are already in my order of preference.
> A) Selecting a new, higher maximum that still yields reasonable
> performance on a single compute host (64 or 128, for example). Pros:
> helps prevent the potential for poor performance on a
> FWIW, I don't have a problem with the virt driver "knowing about
> allocations". What I have a problem with is the virt driver *claiming
> resources for an instance*.
+1000.
> That's what the whole placement claims resources things was all about,
> and I'm not interested in stepping back to the
> Dan, you are leaving out the parts of my response where I am agreeing
> with you and saying that your "Option #2" is probably the things we
> should go with.
No, what you said was:
>> I would vote for Option #2 if it comes down to it.
Implying (to me at least) that you still weren't in favor o
> So, you're saying the normal process is to try upgrading the Linux
> kernel and associated low-level libs, wait the requisite amount of
> time that takes (can be a long time) and just hope that everything
> comes back OK? That doesn't sound like any upgrade I've ever seen.
I'm saying I think it'
> My feeling is that we should not attempt to "migrate" any allocations
> or inventories between root or child providers within a compute node,
> period.
While I agree this is the simplest approach, it does put a lot of
responsibility on the operators to do work to sidestep this issue, which
might
> Can I know a use case for this 'live copy of metadata' or the 'only way
> to access device tags when hot-attaching'? My thought is that this is a
> one-time thing on the cloud-init side, either through the metadata
> service or the config drive, and won't be used later. Then why do I
> need a live copy?
If I do something lik
> For example, I look at your nova fork and it has a "don't allow this
> call during an upgrade" decorator on many API calls. Why wasn't that
> done upstream? It doesn't seem overly controversial, so it would be
> useful to understand the reasoning for that change.
Interesting. We have internal ac
> Takashi Natsume writes:
>
>> In some compute REST APIs, it returns the 'marker' parameter
>> in their pagination.
>> Then users can specify the 'marker' parameter in the next request.
I read this as you saying there was some way that the in-band marker
mapping could be leaked to the user via th
Takashi Natsume writes:
> In some compute REST APIs, it returns the 'marker' parameter
> in their pagination.
> Then users can specify the 'marker' parameter in the next request.
How is this possible? The only way we would get the marker is if we
either (a) listed the mappings by project_id, usi
> The oslo UUIDField emits a warning if the string used as a field value
> does not pass the validation of the uuid.UUID(str(value)) call
> [3]. All the offending places are fixed in nova except the nova-manage
> cell_v2 map_instances call [1][2]. That call uses markers in the DB
> that are not val
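For reference, the validation in question amounts to something like the
following (a sketch of the check, not the oslo code itself):

    import uuid

    def is_uuid_like(value):
        """Return True if value parses as a UUID via uuid.UUID(str(value))."""
        try:
            uuid.UUID(str(value))
            return True
        except (TypeError, ValueError):
            return False

    # The map_instances marker mentioned above is stored in a form that
    # fails this check, which is what triggers the UUIDField warning.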
> I'm late to this thread but I finally went through the replies and my
> thought is, we should do a pre-flight check to verify with placement
> whether the image traits requested are 1) supported by the compute
> host the instance is residing on and 2) coincide with the
> already-existing allocati
> According to the requirements and comments, we have now enabled the CI
> runs with run_validation = True. And according to [1] below, for
> example, [2] needs the ssh validation to pass the test.
>
> And there are a couple of comments that need some enhancement on the CI
> logs, such as format and legacy incorrec
> Having briefly read the cloud-init snippet which was linked earlier in
> this thread, the requirement seems to be that the guest exposes the
> device as /dev/srX or /dev/cdX. So I guess in order to make this work:
>
> * You need to tell z/VM to expose the virtual disk as an optical disk
> * The z
> Maybe it wasn't clear but I'm not advocating that we block the change
> until volume-backed instances are supported with trusted certs. I'm
> suggesting we add a policy rule which allows deployers to at least
> disable it via policy if it's not supported for their cloud.
That's fine with me, and
> Thanks for the concern, and I fully understand it. The major reason is
> that cloud-init doesn't have a hook or plugin before it starts to read
> the config drive (ISO disk). z/VM is an old hypervisor and has no way
> to do something like libvirt does and define an ISO-format disk in an
> xml definition; instead, it can define
> I propose that we remove the z/VM driver blueprint from the runway at
> this time and place it back into the queue while work on the driver
> continues. At a minimum, we need to see z/VM CI running with
> [validation]run_validation = True in tempest.conf before we add the
> z/VM driver blueprint
>> global ironic
>> if ironic is None:
>>     ironic = importutils.import_module('ironicclient')
I believe ironic was an early example of a client library we hot-loaded,
and I believe at the time we said this was a pattern we were going to
follow. Personally, I think this m
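Spelled out, the hot-load pattern quoted above looks roughly like this (a
sketch; the real driver code is more involved):

    from oslo_utils import importutils

    ironicclient = None

    def _get_ironicclient():
        """Import python-ironicclient only the first time it is needed."""
        global ironicclient
        if ironicclient is None:
            ironicclient = importutils.import_module('ironicclient')
        return ironicclient

The point of the pattern is that the client library is an optional,
driver-specific dependency that should not be imported at module load time.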
> For the run_validation=False issue, you are right. Because the z/VM
> driver only supports config drive and doesn't support the metadata
> service, we made a bad assumption and took the wrong action of
> disabling the whole ssh check. Actually, according to [1], we should
> only disable
> CONF.compute_feature_enable
> https://review.openstack.org/#/c/527658 is a z/VM patch which
> introduces their support for config drive. They do this by attaching a
> tarball to the instance, having pretended in the nova code that it is
> an iso9660. This worries me.
>
> In the past we've been concerned about adding new files
> ==> Fully dynamic: You can program one region with one function, and
> then still program a different region with a different function, etc.
Note that this is also the case if you don't have virtualized multi-slot
devices. Like, if you had one that only has one region. Consuming it
consumes the
> To the existing core team members, please respond with your comments,
> +1s, or objections within one week.
+1.
--Dan
> Does Cells v2 support a multi-cell deployment in Pike? Is there any
> good document about the deployment?
In the release notes of Pike:
https://docs.openstack.org/releasenotes/nova/pike.html
this is under the 16.0.0 Prelude:
Nova now supports a Cells v2 multi-cell deployment. The default
d
>> And I, for one, wouldn't be offended if we could "officially start
>> development" (i.e. focus on patches, start runways, etc.) before the
>> mystical but arbitrary spec freeze date.
Yeah, I agree. I see runways as an attempt to add pressure to the
earlier part of the cycle, where we're igno
> Can you be more specific about what is limiting you when you use
> volume-backed instances?
Presumably it's because you're taking a trip over iscsi instead of using
the native attachment mechanism for the technology that you're using? If
so, that's a valid argument, but it's hard to see the trad
> Deleting all snapshots would seem dangerous though...
>
> 1. I want to reset my instance to how it was before
> 2. I'll just do a snapshot in case I need any data in the future
> 3. rebuild
> 4. oops
Yep, for sure. I think if there are snapshots, we have to refuse to do
the thing. My comment was
> Rather than overload delete_on_termination, could another flag like
> delete_on_rebuild be added?
Isn't delete_on_termination already the field we want? To me, that field
means "nova owns this". If that is true, then we should be able to
re-image the volume (in-place is ideal, IMHO) and if not,
> 2. Dan Smith mentioned another idea such that we could index the
> aggregate metadata keys like filter_tenant_id0, filter_tenant_id1,
> ... filter_tenant_idN and then combine those so you have one host
> aggregate filter_tenant_id* key per tenant.
Yep, and that's wha
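As an illustration of that scheme (hypothetical tenant IDs, not real data),
the aggregate metadata would carry one indexed key per tenant:

    aggregate_metadata = {
        'filter_tenant_id0': 'a1b2c3d4-0000-0000-0000-000000000001',
        'filter_tenant_id1': 'a1b2c3d4-0000-0000-0000-000000000002',
        'filter_tenant_id2': 'a1b2c3d4-0000-0000-0000-000000000003',
    }
    # The scheduler-side filter would then combine every filter_tenant_id*
    # key into one set of allowed tenants for the aggregate.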
Ed Leafe writes:
> I think you're missing the reality that intermediate releases have
> about zero uptake in the real world. We have had milestone releases of
> Nova for years, but I challenge you to find me one non-trivial
> deployment that uses one of them. To my knowledge, based on user
> surv
> In my experience, the longer a patch (or worse, patch series) sits
> around, the staler it gets. Others are merging changes, so the
> long-lived patch series has to be constantly rebased.
This is definitely true.
> The 20% developer would be spending a greater proportion of her time
> figuring
> I hope everyone travelling to the Sydney Summit is enjoying jet lag
> just as much as I normally do. Revenge is sweet! My big advice is that
> caffeine is your friend, and to not lick any of the wildlife.
I wasn't planning on licking any of it, but thanks for the warning.
> As of just now, all
> But the record in the 'host_mappings' table of the api database is not
> deleted (I tried it with nova master 8ca24bf1ff80f39b14726aca22b5cf52603ea5a0).
> The cell cannot be deleted if records for the cell remain in the
> 'host_mappings' table.
> (An error occurs with a message "There are existing host
> Any update on where we stand on issues now? Because every single patch I
> tried to land yesterday was killed by POST_FAILURE in various ways.
> Including some really small stuff - https://review.openstack.org/#/c/324720/
Yeah, Nova has only landed eight patches since Thursday. Most of those are
>> I also think there is value in exposing vGPU in a generic way, irrespective
>> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or
>> whatever approach Hyper-V/VMWare use).
>
> That is a big ask. To start with, all GPUs are not created equal, and
> various vGPU functionality
Hi all,
Due to a zuulv3 bug, we're running an old nova-network test job on
master and, as you would expect, failing hard. As a workaround in the
meantime, we're[0] going to disable that job entirely so that it runs
nowhere. This makes it not run on master (good) but also not run on
stable/new
The concepts of PCI and SR-IOV are, of course, generic
They are, although the PowerVM guys have already pointed out that they
don't even refer to virtual devices by PCI address and thus anything
based on that subsystem isn't going to help them.
but I think out of principle we should avoid a
In this series of patches we are generalizing the PCI framework to
handle MDEV devices. We argue it's a lot of patches, but most of them
are small and the logic behind them is basically to make it understand
two new fields, MDEV_PF and MDEV_VF.
That's not really "generalizing the PCI framework to hand
- Modify the `supports-upgrades`[3] and `supports-accessible-upgrades`[4] tags
I have yet to look into the formal process around making changes to
these tags but I will aim to make a start ASAP.
We've previously tried to avoid changing assert tag definitions because
we then have to re-rev
So to the existing core team members, please respond with a yay/nay and
after about a week or so we should have a decision (knowing a few cores
are on vacation right now).
+1 on the condition that gibi stops finding so many bugs in the stuff I
worked on. It's embarrassing.
--Dan
> So, I see your point here, but my concern here is that if we *modify* an
> existing schema migration that has already been tested to properly apply
> a schema change for MySQL/InnoDB and PostgreSQL with code that is
> specific to NDB, we introduce the potential for bugs where users report
> that
Are we allowed to cheat and say auto-disabling non-nova-compute services
on startup is a bug and just fix it that way for #2? :) Because (1) it
doesn't make sense, as far as we know, and (2) it forces the operator to
have to use the API to enable them later just to fix their nova
service-list outp
So it seems our options are:
1. Allow PUT /os-services/{service_uuid} on any type of service, even if
doesn't make sense for non-nova-compute services.
2. Change the behavior of [1] to only disable new "nova-compute" services.
Please, #2. Please.
--Dan
>> b) a compute node could very well have both local disk and shared
>> disk. how would the placement API know which one to pick? This is a
>> sorting/weighing decision and thus is something the scheduler is
>> responsible for.
> I remember having this discussion, and we concluded that a
> comp
>> My current feeling is that we got ourselves into our existing mess
>> of ugly, convoluted code when we tried to add these complex
>> relationships into the resource tracker and the scheduler. We set
>> out to create the placement engine to bring some sanity back to how
>> we think about things
> I haven't looked at what Keystone is doing, but to the degree they are
> using triggers, those triggers would only impact new data operations as
> they continue to run into the schema that is straddling between two
> versions (e.g. old column/table still exists, data should be synced to
> new col
> As most of the upgrade issues center around database migrations, we
> discussed some of the potential pitfalls at length. One approach was to
> roll-up all DB migrations into a single repository and run all upgrades
> for a given project in one step. Another was to simply have multiple
> python v
> Thanks for answering the base question. So, if AZs are implemented with
> host aggregates, then really, they are truly disjoint from cells (ie, not a subset
> of a cell and not a superset of a cell, just unrelated.) Does that
> philosophy agree with what you are stating?
Correct, aggregates are at the top
The etherpad for this session is here [1]. The goal of the session was
to get some questions answered that the developers had for operators
around the topic of cellsv2.
The bulk of the time was spent discussing ways to limit instance
scheduling retries in a cellsv2 world where placement eliminates
> +1. ocata's cell v2 stuff added a lot of extra required complexity
> with no perceivable benefit to end users. If there was a long term
> stable version, then putting it in the non lts release would have
> been ok. In absence of lts, I would have recommended the cell v2
> stuff have been done in
Interestingly, we just had a meeting about cells and the scheduler,
which had quite a bit of overlap on this topic.
> That said, as mentioned in the previous email, the priorities for Pike
> (and likely Queens) will continue to be, in order: traits, ironic,
> shared resource pools, and nested prov
> The problem is there's no way to update an existing cell's transport_url
> via nova-manage.
There is:
https://review.openstack.org/#/c/431582/
> It appears the only way to get around this is manually deleting the old
> cell1 record from the db.
No, don't do that :)
> I'd like to hear more op
Hi all,
In an epic collision of cosmic coincidences, four of the primary cells
meeting attendees have a conflict tomorrow. Since there won't really be
anyone around to run (or attend) the meeting, we'll have to cancel again.
Next week we will be at the PTG so any meeting will be done there.
So,
> We have a fix here:
Actual link to fix is left as an exercise for the reader?
https://review.openstack.org/#/c/433707
--Dan
Hi all,
Today's cells meeting is canceled. We're still working on getting ocata
out the door, a bunch of normal participants are out today, and not much
has transpired for pike just yet.
--Dan
> Update on that agreement : I made the necessary modification in the
> proposal [1] for not verifying the filters. We now send a request to the
> Placement API by introspecting the flavor and we get a list of potential
> destinations.
Thanks!
> When I began doing that modification, I know there
> No. Have administrators set the allocation ratios for the resources they
> do not care about exceeding capacity to a very high number.
>
> If someone previously removed a filter, that doesn't mean that the
> resources were not consumed on a host. It merely means the admin was
> willing to accept
Hi all,
There will be no cells meeting next week, Jan 18 2017. I'll be in the
wilderness and nobody else was brave enough to run it in my absence.
Yeah, something like that.
--Dan
>> NotImplementedError: Cannot load 'nullable_string' in the base class
>>
>> Is this the correct behavior?
>
> Yes, that's the expected behaviour.
Yes.
>> Then what is the expected behavior if the field is also defaulted to
>> None?
>>
>> fields = {
>>     'nullable_string': fields.Stri
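A self-contained reconstruction of the scenario being asked about (my
sketch, not the original poster's code) looks like this:

    from oslo_versionedobjects import base, fields

    @base.VersionedObjectRegistry.register
    class MyObject(base.VersionedObject):
        VERSION = '1.0'
        fields = {
            'nullable_string': fields.StringField(nullable=True, default=None),
        }

    obj = MyObject()
    try:
        obj.nullable_string
    except NotImplementedError as e:
        # The field was never set, so attribute access falls through to
        # obj_load_attr(), which the base class does not implement:
        # "Cannot load 'nullable_string' in the base class"
        print(e)
    # Calling obj.obj_set_defaults() first would populate the declared
    # default instead of raising.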
Hi all,
Given the upcoming holidays, there will not be nova cells meetings for
the remainder of the year. That puts the next one at January 4, 2017.
--Dan
> It has been a true pleasure working with you all these past few years
> and I'm thankful to have had the opportunity. As I've told people many
> times when they ask me what it's like to work on an open source project
> like this: working on proprietary software exposes you to smart people
> but y
Hi all,
Since this week's cells meeting falls on food-coma-day-eve, we're
canceling it. Anyone that wants to help move things along could review
the following patches:
https://review.openstack.org/#/q/topic:bp/cells-sched-staging+project:openstack/nova+status:open
https://review.openstack.org/#/
> I just wanted to say thanks to everyone reviewing specs this week. I've
> seen a lot of non-core newer people to the specs review process chipping
> in and helping to review a lot of the specs we're trying to get approved
> for Ocata. It can be hard to grind through several specs reviews in a
> d
> I do imagine, however, that most folks who have been working
> on nova for long enough have a list of domain experts in their heads
> already. Would actually putting that on paper really hurt?
You mean like this?
https://wiki.openstack.org/wiki/Nova#Developer_Contacts
Those are pretty much the
Hi all,
A bunch of the usual participants cannot attend the CellsV2 meeting
today, and the ones that can just discussed it last week face-to-face in
Barcelona. So, I'm going to declare it canceled for today for lack of
critical mass.
--Dan
> Basically the issue is seen in the following three lines of the nova
> compute log. For that port, even though it received the vif plugging
> event 2 mins before, it still waits for it, blocks, and times out
> Is there a race condition in the code that basically gets the events to
> wait for and the one wh
> Is there a particular reason we're only retrospecting on placement?
I think that we need to have a concrete topic that applied to newton and
will apply to ocata in order to be productive. I think there will be
specific things we can change in ocata that will have an actual impact
on major work f
> Having said that, I think Dan Smith came across a fairly large
> production DB dataset recently which he was using for testing some
> archive changes, maybe Dan will become our new Johannes, but grumpier of
> course. :)
That's quite an insult to Johannes :)
While working on
> The current DB online data upgrade model feels *very opaque* to
> ops. They didn't realize the current model Nova was using, and didn't
> feel like it was documented anywhere.
> ACTION: document the DB data lifecycle better for operators
This is on me, so I'll take it. I've just thrown together
> We know:
>
> * It pretty much does what we intend it to do: allocations are added
> and deleted on server create and delete.
> * On manipulations like a resize the allocations are not updated
> immediately, there is a delay until the heal periodic job does its
> thing.
We know one more th
> So that is fine. However, correct me if I'm wrong but you're
> proposing just that these projects migrate to also use a new service
> layer with oslo.versionedobjects, because IIUC Nova/Neutron's
> approach is dependent on that area of indirection being present.
> Otherwise, if you meant som
> Thanks Dan for your response. While I do run that before I start my
> move to liberty, what I see is that it doesn't seem to flavor migrate
> meta data for the VMs that are spawned after controller upgrade from
> juno to kilo and before all computes upgraded from juno to kilo. The
> current work
> While migrate_flavor_data seem to flavor migrate meta data of the VMs
> that were spawned before upgrade procedure, it doesn't seem to flavor
> migrate for the VMs that were spawned during the upgrade procedure more
> specifically after openstack controller upgrade and before compute
> upgrade. A
>> I don't think it's all that ambitious to think we can just use
>> tried and tested schema evolution techniques that work for everyone
>> else.
>
> People have been asking me for over a year how to do this, and I have
> no easy answer, I'm glad that you do. I would like to see some
> examples o
>> Even in the case of projects using versioned objects, it still
>> means a SQL layer has to include functionality for both versions of
>> a particular schema change which itself is awkward.
That's not true. Nova doesn't have multiple models to straddle a
particular change. We just...
> It's sim