Could we replace the refresh from the periodic task with a timestamp in the network cache recording when it was last updated, so that we refresh it only when it's accessed and is older than X?
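A minimal sketch of that idea, in Python. Everything named here is hypothetical and not part of Nova or Neutron: `LazyInfoCache`, `fetch` (standing in for whatever call asks Neutron for an instance's network info), and `max_age` (the X above). The injectable `clock` exists only to make the behaviour easy to test.

```python
import time
from typing import Callable, Dict, Tuple


class LazyInfoCache:
    """Refresh an entry on access, and only if it is older than max_age seconds."""

    def __init__(self, fetch: Callable[[str], dict], max_age: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical: queries the network service
        self._max_age = max_age    # staleness threshold in seconds
        self._clock = clock
        # instance id -> (time of last update, cached network info)
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def update(self, instance_id: str, info: dict) -> None:
        """Store fresh info, e.g. when a change notification arrives."""
        self._entries[instance_id] = (self._clock(), info)

    def get(self, instance_id: str) -> dict:
        """Return cached info, refetching first if it is missing or too old."""
        entry = self._entries.get(instance_id)
        if entry is None or self._clock() - entry[0] > self._max_age:
            self.update(instance_id, self._fetch(instance_id))
            entry = self._entries[instance_id]
        return entry[1]
```

With this shape, an instance nobody asks about costs no API calls at all, and a lost notification is corrected on the first read after the entry ages out rather than after a full pass of the periodic task.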
From: Aaron Rosen [mailto:aaronoro...@gmail.com]
Sent: 29 May 2014 01:47
To: Assaf Muller
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] [Neutron] heal_instance_info_cache_interval - Can we kill it?

On Wed, May 28, 2014 at 7:39 AM, Assaf Muller <amul...@redhat.com> wrote:

----- Original Message -----
> Hi,
>
> Sorry somehow I missed this email. I don't think you want to disable it,
> though we can definitely have it run less often. The issue with disabling it
> is if one of the notifications from neutron->nova never gets sent
> successfully to nova (neutron-server is restarted before the event is sent
> or some other internal failure). Nova will never update its cache if the
> heal_instance_info_cache_interval is set to 0.

The thing is, this periodic healing doesn't imply correctness either. In the case where you lose a notification and the compute node hosting the VM is hosting a non-trivial number of VMs, it can take (with the default of 60 seconds) dozens of minutes to update the cache, since you only update one VM a minute. I could understand the use of a sanity check if it was performed much more often, but as it is now it seems useless to me since you can't really rely on it.

I agree with you. That's why we implemented the event callback so that the cache would be more up to date. In all honesty, you can probably safely disable the heal_instance_info_cache_interval and things will probably be fine, as we haven't seen many failures where events from neutron fail to send. If we find out that this does happen, we can definitely make the event-sending notification logic in neutron much more robust by persisting events to the db and implementing retry logic on failure there to help ensure nova gets the notification.
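The persist-and-retry idea in the last paragraph could look roughly like the sketch below. It is an illustration only, not how neutron sends events: the `outbox` table, `queue_event`, `flush`, and the `send` callable are all hypothetical names, and an in-memory SQLite database stands in for the real db.

```python
import sqlite3
from typing import Callable


def make_outbox() -> sqlite3.Connection:
    """Create the (hypothetical) table that holds undelivered events."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE outbox ("
        " id INTEGER PRIMARY KEY,"
        " port_id TEXT NOT NULL,"
        " attempts INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def queue_event(db: sqlite3.Connection, port_id: str) -> None:
    """Persist the event before any attempt to send it."""
    db.execute("INSERT INTO outbox (port_id) VALUES (?)", (port_id,))
    db.commit()


def flush(db: sqlite3.Connection, send: Callable[[str], None],
          max_attempts: int = 5) -> int:
    """Try to deliver every pending event; return how many were delivered."""
    rows = db.execute(
        "SELECT id, port_id FROM outbox WHERE attempts < ? ORDER BY id",
        (max_attempts,),
    ).fetchall()
    delivered = 0
    for event_id, port_id in rows:
        try:
            send(port_id)  # hypothetical: notifies nova about this port
        except Exception:
            # Keep the row so a later flush retries it.
            db.execute(
                "UPDATE outbox SET attempts = attempts + 1 WHERE id = ?",
                (event_id,),
            )
        else:
            db.execute("DELETE FROM outbox WHERE id = ?", (event_id,))
            delivered += 1
    db.commit()
    return delivered
```

Because the event is written before the first send, a restart between queueing and sending leaves the row in place for the next `flush` call.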
What I'm trying to say is that with the inefficiency of the implementation, coupled with the inability of Neutron's default plugin to cope with a "large" number of API calls, I feel like the disadvantages outweigh the advantages when it comes to the cache healing.

Right, the current heal_instance implementation has scaling issues, as every compute node runs this task querying neutron. The more compute nodes you have, the more querying. Hopefully the nova v3 api should solve this issue though, as the networking information will no longer have to live in nova as well. So someone interested in this network data can query neutron directly and we can avoid these types of caching issues altogether :)

How would you feel about disabling it, optimizing the implementation (for example by introducing a new networking_for_instance API verb to Neutron?), then enabling it again?

I think this is a good idea; we should definitely implement something like this so nova can interface with fewer api calls.

> The neutron->nova events help
> to ensure that the nova info_cache is up to date sooner by having neutron
> inform nova whenever a port's data has changed (@Joe Gordon - this happens
> regardless of virt driver).
>
> If you're using the libvirt virt driver the neutron->nova events will also be
> used to ensure that the networking is 'ready' before the instance is powered
> on.
>
> Best,
>
> Aaron
>
> P.S: we're working on making the heal_network call to neutron a lot less
> expensive as well in the future.
> On Tue, May 27, 2014 at 7:25 PM, Joe Gordon <joe.gord...@gmail.com> wrote:
>
> On Wed, May 21, 2014 at 6:21 AM, Assaf Muller <amul...@redhat.com> wrote:
>
> Dear Nova aficionados,
>
> Please make sure I understand this correctly:
> Each nova compute instance selects a single VM out of all of the VMs
> that it hosts, and every <heal_instance_info_cache_interval> seconds
> queries Neutron for all of its networking information, then updates
> Nova's DB.
>
> If the information above is correct, then I fail to see how that
> is in any way useful. For example, for a compute node hosting 20 VMs,
> it would take 20 minutes to update the last one. Seems unacceptable
> to me.
>
> Considering Icehouse's Neutron to Nova notifications, my question
> is if we can change the default to 0 (Disable the feature), deprecate
> it, then delete it in the K cycle. Is there a good reason not to do this?
>
> Based on the patch that introduced this function [0] you may be on to
> something, but AFAIK unfortunately the neutron to nova notifications only
> work in libvirt right now [1], so I don't think we can fully deprecate this
> periodic task. That being said turning it off by default may be an option.
> Have you tried disabling this feature and seeing what happens (in the gate
> and/or in production)?
>

We've disabled it in a scale lab and didn't observe any black holes forming or other catastrophes.
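For reference, the arithmetic behind the "20 VMs, 20 minutes" figure in the original question can be written out as below. This is only a sketch under the assumption stated in the thread, namely that each compute node refreshes one VM per interval; `worst_case_heal_delay` is a hypothetical helper name, and the 60-second default is the value quoted in this thread.

```python
from typing import Optional


def worst_case_heal_delay(num_vms: int,
                          interval_seconds: int = 60) -> Optional[int]:
    """Seconds until the last VM on a node is refreshed by the periodic task.

    Assumes one VM is refreshed per interval, as described in the thread.
    Returns None when the interval is 0, i.e. the periodic task is disabled
    and a stale entry is never healed by it.
    """
    if interval_seconds <= 0:
        return None
    return num_vms * interval_seconds


# 20 VMs at the 60-second interval: 1200 seconds, i.e. 20 minutes.
print(worst_case_heal_delay(20) // 60, "minutes")
```

The delay grows linearly with the number of VMs on the node, which is why a denser host makes the periodic heal a weaker safety net.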
> [0] https://review.openstack.org/#/c/4269/
> [1] https://wiki.openstack.org/wiki/ReleaseNotes/Icehouse
>
> Assaf Muller, Cloud Networking Engineer
> Red Hat
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev