I have offered up https://review.openstack.org/#/c/60082/ as a backport to Havana. Interest in doing this was expressed in the blueprint even before this thread. If there is consensus for this as the stop-gap, then it is ready for merging. However, I do not want to discourage discussion of other stop-gap solutions, like the one Maru proposed in the original post.
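For anyone who has not looked at the feature in question: api_workers makes neutron-server fork extra child processes that handle API (wsgi) requests, leaving RPC processing in the main process. Here is a minimal sketch of that pattern, not the Neutron implementation itself; the port, constants and toy request handler are purely illustrative.

    # Illustrative sketch of the multi-worker pattern behind the
    # api_workers option: the parent binds the API socket once, then
    # forks N workers that all accept() on the shared socket, so the
    # kernel spreads incoming connections across processes.
    import os
    import socket

    API_WORKERS = 2                 # stands in for api_workers in neutron.conf
    BIND_ADDR = ('0.0.0.0', 9696)   # 9696 is Neutron's usual API port

    def serve(listener):
        """Toy stand-in for a wsgi worker: accept and answer requests."""
        while True:
            conn, _addr = listener.accept()
            conn.recv(4096)  # read (and ignore) the request
            conn.sendall(b'HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok')
            conn.close()

    def main():
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind(BIND_ADDR)
        listener.listen(128)

        for _ in range(API_WORKERS):
            if os.fork() == 0:       # child: serve API requests only
                serve(listener)
                os._exit(0)

        # The parent stays free for other work; in neutron-server's case
        # the RPC/agent processing remains in the main process.
        for _ in range(API_WORKERS):
            os.wait()

    if __name__ == '__main__':
        main()

As Maru points out further down, the flip side is that with api_workers left at 0 the wsgi server and the RPC handling share that single main process.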
Carl

On Wed, Dec 4, 2013 at 9:12 AM, Ashok Kumaran <ashokkumara...@gmail.com> wrote:
>
> On Wed, Dec 4, 2013 at 8:30 PM, Maru Newby <ma...@redhat.com> wrote:
>>
>> On Dec 4, 2013, at 8:55 AM, Carl Baldwin <c...@ecbaldwin.net> wrote:
>>
>> > Stephen, all,
>> >
>> > I agree that there may be some opportunity to split things out a bit. However, I'm not sure what the best way will be. I recall that Mark mentioned breaking out the processes that handle API requests and RPC from each other at the summit. Anyway, it is something that has been discussed.
>> >
>> > I actually wanted to point out that the neutron server now has the ability to run a configurable number of sub-processes to handle a heavier load. Introduced with this commit:
>> >
>> > https://review.openstack.org/#/c/37131/
>> >
>> > Set api_workers to something > 1 and restart the server.
>> >
>> > The server can also be run on more than one physical host in combination with multiple child processes.
>>
>> I completely misunderstood the import of the commit in question. Being able to run the wsgi server(s) out of process is a nice improvement, thank you for making it happen. Has there been any discussion around making the default for api_workers > 0 (at least 1) to ensure that the default configuration separates wsgi and rpc load? This also seems like a great candidate for backporting to havana and maybe even grizzly, although api_workers should probably be defaulted to 0 in those cases.
>
> +1 for backporting the api_workers feature to havana as well as Grizzly :)
>
>> FYI, I re-ran the test that attempted to boot 75 micro VM's simultaneously with api_workers = 2, with mixed results. The increased wsgi throughput resulted in almost half of the boot requests failing with 500 errors due to QueuePool errors (https://bugs.launchpad.net/neutron/+bug/1160442) in Neutron. It also appears that maximizing the number of wsgi requests has the side-effect of increasing the RPC load on the main process, and this means that the problem of dhcp notifications being dropped is little improved. I intend to submit a fix that ensures that notifications are sent regardless of agent status, in any case.
>>
>> m.
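To make that stop-gap concrete, the kind of change Maru describes would look roughly like the following. This is only a sketch under assumed names (DhcpNotifier, is_agent_alive and port_create_end are placeholders), not the actual patch.

    # Sketch only, not the Neutron patch: when notifying a DHCP agent
    # about a new port, do not silently drop the message just because the
    # agent's last heartbeat looks stale -- warn and send it anyway.
    import logging

    LOG = logging.getLogger(__name__)

    class DhcpNotifier(object):
        def __init__(self, rpc_client, agent_db):
            self.rpc_client = rpc_client   # fire-and-forget RPC towards agents
            self.agent_db = agent_db       # tracks agent heartbeats

        def notify_port_created(self, context, port, agent):
            if not self.agent_db.is_agent_alive(agent):
                # Old behaviour: return here, losing the notification.
                # Stop-gap: log it but still send, so a slow-but-healthy
                # agent still learns about the new port.
                LOG.warning('DHCP agent on %s looks down; notifying anyway',
                            agent.get('host'))
            self.rpc_client.cast(context, 'port_create_end',
                                 payload={'port': port})

The open question Maru raises in the original post, quoted below, is what sending to an agent that really is down implies, which is part of why the thread turns to whether fire-and-forget is the right model at all.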
>>
>> > Carl
>> >
>> > On Tue, Dec 3, 2013 at 9:47 AM, Stephen Gran <stephen.g...@theguardian.com> wrote:
>> >> On 03/12/13 16:08, Maru Newby wrote:
>> >>>
>> >>> I've been investigating a bug that is preventing VM's from receiving IP addresses when a Neutron service is under high load:
>> >>>
>> >>> https://bugs.launchpad.net/neutron/+bug/1192381
>> >>>
>> >>> High load causes the DHCP agent's status updates to be delayed, causing the Neutron service to assume that the agent is down. This results in the Neutron service not sending notifications of port addition to the DHCP agent. At present, the notifications are simply dropped. A simple fix is to send notifications regardless of agent status. Does anybody have any objections to this stop-gap approach? I'm not clear on the implications of sending notifications to agents that are down, but I'm hoping for a simple fix that can be backported to both havana and grizzly (yes, this bug has been with us that long).
>> >>>
>> >>> Fixing this problem for real, though, will likely be more involved. The proposal to replace the current wsgi framework with Pecan may increase the Neutron service's scalability, but should we continue to use a 'fire and forget' approach to notification? Being able to track the success or failure of a given action outside of the logs would seem pretty important, and allow for more effective coordination with Nova than is currently possible.
>> >>
>> >> It strikes me that we ask an awful lot of a single neutron-server instance - it has to take state updates from all the agents, it has to do scheduling, it has to respond to API requests, and it has to communicate about actual changes with the agents.
>> >>
>> >> Maybe breaking some of these out the way nova has a scheduler and a conductor and so on might be a good model (I know there are things people are unhappy about with nova-scheduler, but imagine how much worse it would be if it was built into the API).
>> >>
>> >> Doing all of those tasks, and doing it largely single threaded, is just asking for overload.
>> >>
>> >> Cheers,
>> >> --
>> >> Stephen Gran
>> >> Senior Systems Integrator - theguardian.com
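On Maru's 'fire and forget' point, the distinction he is drawing can be sketched roughly as follows. oslo.messaging is used purely for illustration here (Neutron's RPC layer has equivalent cast/call semantics), and the topic, method and payload names are made up.

    # Sketch: fire-and-forget vs. tracked notification.  Illustrative only.
    import oslo_messaging
    from oslo_config import cfg

    def get_dhcp_agent_client(host):
        transport = oslo_messaging.get_transport(cfg.CONF)
        target = oslo_messaging.Target(topic='dhcp_agent', server=host)
        return oslo_messaging.RPCClient(transport, target)

    def notify_fire_and_forget(client, ctxt, port):
        # cast() returns immediately; if the agent is down or the message
        # is lost, nothing but the logs will ever say so.
        client.cast(ctxt, 'port_create_end', payload={'port': port})

    def notify_tracked(client, ctxt, port, timeout=30):
        # call() blocks until the agent replies (or raises on timeout), so
        # the outcome can be recorded and, e.g., reported back to Nova.
        return client.prepare(timeout=timeout).call(
            ctxt, 'port_create_end', payload={'port': port})

Whether it is acceptable to hold API or RPC workers up waiting for per-agent acknowledgements is exactly the kind of load trade-off Stephen describes above.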
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev