Increasing the RPC timeout should help. I have seen this problem in nova-network in the past. Vish's suggestion sounds good.
Recently we launched 128 VMs by mistake in a customer's production environment: 0 errors. They are using 12 cores and several gigs for the nova-network servers, with dual 10G pipes. So hardware matters, of course.

My two cents,
Diego

--
Diego Parrilla
CEO
www.stackops.com | diego.parri...@stackops.com | +34 649 94 43 29 | skype:diegoparrilla
<http://www.stackops.com/>

On Tue, Feb 19, 2013 at 10:09 AM, gtt116 <gtt...@126.com> wrote:

> Hi Diego,
>
> Thanks for your reply.
> How many hosts do you have? I have 4 hosts. In this bug,
> https://bugs.launchpad.net/nova/+bug/1094226, N is 20. In my
> environment N is about 16.
>
> I found that nova-network is too busy to deal with so many RPC requests
> at the same time. RabbitMQ is strong enough in this scenario.
>
> On 2013-02-19 16:54, Diego Parrilla Santamaría wrote:
>
> Hi gtt,
>
> What does 'lots of instances simultaneously' mean for you? 100, 1000,
> 10000, more?
>
> We have launched more than 100 (but fewer than 1000) simultaneously
> without any issue. Rabbit is running on a multicore machine with several
> gigs of RAM and the out-of-the-box configuration.
>
> Cheers
> Diego
>
> On Tue, Feb 19, 2013 at 9:35 AM, gtt116 <gtt...@126.com> wrote:
>
>> Hi all,
>>
>> When creating lots of instances simultaneously, many of them end up in
>> the ERROR state, and most of those failures are caused by network RPC
>> request timeouts. This result is not so graceful.
>>
>> I think it would be better if the scheduler kept a queue of create
>> requests. When it finds that all the hosts are busy enough
>> (compute_node.current_workload reaches some value), it stops casting
>> requests to hosts temporarily, until it finds some host free enough.
>> In this way, we can make sure that booting lots of instances
>> simultaneously results in ACTIVE instances rather than lots of ERROR
>> instances. It does have a small weak point: if the limit on
>> current_workload is set too low, instance creation will be slow.
>>
>> Do you have another quick fix?
>>
>> Thanks,
>>
>> --
>> best regards,
>> gtt
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack@lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>
> --
> best regards,
> gtt
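
For reference, the queueing idea from the original post could look roughly like the sketch below. It is only an illustration in plain Python and not nova's scheduler code: the Host and ThrottledScheduler classes, the max_workload limit, and the build_finished callback are all hypothetical names, and current_workload is a simple per-host counter standing in for compute_node.current_workload.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical stand-in for a compute node record."""
    name: str
    current_workload: int = 0  # builds currently in progress on this host


class ThrottledScheduler:
    """Hypothetical scheduler that holds requests back while hosts are busy."""

    def __init__(self, hosts, max_workload):
        self.hosts = list(hosts)
        self.max_workload = max_workload  # assumed per-host limit
        self.pending = deque()            # create requests not yet cast
        self.dispatched = []              # (request, host name) pairs

    def submit(self, request):
        """Queue a create request and cast it if some host has room."""
        self.pending.append(request)
        self.drain()

    def build_finished(self, host):
        """Called when a build completes; frees a slot and retries the queue."""
        host.current_workload -= 1
        self.drain()

    def drain(self):
        """Cast queued requests until every host has reached the limit."""
        while self.pending:
            host = min(self.hosts, key=lambda h: h.current_workload)
            if host.current_workload >= self.max_workload:
                break  # all hosts are busy; keep the rest queued
            request = self.pending.popleft()
            host.current_workload += 1
            self.dispatched.append((request, host.name))


hosts = [Host("host-a"), Host("host-b")]
scheduler = ThrottledScheduler(hosts, max_workload=2)
for i in range(6):
    scheduler.submit(f"vm-{i}")

# Two hosts with a limit of two builds each: four requests are cast,
# two stay queued until a build finishes.
scheduler.build_finished(hosts[0])
```

The trade-off mentioned in the post shows up directly here: a low max_workload keeps each host lightly loaded, but more requests wait in the queue, so a large batch takes longer to finish.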