Following up on the discussion in https://gerrit.opnfv.org/gerrit/#/c/20877/
It just reminds me of something.

Is there a guideline on how the inspector should behave to achieve the
performance targeted by the Doctor project? E.g. be topology aware, set all
affected hosts to error state simultaneously instead of one by one...

Although we should keep the sample inspector a simple model, it would be good
to demonstrate the essential design guidelines that the Doctor project is
concerned with.

I think it could be a topic for release D. As for release C, I think we just
need to get the blocking issues resolved and keep the rest as it is.

What do you think, doctors?

--
Yujun

On Mon, Sep 12, 2016 at 5:43 PM Juvonen, Tomi (Nokia - FI/Espoo) <[email protected]> wrote:

> Hi,
>
> So all in all: something surely needs to be fixed in the FUEL
> installation, but the Inspector also needs to work fast enough at scale.
> Optimally this means it needs to be aware of the VMs running on each
> host, so that no extra time is spent figuring that out when a failure
> occurs. Also, currently there exists only this very simple test scenario
> where you only need to be aware of the VMs on a single host. It is
> something totally different when you need to detect things like a switch
> failure that also causes problems on host(s) (more time consuming).
> Anyhow, those cases will come when Vitrage is also integrated as the
> Inspector.
>
> Br,
> Tomi
>
> *From:* Yujun Zhang [mailto:[email protected]]
> *Sent:* Monday, September 12, 2016 9:58 AM
> *To:* Souville, Bertrand <[email protected]>; [email protected];
> [email protected]; [email protected]; Juvonen, Tomi (Nokia - FI/Espoo)
> <[email protected]>
> *Cc:* [email protected]
> *Subject:* Re: [opnfv-tech-discuss] [doctor] For the issue about the
> notification time is large than 1S
>
> Hi, Carlos
>
> According to the data collected by @Wenjuan, it seems that when the test
> fails, most of the time is consumed by the inspector API
> disable_compute_host [1]:
>
>     if event_type == 'compute.host.down':
>         inspector.disable_compute_host(hostname)
>
> I checked the source code and it looks like it iterates through the
> server list to set the servers to error state. So I wonder if the delay
> is related to the total number of servers in the test environment.
>
> Could you please provide the log from the Apex environment so we can dig
> further to find the root cause?
>
> BTW: as @Tomi pointed out, the inspector should be topology aware,
> knowing all VMs on the host, so I think we may create the server list
> in the initialization phase and use the saved list when processing the
> `compute.host.down` event. This will be a better emulation of a real
> inspector.
>
> [1] https://git.opnfv.org/cgit/doctor/tree/tests/inspector.py#n63
>
> On Fri, Sep 9, 2016 at 7:15 PM Souville, Bertrand <[email protected]> wrote:
>
> My understanding is that Fuel team/experts are now investigating the
> issue.
> Let’s give them a few more days…
>
> Bertrand
>
> *From:* Carlos Goncalves [mailto:[email protected]]
> *Sent:* Friday, September 09, 2016 11:06 AM
> *To:* [email protected]; Souville, Bertrand <[email protected]>;
> [email protected]; [email protected]; Kunzmann, Gerald <[email protected]>
> *Cc:* [email protected]
> *Subject:* RE: [opnfv-tech-discuss] [doctor] For the issue about the
> notification time is large than 1S
>
> As I’ve already commented in the patch you submitted to Gerrit [1]: no, we
> should not extend the accepted max notification time from 1s to anything
> higher than that.
>
> Currently our Doctor tests are passing in Apex in all available PODs as
> well as in local environments (e.g. devstack). For Apex, the notification
> time is around 250ms, which is much lower than the max of 1 second. If we
> cannot get a green light in any other scenario/installer, we don’t claim
> any support.
>
> Carlos
>
> [1] https://gerrit.opnfv.org/gerrit/#/c/20627
>
> *From:* [email protected] [mailto:[email protected] <[email protected]>]
> *Sent:* 09 September 2016 07:00
> *To:* Carlos Goncalves; [email protected]; [email protected];
> [email protected]; [email protected]
> *Cc:* [email protected]
> *Subject:* [opnfv-tech-discuss] [doctor] For the issue about the
> notification time is large than 1S
>
> Hi doctors,
>
> Regarding the issue of the notification time being larger than 1s:
> I checked the log and found that the interval from the inspector
> receiving the event to nova-api beginning to handle reset_state takes up
> most of the time, over 80%. For example, when the total notification
> time is 2.26s, the inspector processing takes 1.983s.
>
> In the test inspector script, we find all the VMs under all tenants, and
> then set all the VM states to error.
> We need to improve the performance, but that cannot be done in a short
> time.
>
> Shall we extend the limit from 1s to 3s, or change back to not failing on
> the notification time calculation, so that functest goes green?
> Meanwhile we work on the performance improvement, and then change it back?
>
> Any suggestions are welcome. Thank you~
>
> BR,
> dwj
>
> *董文娟 Wenjuan Dong*
> 控制器四部 / 无线产品 Controller Dept Ⅳ. / Wireless Product Operation
> 上海市浦东新区碧波路889号中兴通讯D3
> D3, ZTE, No. 889, Bibo Rd.
> T: +86 021 85922 M: +86 13661996389
> E: [email protected]
> www.ztedevice.com
>
> _______________________________________________
> opnfv-tech-discuss mailing list
> [email protected]
> https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss
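
To make the proposal in this thread concrete, here is a minimal sketch of a
topology-aware sample inspector: it builds the host-to-server map once in the
initialization phase, then on `compute.host.down` resets only the affected
servers, and does so concurrently instead of one by one. This is not the code
in tests/inspector.py. `FakeCompute`, `CachingInspector`, `list_servers` and
`max_workers` are hypothetical names made up for illustration, and the fake
client only stands in for whatever compute API client the inspector really
uses; only `disable_compute_host`, `reset_state` and the event name come from
the discussion above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class FakeCompute:
    """Hypothetical stand-in for a compute API client (not novaclient)."""

    def __init__(self, servers):
        # servers: dict mapping server id -> hostname it runs on
        self._servers = dict(servers)
        self.states = {server_id: 'active' for server_id in servers}
        self.list_calls = 0

    def list_servers(self):
        # Counts calls so we can check the list is fetched only once.
        self.list_calls += 1
        return list(self._servers.items())

    def reset_state(self, server_id, state):
        self.states[server_id] = state


class CachingInspector:
    """Sample inspector that knows which servers run on which host."""

    def __init__(self, compute, max_workers=8):
        self.compute = compute
        self.max_workers = max_workers
        # Build the host -> servers map once, in the initialization phase.
        self.servers_by_host = defaultdict(list)
        for server_id, hostname in compute.list_servers():
            self.servers_by_host[hostname].append(server_id)

    def disable_compute_host(self, hostname):
        # Use the saved map; no server listing happens at failure time.
        affected = list(self.servers_by_host.get(hostname, []))
        if not affected:
            return []
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # list() waits for every call and re-raises any exception.
            list(pool.map(
                lambda server_id: self.compute.reset_state(server_id, 'error'),
                affected))
        return affected

    def handle_event(self, event_type, hostname):
        if event_type == 'compute.host.down':
            return self.disable_compute_host(hostname)
        return []
```

With a fake client holding two servers on one host and one on another,
`CachingInspector(compute).handle_event('compute.host.down', 'host-a')` only
touches the two servers on `host-a`. Note the trade-off: a map built at
initialization goes stale when servers are created, deleted or migrated, so a
real inspector would need some way to keep it up to date.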
