[opnfv-tech-discuss] Reply: Re: [doctor] For the issue about the notification time is large than 1S

dong . wenjuan Mon, 12 Sep 2016 00:43:32 -0700

> BTW: as @Tomi pointed out, the inspector should be topology aware,
> knowing all VMs on the host, so I think we may create the server list
> in the initialization phase and use the saved list when processing the
> `compute.host.down` event. This will be a better emulation of a real
> inspector.



I added timestamp prints inside `disable_compute_host`, and it looks like
`servers.list` takes more than 1 s (total: 1.53 s). I repeated this several
times with the same result.

So I think we need to get it fixed, and that may resolve the performance
issue on Fuel. Could anybody run a similar test on Apex to collect data for
comparison?


    def disable_compute_host(self, hostname):
        opts = {'all_tenants': True, 'host': hostname}
        app.logger.debug('before call nova-list at %s' % time.time())
        servers = self.nova.servers.list(detailed=False, search_opts=opts)
        app.logger.debug('after call nova-list at %s' % time.time())
        # One reset_state API call per server found on the failed host.
        for server in servers:
            self.nova.servers.reset_state(server, 'error')


--------------------------------------------------------------------------------
DEBUG in inspector [inspector.py:38]:
before call nova-list at 1473664289.13
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
DEBUG in inspector [inspector.py:40]:
after call nova-list at 1473664290.17
--------------------------------------------------------------------------------
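For reference, the two DEBUG timestamps above can be subtracted directly. A trivial check, using only the values printed in the log and the 1 s limit discussed in this thread:

```python
# Timestamps copied from the two DEBUG lines above (seconds since the epoch).
before_list = 1473664289.13
after_list = 1473664290.17
budget_s = 1.0  # maximum accepted notification time for the whole test

elapsed = after_list - before_list
print("servers.list took %.2f s" % elapsed)  # prints "servers.list took 1.04 s"
print("over budget on its own:", elapsed > budget_s)
```

So the listing call alone already exceeds the whole notification budget, before any `reset_state` call is made.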




Yujun Zhang <[email protected]> 
2016-09-12 14:57

To: "Souville, Bertrand" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>, "[email protected]" <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: [opnfv-tech-discuss] [doctor] For the issue about the notification time is large than 1S






Hi, Carlos

According to the data collected by @Wenjuan, it seems that when the test
fails, most of the time is consumed by the inspector API
`disable_compute_host` [1]:

    if event_type == 'compute.host.down':
        inspector.disable_compute_host(hostname)

I checked the source code, and it looks like it iterates through the
server list to set each server to the error state. So I wonder whether the
delay is related to the total number of servers in the test environment.
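To see whether the time depends on the number of servers, one option is to time the listing and the per-server `reset_state` loop separately. The sketch below is a hypothetical harness, not the Doctor test code: `FakeServers`, `disable_compute_host_timed` and the sleep delays are made up to stand in for a novaclient-like `nova.servers` manager, so that it runs without an OpenStack deployment.

```python
import time
from types import SimpleNamespace


class FakeServers:
    """Hypothetical stand-in for a novaclient-like servers manager."""

    def __init__(self, count, list_delay=0.01, reset_delay=0.002):
        self._servers = [SimpleNamespace(id=str(i)) for i in range(count)]
        self._list_delay = list_delay
        self._reset_delay = reset_delay

    def list(self, detailed=False, search_opts=None):
        time.sleep(self._list_delay)  # simulated API round trip
        return list(self._servers)

    def reset_state(self, server, state='error'):
        time.sleep(self._reset_delay)  # simulated API round trip per server


def disable_compute_host_timed(nova, hostname):
    """Reset every server on `hostname` to error and report per-phase timings."""
    opts = {'all_tenants': True, 'host': hostname}
    t0 = time.perf_counter()
    servers = nova.servers.list(detailed=False, search_opts=opts)
    t1 = time.perf_counter()
    for server in servers:
        nova.servers.reset_state(server, 'error')
    t2 = time.perf_counter()
    return {'servers': len(servers), 'list_s': t1 - t0, 'reset_s': t2 - t1}


for count in (1, 10, 50):
    nova = SimpleNamespace(servers=FakeServers(count))
    print(disable_compute_host_timed(nova, 'compute-0'))
```

With a real client, a `list_s` that stays flat while `reset_s` grows with the server count would point at the one-call-per-server loop; a large `list_s` on its own would match the measurement above.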

Could you please provide the logs from the Apex environment so we can dig
further and find the root cause?

BTW: as @Tomi pointed out, the inspector should be topology aware, knowing
all VMs on the host, so I think we may create the server list
in the initialization phase and use the saved list when processing the
`compute.host.down` event. This will be a better emulation of a real
inspector.
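A minimal sketch of the topology-aware approach, assuming a novaclient-like client: the class name `TopologyAwareInspector`, the `servers_by_host` cache and the fake client are hypothetical, and reading the host from the admin-only `OS-EXT-SRV-ATTR:host` attribute (which is only returned with `detailed=True`) is an assumption about the deployment's policy.

```python
from collections import defaultdict
from types import SimpleNamespace

# Assumed extended attribute holding the compute host name (admin-only,
# present only when servers are listed with detailed=True).
HOST_ATTR = 'OS-EXT-SRV-ATTR:host'


class TopologyAwareInspector:
    """Hypothetical inspector that caches a host -> servers map at start-up."""

    def __init__(self, nova):
        self.nova = nova
        self.servers_by_host = defaultdict(list)
        self.refresh_topology()

    def refresh_topology(self):
        # One (potentially slow) listing at start-up instead of one per event.
        self.servers_by_host.clear()
        servers = self.nova.servers.list(detailed=True,
                                         search_opts={'all_tenants': True})
        for server in servers:
            self.servers_by_host[getattr(server, HOST_ATTR, None)].append(server)

    def disable_compute_host(self, hostname):
        # No servers.list call on the hot path: use the cached topology.
        for server in self.servers_by_host.get(hostname, []):
            self.nova.servers.reset_state(server, 'error')


class FakeServers:
    """Hypothetical stand-in for nova.servers that records the calls made."""

    def __init__(self, servers):
        self._servers = servers
        self.list_calls = 0
        self.reset = []

    def list(self, detailed=True, search_opts=None):
        self.list_calls += 1
        return list(self._servers)

    def reset_state(self, server, state='error'):
        self.reset.append((server.id, state))


def make_server(server_id, host):
    server = SimpleNamespace(id=server_id)
    setattr(server, HOST_ATTR, host)
    return server


fake = FakeServers([make_server('vm1', 'compute-0'),
                    make_server('vm2', 'compute-1'),
                    make_server('vm3', 'compute-0')])
inspector = TopologyAwareInspector(SimpleNamespace(servers=fake))
inspector.disable_compute_host('compute-0')
print(fake.list_calls, fake.reset)  # 1 [('vm1', 'error'), ('vm3', 'error')]
```

The trade-off is staleness: servers created, deleted or migrated after start-up are not in the cache, so a real inspector would need to refresh it, for example periodically or from instance notifications.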

[1] https://git.opnfv.org/cgit/doctor/tree/tests/inspector.py#n63



On Fri, Sep 9, 2016 at 7:15 PM Souville, Bertrand <[email protected]> wrote:
My understanding is that the Fuel team/experts are now investigating the 
issue. Let’s give them a few more days…
 
Bertrand
 
From: Carlos Goncalves [mailto:[email protected]] 
Sent: Friday, September 09, 2016 11:06 AM
To: [email protected]; Souville, Bertrand <[email protected]>; [email protected]; [email protected]; Kunzmann, Gerald <[email protected]>
Cc: [email protected]
Subject: RE: [opnfv-tech-discuss] [doctor] For the issue about the notification time is large than 1S
 
As I’ve already commented in the patch you submitted to Gerrit [1]: no, 
we should not extend the accepted max notification time from 1s to 
anything higher than that.
 
Currently our Doctor tests pass in Apex on all available PODs as well as 
in local environments (e.g. devstack). For Apex, the notification time is 
around 250 ms, which is well below the 1 second maximum. If we cannot get 
a green light in another scenario/installer, we don’t claim support for it.
 
Carlos
 
[1] https://gerrit.opnfv.org/gerrit/#/c/20627
 
From: [email protected] [mailto:[email protected]] 
Sent: 09 September 2016 07:00
To: Carlos Goncalves; [email protected]; [email protected]; [email protected]; [email protected]
Cc: [email protected]
Subject: [opnfv-tech-discuss] [doctor] For the issue about the notification time is large than 1S
 

Hi doctors, 

Regarding the issue of the notification time being larger than 1 s: 
I checked the logs and found that the interval from the inspector 
receiving the event to nova-api starting to handle `reset_state` takes up 
most of the time, over 80%. For example, when the total notification time 
is 2.26 s, the inspector's processing takes 1.983 s. 
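The share quoted above follows directly from the two figures in the example:

```python
total_notification_s = 2.26  # total notification time in the example
inspector_s = 1.983          # time spent in the inspector's processing

share = inspector_s / total_notification_s
print("inspector share: %.1f%%" % (share * 100))  # prints "inspector share: 87.7%"
```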

In the test inspector script, we find all the VMs across all tenants and 
then set each VM's state to error. 
We need to improve the performance, but that cannot be done in a short 
time. 

Shall we extend the limit from 1 s to 3 s, or go back to not failing the 
test on the notification time, so that functest is green? 
Meanwhile we would work on the performance improvement and then change it 
back. 

Any suggestions will be welcome. Thank you~ 

BR, 
dwj



董文娟   Wenjuan Dong 
Controller Dept Ⅳ. / Wireless Product Operation 
  



D3, ZTE, No. 889, Bibo Rd., Pudong New Area, Shanghai
T: +86 021 85922    M: +86 13661996389
E: [email protected]
www.ztedevice.com
 
_______________________________________________
opnfv-tech-discuss mailing list
[email protected]
https://lists.opnfv.org/mailman/listinfo/opnfv-tech-discuss
