Public bug reported:

Functional test jobs fail randomly, with some processes getting
OOM-killed, when running on nodes with 4 CPUs. Currently the
raxflex-sjc3 provider nodes have 4 CPUs (nodes normally have 8 CPUs).

When the jobs run on these nodes they use all available memory and swap,
which results in some processes being OOM-killed and the tests failing.

Example failures:
- https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fce/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/fce5738/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_18a/937106/2/gate/neutron-functional/18ac225/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7aa/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/7aa3e1d/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_238/936845/3/check/neutron-functional/238854b/testr_results.html

OpenSearch:
https://opensearch.logs.openstack.org/_dashboards/app/data-explorer/discover#?_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:'94869730-aea8-11ec-9e6a-83741af3fdcd',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_q=(filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!f,params:(query:neutron-functional),type:phrase),query:(match_phrase:(build_name:neutron-functional))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_status,negate:!f,params:(query:FAILURE),type:phrase),query:(match_phrase:(build_status:FAILURE)))),query:(language:kuery,query:'message:%22localhost%20%7C%20Provider:%20raxflex-sjc3%22'))


We have to check whether memory utilization can be improved in these tests/jobs.
For CI itself, we can split the test runs into groups to unblock this.
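
As a rough illustration of splitting the test runs into groups, here is a
minimal Python sketch. It is not the actual neutron CI change: the function
name split_into_groups and the test IDs in the usage note are hypothetical,
and it assumes test IDs are dotted paths ending in the test method name, so
tests of one class stay in the same group.

```python
from collections import defaultdict


def split_into_groups(test_ids, num_groups):
    """Partition test IDs into num_groups lists of roughly equal size.

    Hypothetical helper: tests that share a class (everything before the
    last dot of the ID) are kept together in one group.
    """
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")

    # Bucket the test IDs by their class path.
    by_class = defaultdict(list)
    for test_id in test_ids:
        class_path = test_id.rsplit(".", 1)[0]
        by_class[class_path].append(test_id)

    groups = [[] for _ in range(num_groups)]
    # Place the largest classes first, always into the currently smallest
    # group; ties are broken by class name so the result is deterministic.
    ordered = sorted(by_class, key=lambda c: (-len(by_class[c]), c))
    for class_path in ordered:
        smallest = min(groups, key=len)
        smallest.extend(by_class[class_path])
    return groups
```

Each resulting group could then be run as its own, smaller test invocation
(for example by writing the IDs to a file and passing it to the test runner,
if the runner supports loading a test list), so that fewer tests compete for
memory at the same time.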

** Affects: neutron
     Importance: Critical
         Status: New


** Tags: functional-tests gate-failure

** Changed in: neutron
   Importance: Undecided => Critical

** Tags added: functional-tests gate-failure

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2091855

Title:
  functional test job randomly failing when running on nodes with 4 CPUs

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2091855/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
