Public bug reported:

Functional test jobs fail randomly, with some processes getting oom-killed, when they run on nodes with 4 CPUs. Currently the raxflex-sjc3 provider nodes have 4 CPUs (normally nodes have 8 CPUs).

When the jobs run on these nodes they use up all memory and swap, which results in some processes being oom-killed, and thus the tests fail.

Example failures:
- https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_fce/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/fce5738/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_18a/937106/2/gate/neutron-functional/18ac225/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7aa/periodic/opendev.org/openstack/neutron/master/neutron-functional-with-oslo-master/7aa3e1d/testr_results.html
- https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_238/936845/3/check/neutron-functional/238854b/testr_results.html

Opensearch:
https://opensearch.logs.openstack.org/_dashboards/app/data-explorer/discover#?_a=(discover:(columns:!(_source),isDirty:!f,sort:!()),metadata:(indexPattern:'94869730-aea8-11ec-9e6a-83741af3fdcd',view:discover))&_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_q=(filters:!(('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_name,negate:!f,params:(query:neutron-functional),type:phrase),query:(match_phrase:(build_name:neutron-functional))),('$state':(store:appState),meta:(alias:!n,disabled:!f,index:'94869730-aea8-11ec-9e6a-83741af3fdcd',key:build_status,negate:!f,params:(query:FAILURE),type:phrase),query:(match_phrase:(build_status:FAILURE)))),query:(language:kuery,query:'message:%22localhost%20%7C%20Provider:%20raxflex-sjc3%22'))

We have to check whether memory utilization can be improved in these tests/jobs. For CI itself, we can split the test runs into groups to unblock this.
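As a rough illustration of the "separate the test runs in groups" idea, the sketch below splits a list of test IDs into a fixed number of groups that could then be run one after another, so that fewer tests compete for memory at the same time. This is only an assumption about how such a split might look, not the change actually made in neutron; the function name split_into_groups and the example test IDs are hypothetical.

```python
from typing import List, Sequence


def split_into_groups(test_ids: Sequence[str], group_count: int) -> List[List[str]]:
    """Distribute test IDs round-robin into group_count groups.

    The IDs are sorted first so the split is deterministic between runs.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    groups: List[List[str]] = [[] for _ in range(group_count)]
    for index, test_id in enumerate(sorted(test_ids)):
        groups[index % group_count].append(test_id)
    return groups


if __name__ == "__main__":
    # Hypothetical test IDs, used only for illustration.
    example_ids = [
        "functional.agent.test_a",
        "functional.agent.test_b",
        "functional.db.test_c",
        "functional.db.test_d",
        "functional.services.test_e",
    ]
    for number, group in enumerate(split_into_groups(example_ids, 2), start=1):
        print(f"group {number}: {group}")
```

Each group could then be handed to the test runner as a separate invocation. How the groups are actually chosen (by module, by memory footprint, by regex) is a design decision this sketch does not try to settle.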
** Affects: neutron
Importance: Critical
Status: New

** Tags: functional-tests gate-failure

** Changed in: neutron
Importance: Undecided => Critical

** Tags added: functional-tests gate-failure

https://bugs.launchpad.net/bugs/2091855

Title: functional test job randomly failing when running on nodes with 4 CPUs

Status in neutron: New