A re-forking delay has also been added so that instances can start up
and back off if resources get low. These changes have been tested with 256,
1024, 4096 and 8192 instances on a 24-thread system with 32 GB of memory.
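
For readers who want to see the general idea of a re-forking delay, here
is a minimal sketch in Python. It is not the stress-ng code (stress-ng is
written in C and its actual change is not shown in this message); the
function name fork_with_backoff and the parameters base_delay, max_delay
and max_tries are made up for illustration. It assumes that a resource
shortage shows up as EAGAIN or ENOMEM from fork().

```python
import errno
import os
import time


def fork_with_backoff(fork=os.fork, sleep=time.sleep,
                      base_delay=0.01, max_delay=1.0, max_tries=8):
    """Fork, retrying with a growing delay while resources are short.

    Hypothetical helper: returns whatever fork() returns, and re-raises
    the error if it is not a resource shortage or if max_tries attempts
    have all failed.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        try:
            return fork()
        except OSError as err:
            # EAGAIN / ENOMEM indicate a possibly temporary shortage.
            if err.errno not in (errno.EAGAIN, errno.ENOMEM):
                raise
            if attempt == max_tries:
                raise
            sleep(delay)                       # back off before re-forking
            delay = min(delay * 2, max_delay)  # wait longer next time
```

The fork and sleep callables are injectable only so the retry logic can
be exercised without creating real processes.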
** Changed in: linux (Ubuntu)
Status: New => Invalid
** Changed in: linux (Ubuntu)
Importance: High => Low
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1968361
Title:
rawsock test BUG: soft lockup
Status in Linux:
Fix Released
Status in Stress-ng:
Fix Committed
Status in linux package in Ubuntu:
Invalid
Bug description:
When running the rawsock stressor on a large system with 32 CPUs or
more, I always hit a soft lockup in the kernel, and sometimes the
system locks up entirely if the test runs for a long time. The issue
occurs on all major OSes that I tested: Ubuntu 20.04, RHEL 7 and 8, and SUSE 15.
My system:
stress-ng V0.13.03-5-g9093bce7
# lscpu | grep CPU
CPU(s): 64
On-line CPU(s) list: 0-63
NUMA node0 CPU(s): 0-63
# ./stress-ng --rawsock 20 -t 5
stress-ng: info: [49748] setting to a 5 second run per stressor
stress-ng: info: [49748] dispatching hogs: 20 rawsock
Message from syslogd@rain65 at Apr 8 12:18:26 ...
kernel:watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [stress-ng:49781]
....
If I run with a 60-second timeout (--timeout 60), the system locks up.
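
When reproducing this, it can help to pick the watchdog messages out of
the log automatically. The following is a small sketch, not part of
stress-ng or any existing tool; the names SOFT_LOCKUP and
parse_soft_lockup are hypothetical, and the pattern is based only on the
message format quoted above.

```python
import re

# Pattern modelled on the watchdog line quoted in this report.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the fields of a soft lockup message, or None if no match."""
    m = SOFT_LOCKUP.search(line)
    if m is None:
        return None
    return {
        "cpu": int(m["cpu"]),       # CPU that was stuck
        "seconds": int(m["secs"]),  # how long it was stuck
        "comm": m["comm"],          # name of the task on that CPU
        "pid": int(m["pid"]),       # PID of that task
    }
```

Applied to the line in the report, this yields CPU 4, 22 seconds, task
stress-ng and PID 49781.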
The issue is lock starvation in the kernel:
- When the stressor creates an instance, it forks a child (client) and a
parent (server) process and creates new sockets for them. The kernel
acquires the write lock to add the sockets to the raw socket hash table.
- The client process immediately starts sending data in a do {} while loop.
The kernel acquires the read lock to access the raw socket hash table and
clones the data packets for all raw socket processes.
- The main stress-ng process may still be creating the remaining
instances, so the kernel may hit lock starvation (the error shown above).
- Similarly, when the timeout expires, the parents try to close their
sockets, for which the kernel also acquires the write lock, before
sending SIGKILL to their child processes. This can again cause lock
starvation, since the clients have not closed their sockets and continue
sending data.
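
The starvation pattern described in the list above can be illustrated
with a toy model. This is only a sketch under loud assumptions: it
simulates a lock that admits readers whenever no writer holds it, with
time measured in abstract ticks, and it does not describe the kernel's
actual lock implementation or the attached patch. The function name
writer_wait_ticks and the parameters hold, gap and horizon are invented
for this example.

```python
def writer_wait_ticks(n_readers, hold, gap, horizon=1000):
    """Ticks a writer waits for the lock, or None if it never gets in.

    Toy model: reader i starts at tick i, holds the read lock for `hold`
    ticks, idles for `gap` ticks, and repeats. The writer can take the
    lock only on a tick where no reader holds it.
    """
    period = hold + gap
    for tick in range(horizon):
        # Count readers that have started and are in the "hold" phase.
        active = sum(1 for i in range(n_readers)
                     if tick >= i and (tick - i) % period < hold)
        if active == 0:
            return tick
    return None  # writer starved for the whole horizon


# Tight send loops (gap=0): some reader always holds the lock.
starved = writer_wait_ticks(n_readers=20, hold=5, gap=0)    # None
# Short holds with long pauses leave a window for the writer.
waited = writer_wait_ticks(n_readers=20, hold=1, gap=30)    # 20
```

In this model even a small gap may not help when the readers are
staggered: with 4 readers, hold=5 and gap=1, at least one reader holds
the lock on every tick and the writer still never gets in.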
I'm not sure whether this is intended, but to avoid the kernel lock
starvation in raw sockets, I propose the simple patch attached. I have
tested it on a large system with 128 CPUs without hitting any
BUG: soft lockup.
Thanks,
Thinh Tran