https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219216
Bug ID: 219216
Summary: sched_bind() blocks if the entropy pool is starved
Product: Base System
Version: 11.0-STABLE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Many People
Priority: ---
Component: kern
Assignee: freebsd-bugs@FreeBSD.org
Reporter: k...@freebsd.org

I recently updated my 11-stable system:

FreeBSD AprilRyan.norad 11.0-STABLE FreeBSD 11.0-STABLE #3 r318143: Wed May 10 17:56:12 CEST 2017 root@AprilRyan.norad:/usr/obj/S403/amd64/usr/src/sys/S403 amd64

I immediately noticed that rand_harvestq now runs permanently and consumes a small but significant amount of CPU time. To investigate, I started:

    dd bs=1m < /dev/random > /dev/null

Coincidentally, I was running a release candidate of powerd++ in foreground mode with temperature throttling at the same time:
https://github.com/lonkamikaze/powerdxx/releases/tag/0.3.0-rc1

The following happened when I started the `dd`:
- Two cores were fully consumed, one by dd, one by rand_harvestq
- powerd++ started to stutter and then froze completely

After I killed the `dd` process the following happened:
- rand_harvestq continued to consume an entire core for a long time
- powerd++ remained frozen

By erratically swiping my fingers over the touch screen I got powerd++ to resume operation in a stuttering fashion. It took several minutes before the system behaved normally again.

The two surprising conclusions so far:
- /dev/random blocks
- powerd++ consumes randomness

So I investigated and found that it is access to the following sysctls that blocks:

    dev.cpu.0.temperature
    dev.cpu.1.temperature
    dev.cpu.2.temperature
    dev.cpu.3.temperature

Unloading the coretemp module in the blocked state resulted in a kernel panic showing that coretemp was stuck in coretemp_get_val_sysctl(). With an unhealthy dose of uprintf() calls I figured out that the block happens in coretemp_get_thermal_msr() (see /usr/src/sys/dev/coretemp/coretemp.c:306). The problem is the following code:

311         thread_lock(curthread);
312         sched_bind(curthread, cpu);
313         thread_unlock(curthread);

The call to sched_bind() blocks when the entropy pool is starved (I suspect only if the thread is not already running on the target core). Because I cannot fiddle with and replace sched_ule at runtime, I have decided this is as far as I'm digging.

That the scheduler depends on entropy is very worrying, not to say a bug, especially when randomness is a scarce resource.

I got the system to panic many times during this investigation, mostly because locks were held too long. E.g.:

spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid 100196) too long
spin lock 0xffffffff81c8e380 (sched lock 3) held by 0xfffff80028b19560 (tid 100196) too long
panic: spin lock held too long
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe045154f850
vpanic() at vpanic+0x186/frame 0xfffffe045154f8d0
panic() at panic+0x43/frame 0xfffffe045154f930
_mtx_lock_spin_cookie() at _mtx_lock_spin_cookie+0x311/frame 0xfffffe045154f9a0
sched_idletd() at sched_idletd+0x3aa/frame 0xfffffe045154fa70
fork_exit() at fork_exit+0x85/frame 0xfffffe045154fab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe045154fab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
Uptime: 4m31s

I also find it questionable that entropy harvesting continues after the RNG has been initially seeded, which makes /dev/random susceptible to entropy poisoning by a malicious process feeding bad entropy into /dev/random.
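
For anyone trying to reproduce the observation, below is a minimal userland sketch (not part of the original report) that polls the same dev.cpu.N.temperature sysctl that powerd++ reads. The one-second polling loop and the fixed CPU index are illustrative assumptions; while the entropy pool is starved, the sysctlbyname() call blocks inside the coretemp sysctl handler as described above.

#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>
#include <unistd.h>

int
main(void)
{
	int temp;
	size_t len;

	for (;;) {
		len = sizeof(temp);
		/*
		 * Each read enters coretemp_get_val_sysctl() ->
		 * coretemp_get_thermal_msr() in the kernel; this is the
		 * call that was observed to block while the entropy pool
		 * was starved.
		 */
		if (sysctlbyname("dev.cpu.0.temperature", &temp, &len,
		    NULL, 0) == -1) {
			perror("sysctlbyname");
			return (1);
		}
		/* coretemp exports the value in IK format (tenths of a kelvin). */
		printf("dev.cpu.0.temperature: %.1f C\n",
		    temp / 10.0 - 273.15);
		sleep(1);
	}
}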