On Mon, Apr 07, 2025 at 06:13:21AM -0400, Bill deWindt wrote:
> Thanks for the additional info on this issue. Here's the output from both of
> the machines I have here. One interesting thing I've been seeing from the
> beginning is that the 450Mhz machine always has the hung process at PID 19,
> and here's its output:
> 
> # cat /proc/19/stack
> [<0>] __remove_hrtimer+0x5c/0xd8
> [<0>] msleep+0x30/0x4c
> [<0>] tau_work_func+0x24/0x68
> [<0>] process_one_work+0x1b8/0x3d8
> [<0>] worker_thread+0x288/0x3cc
> [<0>] kthread+0xe0/0xe4
> [<0>] start_kernel_thread+0x10/0x14

...

> I am guessing (perhaps incorrectly?) that since all of the output from each
> trace above matches, with the exception of the first line, this gives an
> idea of where the tickle lies. Is there further digging I can do that would
> be useful?

It looks like this is the thermal monitoring for the CPU. The code is
found in arch/powerpc/kernel/tau_6xx.c and tau_work_func does call the
function msleep, which does an uninterruptible sleep.

static void tau_work_func(struct work_struct *work)
{
        msleep(shrink_timer);
        on_each_cpu(tau_timeout, NULL, 0);
        /* schedule ourselves to be run again */
        queue_work(tau_workq, work);
}

The function at the top of each stack presumably comes from a
hardware interrupt handler, since the task itself would be asleep
inside msleep at that point.

Since this worker thread would be created very early in the boot
process, it's not surprising if it gets a fairly consistent PID.

This function is called from a worker thread running items from a work
queue, and it does an uninterruptible sleep before running tau_timeout
on each CPU followed by putting itself back on the workqueue. Since
this is a dedicated worker thread for this queue, that one thread
will basically just sit in this function all the time. If tau_timeout
doesn't take any time to run on this hardware, that thread will spend
most of its time in msleep which will show as state 'D' in ps and thus
affect the load average.
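
If you want to confirm the state of that thread without going through
ps, something like the following user-space sketch could do it. It is
only an illustration: the helper names (task_state_from_stat,
task_state, counts_toward_load) are made up for this example and are
not part of any kernel or libc API. It assumes the usual
/proc/<pid>/stat layout, where the state letter follows the
parenthesised command name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the state letter from one line of /proc/<pid>/stat, or '\0'
 * if the line does not look right. The command name is wrapped in
 * parentheses and may itself contain ')' or spaces, so search for the
 * last ')' rather than splitting on whitespace.
 */
char task_state_from_stat(const char *stat_line)
{
    const char *close = strrchr(stat_line, ')');

    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '\0';
    return close[2];
}

/* Read /proc/<pid>/stat and return the state letter, or '\0' on error. */
char task_state(int pid)
{
    char path[64];
    char line[512];
    char state = '\0';
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/%d/stat", pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '\0';
    if (fgets(line, sizeof(line), fp) != NULL)
        state = task_state_from_stat(line);
    fclose(fp);
    return state;
}

/*
 * Runnable ('R') and uninterruptible ('D') tasks count toward the
 * load average; interruptible sleepers ('S') do not.
 */
int counts_toward_load(char state)
{
    return state == 'R' || state == 'D';
}
```

Calling task_state(19) on the 450MHz machine and passing the result to
counts_toward_load() should show whether that thread is the one
holding the load average up.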

I'm not an expert in this particular driver or how it needs to behave,
but perhaps it shouldn't be using msleep for something like this. In
some of the code I do manage, we replaced some uninterruptible sleeps
with interruptible ones specifically so that the threads would show in
state 'S' instead of 'D' and stop affecting the load average. Signal
handling for kernel threads is different from the handling in a user
thread in a system call, so there are some tricks that work without
causing major issues.

Someone who knows the core powerpc code better than I do will need to
comment on this driver and whether it makes sense to change it.

        Brad Boyer
        f...@allandria.com
