On 28/10/2023 08.58, Eyal Lebedinsky wrote:
Fully updated F28.

I had to send one (of 7) member disks for RMA.
I notice that the system is very unresponsive. 'top' shows

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1365697 root      20   0       0      0      0 R  93.8   0.0 384:40.55 kworker/u16:3+flush-9:127

This continues even when there is no user activity (Firefox and Thunderbird closed).

A few days ago it stopped, but today I see that it kept running all night, including a period of inactivity lasting a few hours.

As another point: a few days ago I received a disk from RMA and the recovery went as fast as expected.
I then removed another disk to send for RMA.

Is this expected? Is there anything I can do to improve the situation?

TIA

Maybe a hint. On a whim I decided to look at interrupts on the machine. I see an item in
        /proc/interrupts
that grows by 80-90 every second.
It is listed as 'IR-PCI-MSIX-0000:03:00.0    0-edge      mpt2sas0-msix0' which is probably related
to the raid card used for this array.
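
For anyone wanting to reproduce the measurement, here is a minimal sketch of how the growth rate could be sampled. The helper names irq_total and irq_rate are made up for illustration, and the sketch assumes the usual /proc/interrupts layout (IRQ number, one counter per CPU, then the description).

```shell
# Hypothetical helper: sum the per-CPU counters on every line of
# /proc/interrupts-style input (read from stdin) that matches a pattern.
irq_total() {
    awk -v pat="$1" '
        $0 ~ pat {
            # Field 1 is the IRQ number; the counters follow and end at
            # the first field that is not purely numeric.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { print sum + 0 }
    '
}

# Hypothetical helper: print how much the matching counter grew during
# each interval (default 1 second). Stop it with Ctrl-C.
# The body runs in a subshell so its variables do not leak.
irq_rate() (
    pat=$1
    interval=${2:-1}
    prev=$(irq_total "$pat" < /proc/interrupts)
    while sleep "$interval"; do
        cur=$(irq_total "$pat" < /proc/interrupts)
        echo "$(date +%T) +$((cur - prev))"
        prev=$cur
    done
)
```

Running `irq_rate 'mpt2sas0-msix0$'` should then print one line per second with the increment for that vector.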

Another hint: I see a job stuck in D state.

$ ps aux|grep parted
root     2398175  0.0  0.0   6184  3700 ?        D    05:10   0:00 parted -l
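
A slightly more general way to look for such jobs, as a sketch only: list every process in uninterruptible sleep together with the kernel function it is waiting in. The function names filter_dstate and list_dstate are made up for illustration; the ps format specifiers are the ones from procps.

```shell
# Hypothetical helper: keep the header line plus every row whose STAT
# column (second field) starts with D (uninterruptible sleep).
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Hypothetical helper: show all D-state processes with their wait
# channel and elapsed run time.
list_dstate() {
    ps -eo pid,stat,wchan:32,etime,args | filter_dstate
}
```

The WCHAN column may give a rough idea of where in the kernel a process such as 'parted -l' is blocked.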

This command runs overnight to collect some stats, and it seems that this program is hanging.
This instance started at "2023-10-27 05:10:01", that is, while the disk was still in the machine (not in the array)
but just after it had finished being zeroed.

--
Eyal at Home (e...@eyal.emu.id.au)
_______________________________________________
users mailing list -- users@lists.fedoraproject.org
