Please CC me on comments.

I'm seeing a lot of these errors on my dual-core fileserver:
-----------------------------------------------------------------------

Sep 23 01:51:28 files kernel: INFO: rcu_sched detected stalls on CPUs/tasks:
Sep 23 01:51:28 files kernel:         1-...!: (0 ticks this GP) idle=27c/0/0 softirq=35425/35425 fqs=0
Sep 23 01:51:28 files kernel:         (detected by 0, t=60009 jiffies, g=20812, c=20811, q=121)
Sep 23 01:51:28 files kernel: Sending NMI from CPU 0 to CPUs 1:
Sep 23 01:51:28 files kernel: NMI backtrace for cpu 1 skipped: idling at native_safe_halt+0x2/0x10
Sep 23 01:51:28 files kernel: rcu_sched kthread starved for 60009 jiffies! g20812 c20811 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x402 ->cpu=1
Sep 23 01:51:28 files kernel: RCU grace-period kthread stack dump:
Sep 23 01:51:28 files kernel: rcu_sched       I    0    10      2 0x80000000
Sep 23 01:51:33 files kernel: Call Trace:
Sep 23 01:51:33 files kernel:  ? __schedule+0x25c/0x860
Sep 23 01:51:33 files kernel:  schedule+0x28/0x80
Sep 23 01:51:33 files kernel:  schedule_timeout+0x174/0x370
Sep 23 01:51:33 files kernel:  ? __next_timer_interrupt+0xc0/0xc0
Sep 23 01:51:33 files kernel:  rcu_gp_kthread+0x4b6/0x8c0
Sep 23 01:51:33 files kernel:  ? _synchronize_rcu_expedited.constprop.68+0x310/0x310
Sep 23 01:51:33 files kernel:  kthread+0x113/0x130
Sep 23 01:51:33 files kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Sep 23 01:51:33 files kernel:  ret_from_fork+0x35/0x40

-----------------------------------------------------------------------

The kernel-reported bogoMIPS values for the two cores are as follows:

$ grep bogo /proc/cpuinfo
bogomips        : 4219.49
bogomips        : 184253.06
$

What is going on with that value for the second Athlon core (it looks extremely
bogus), and could it be the cause of the stalls in schedule_timeout() shown
above?  The bogus value also shows up in the boot log when the second core is
activated.  It seems to be AMD-specific, as the values are correct on my Xeon
machines.
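
For anyone who wants to run the same comparison on their own box, here is a
minimal sketch that lists the per-core values and flags outliers.  The function
name check_bogomips and the 2x threshold are my own hypothetical choices, not
anything the kernel defines, and it assumes the bogomips lines appear in CPU
order in /proc/cpuinfo.

```shell
#!/bin/sh
# Sketch only: print each bogomips entry from cpuinfo-style input and mark
# any value more than FACTOR times the smallest one as SUSPECT.
# FACTOR defaults to 2 (an arbitrary, hypothetical threshold).
check_bogomips() {
    awk -F': *' -v factor="${1:-2}" '
        # Collect the numeric value of every "bogomips" line, in file order.
        /^bogomips/ { v[n++] = $2 + 0 }
        END {
            if (n == 0) exit 1          # no bogomips lines found
            min = v[0]
            for (i = 1; i < n; i++)
                if (v[i] < min) min = v[i]
            # Index i is assumed to match the CPU number.
            for (i = 0; i < n; i++)
                printf "cpu%d %.2f%s\n", i, v[i],
                       (v[i] > min * factor ? " SUSPECT" : "")
        }'
}

# Usage: check_bogomips < /proc/cpuinfo
```

Fed the two lines from the grep output above, this should print
"cpu0 4219.49" and "cpu1 184253.06 SUSPECT".  It only highlights the mismatch;
it says nothing about why the second core calibrates so differently.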

The kernel is a stock Fedora 4.18.7-100 release.  The machine is an old Dell
Experion that I've repurposed as a fileserver and PostgreSQL machine.

Other than "RTFM", or "please build a bunch of kernels from source on your slow
machine, using differing config options, to help track down the cause of
this", any thoughts about a solution?

