On Mon, Mar 02, 2026 at 05:17:16PM +0530, Samir M wrote:
> 
> On 27/02/26 6:43 am, Joel Fernandes wrote:
> > On Wed, Feb 18, 2026 at 02:09:18PM +0530, Vishal Chourasia wrote:
> > > Expedite synchronize_rcu during the SMT mode switch operation when
> > > initiated via /sys/devices/system/cpu/smt/control interface
> > >
> > After the locking related changes in patch 1, is expediting still required?
> > I am just a bit concerned that we are papering over the real issue of over
> > usage of synchronize_rcu() (which IIRC we discussed in earlier versions of
> > the patches that reducing the number of lock acquire/release was supposed to
> > help.)

Yes, expediting is still required.
At present, I am not sure about the underlying issue. What I have found
so far is that when synchronize_rcu() is invoked, it marks the start of
a new grace period, say with sequence number "A". The thread invoking
synchronize_rcu() blocks until all CPUs have reported a quiescent state
(QS) for GP "A". An RCU grace-period kthread runs periodically, looping
over the CPU list to check whether every CPU has reported its QS. In the
trace, I see some CPUs reporting a QS for a sequence number far in the
past, e.g. A - N where N > 10.

> > 
> > Could you provide more justification of why expediting these sections is
> > required if the locking concerns were addressed? It would be great if you 
> > can
> > provide performance numbers with only the first patch and without the second
> > patch. That way we can quantify this patch.
> > 
> > 
> SMT Mode    | Without Patch (Base) | Both Patches Applied | % Improvement |
> ---------------------------------------------------------------------------
> SMT=off     | 16m 13.956s          |     6m 18.435s       |  +61.14 %     |
> SMT=on      | 12m 0.982s           |     5m 59.576s       |  +50.10 %     |
> 
> When I tested patch 1 (below) independently, I did not observe any
> improvement for either smt=on or smt=off. However, in the smt=off
> scenario, I encountered hung-task splats (with call traces), where
> some threads were blocked on cpus_read_lock. Please also refer to the
> attached call trace below.
> Patch 1:
> https://lore.kernel.org/all/[email protected]/
> 
> SMT Mode    | Without Patch (Base) | Just Patch 1 Applied | % Improvement |
> ---------------------------------------------------------------------------
> SMT=off     | 16m 13.956s          |     16m 9.793s       |  +0.43 %      |
> SMT=on      | 12m 0.982s           |     12m 19.494s      |  -2.57 %      |
> 
> 
> Call traces:
> [ 1477.612377] [  T8746]    Tainted: G      E 7.0.0-rc1-150700.51-default-dirty #1
> [ 1477.612384] [  T8746] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 1477.612389] [  T8746] task:systemd     state:D stack:0   pid:1  tgid:1   ppid:0   task_flags:0x400100 flags:0x00040000
> [ 1477.612397] [  T8746] Call Trace:
> [ 1477.612399] [  T8746] [c00000000cc0f4f0] [0000000000100000] 0x100000 (unreliable)
> [ 1477.612416] [  T8746] [c00000000cc0f6a0] [c00000000001fe5c] __switch_to+0x1dc/0x290
> [ 1477.612425] [  T8746] [c00000000cc0f6f0] [c0000000012598ac] __schedule+0x40c/0x1a70
> [ 1477.612433] [  T8746] [c00000000cc0f840] [c00000000125af58] schedule+0x48/0x1a0
> [ 1477.612439] [  T8746] [c00000000cc0f870] [c0000000002e27b8] percpu_rwsem_wait+0x198/0x200
> [ 1477.612445] [  T8746] [c00000000cc0f8f0] [c000000001262930] __percpu_down_read+0xb0/0x210
> [ 1477.612449] [  T8746] [c00000000cc0f930] [c00000000022f400] cpus_read_lock+0xc0/0xd0
> [ 1477.612456] [  T8746] [c00000000cc0f950] [c0000000003a6398] cgroup_procs_write_start+0x328/0x410
> [ 1477.612462] [  T8746] [c00000000cc0fa00] [c0000000003a9620] __cgroup_procs_write+0x70/0x2c0
> [ 1477.612468] [  T8746] [c00000000cc0fac0] [c0000000003a98e8] cgroup_procs_write+0x28/0x50
> [ 1477.612473] [  T8746] [c00000000cc0faf0] [c0000000003a1624] cgroup_file_write+0xb4/0x240
> [ 1477.612478] [  T8746] [c00000000cc0fb50] [c000000000853ba8] kernfs_fop_write_iter+0x1a8/0x2a0
> [ 1477.612485] [  T8746] [c00000000cc0fba0] [c000000000733d5c] vfs_write+0x27c/0x540
> [ 1477.612491] [  T8746] [c00000000cc0fc50] [c000000000734350] ksys_write+0x80/0x150
> [ 1477.612495] [  T8746] [c00000000cc0fca0] [c000000000032898] system_call_exception+0x148/0x320
> [ 1477.612500] [  T8746] [c00000000cc0fe50] [c00000000000d6a0] system_call_common+0x160/0x2c4
> [ 1477.612506] [  T8746] ---- interrupt: c00 at 0x7fffa8f73df4
> [ 1477.612509] [  T8746] NIP: 00007fffa8f73df4 LR: 00007fffa8eb6144 CTR: 0000000000000000
> [ 1477.612512] [  T8746] REGS: c00000000cc0fe80 TRAP: 0c00 Tainted: G      E    (7.0.0-rc1-150700.51-default-dirty)
> [ 1477.612515] [  T8746] MSR: 800000000000d033 <SF,EE,PR,ME,IR,DR,RI,LE> CR: 28002288 XER: 00000000
> 

The default hung-task timeout is 8 minutes (480 s).

$ grep . /proc/sys/kernel/hung_task_timeout_secs
/proc/sys/kernel/hung_task_timeout_secs:480

Now that cpus_write_lock is taken only once, and an SMT mode switch can
take tens of minutes to complete before relinquishing the lock, any
thread waiting on cpus_read_lock is blocked for that entire duration.

Although no splats were observed in the "both patches applied" case,
the underlying issue still remains.

regards,
vishal
