On 3/2/26 9:15 AM, Waiman Long wrote:
On 3/2/26 7:14 AM, Frederic Weisbecker wrote:
On Sat, Feb 21, 2026 at 01:54:18PM -0500, Waiman Long wrote:
The current cpuset partition code is able to dynamically update
the sched domains of a running system and the corresponding
HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the
"isolcpus=domain,..." boot command line feature at run time.

The housekeeping cpumask update requires flushing a number of different
workqueues, which may not be safe with cpus_read_lock() held, as the
workqueue flushing code may acquire cpus_read_lock() or acquire locks
that have a locking dependency on cpus_read_lock() further down the
chain. Below is an example of such a circular locking problem.

   ======================================================
   WARNING: possible circular locking dependency detected
   6.18.0-test+ #2 Tainted: G S
   ------------------------------------------------------
   test_cpuset_prs/10971 is trying to acquire lock:
   ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180

   but task is already holding lock:
   ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:
   -> #4 (cpuset_mutex){+.+.}-{4:4}:
   -> #3 (cpu_hotplug_lock){++++}-{0:0}:
   -> #2 (rtnl_mutex){+.+.}-{4:4}:
   -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
   -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:

   Chain exists of:
     (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
Which workqueue is involved here that holds rtnl_mutex?
Is this an existing problem or added test code?

A circular locking dependency here does not necessarily mean that rtnl_mutex is directly used in a work function. However, it can appear in a locking chain involving multiple parties that results in a deadlock if the acquisitions happen in the right order. So it is better safe than sorry even if the chance of this occurring is minimal.

Below is the full lockdep splat; I didn't include the individual stack traces to make the commit log less verbose.

The rtnl_mutex is indeed involved in local_pci_probe().

Cheers,
Longman

[  909.360022] ======================================================
[  909.366208] WARNING: possible circular locking dependency detected
[  909.372387] 7.0.0-rc1-test+ #3 Tainted: G S
[  909.378044] ------------------------------------------------------
[  909.384225] test_cpuset_prs/8673 is trying to acquire lock:
[  909.389798] ffff8890b0fd6558 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
[  909.399114]
               but task is already holding lock:
[  909.404946] ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.413562]
               which lock already depends on the new lock.

[  909.421733]
               the existing dependency chain (in reverse order) is:
[  909.429213]
               -> #4 (cpuset_mutex){+.+.}-{4:4}:
[  909.435056]        __lock_acquire+0x58c/0xbd0
[  909.439421]        lock_acquire.part.0+0xbd/0x260
[  909.444129]        __mutex_lock+0x1a7/0x1ba0
[  909.448411]        cpuset_css_online+0x59/0x410
[  909.452948]        online_css+0x9b/0x2d0
[  909.456877]        css_create+0x3c6/0x610
[  909.460895]        cgroup_apply_control_enable+0x2ff/0x460
[  909.466384]        cgroup_subtree_control_write+0x79a/0xc70
[  909.471963]        cgroup_file_write+0x1a5/0x680
[  909.476582]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.481550]        vfs_write+0x525/0xfd0
[  909.485482]        ksys_write+0xf9/0x1d0
[  909.489410]        do_syscall_64+0x13a/0x1520
[  909.493778]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.499361]
               -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  909.505547]        __lock_acquire+0x58c/0xbd0
[  909.509914]        lock_acquire.part.0+0xbd/0x260
[  909.514630]        cpus_read_lock+0x40/0xe0
[  909.518824]        flush_all_backlogs+0x83/0x4b0
[  909.529465]        unregister_netdevice_many_notify+0x7e8/0x1fa0
[  909.529465]        default_device_exit_batch+0x356/0x490
[  909.534788]        ops_undo_list+0x2f4/0x930
[  909.539067]        cleanup_net+0x40a/0x8f0
[  909.543168]        process_one_work+0xd8b/0x1320
[  909.547795]        worker_thread+0x5f3/0xfe0
[  909.552068]        kthread+0x36c/0x470
[  909.555830]        ret_from_fork+0x5dc/0x8e0
[  909.560109]        ret_from_fork_asm+0x1a/0x30
[  909.564557]
               -> #2 (rtnl_mutex){+.+.}-{4:4}:
[  909.570224]        __lock_acquire+0x58c/0xbd0
[  909.574592]        lock_acquire.part.0+0xbd/0x260
[  909.579304]        __mutex_lock+0x1a7/0x1ba0
[  909.583580]        rtnl_net_lock_killable+0x1e/0x70
[  909.588465]        register_netdev+0x40/0x70
[  909.592738]        i40e_vsi_setup+0x892/0x14b0 [i40e]
[  909.597854]        i40e_setup_pf_switch+0xaa1/0xe80 [i40e]
[  909.603392]        i40e_probe.cold+0xdb0/0x1d1b [i40e]
[  909.608582]        local_pci_probe+0xdb/0x180
[  909.612951]        local_pci_probe_callback+0x35/0x80
[  909.618008]        process_one_work+0xd8b/0x1320
[  909.622631]        worker_thread+0x5f3/0xfe0
[  909.626912]        kthread+0x36c/0x470
[  909.630673]        ret_from_fork+0x5dc/0x8e0
[  909.634951]        ret_from_fork_asm+0x1a/0x30
[  909.639399]
               -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
[  909.646627]        __lock_acquire+0x58c/0xbd0
[  909.650994]        lock_acquire.part.0+0xbd/0x260
[  909.655699]        process_one_work+0xd58/0x1320
[  909.660321]        worker_thread+0x5f3/0xfe0
[  909.664602]        kthread+0x36c/0x470
[  909.668363]        ret_from_fork+0x5dc/0x8e0
[  909.672641]        ret_from_fork_asm+0x1a/0x30
[  909.677089]
               -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
[  909.683795]        check_prev_add+0xf1/0xc80
[  909.688068]        validate_chain+0x481/0x560
[  909.692431]        __lock_acquire+0x58c/0xbd0
[  909.696797]        lock_acquire.part.0+0xbd/0x260
[  909.701511]        touch_wq_lockdep_map+0x93/0x180
[  909.706314]        __flush_workqueue+0x111/0x10b0
[  909.711026]        housekeeping_update+0x12d/0x2d0
[  909.715819]        update_parent_effective_cpumask+0x595/0x2440
[  909.721747]        update_prstate+0x89d/0xce0
[  909.726105]        cpuset_partition_write+0xc5/0x130
[  909.731073]        cgroup_file_write+0x1a5/0x680
[  909.735701]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.740664]        vfs_write+0x525/0xfd0
[  909.744592]        ksys_write+0xf9/0x1d0
[  909.748520]        do_syscall_64+0x13a/0x1520
[  909.752887]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.758465]
               other info that might help us debug this:

[  909.766466] Chain exists of:
                 (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex

[  909.777679]  Possible unsafe locking scenario:

[  909.783599]        CPU0                    CPU1
[  909.788130]        ----                    ----
[  909.792666]   lock(cpuset_mutex);
[  909.795991]                                lock(cpu_hotplug_lock);
[  909.802171]                                lock(cpuset_mutex);
[  909.808013]   lock((wq_completion)sync_wq);
[  909.812207]
                *** DEADLOCK ***

[  909.818127] 5 locks held by test_cpuset_prs/8673:
[  909.822830]  #0: ffff888140592440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
[  909.830839]  #1: ffff889100a49890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
[  909.839890]  #2: ffff8890fbfa5368 (kn->active#353){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
[  909.849118]  #3: ffffffffb9134d00 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
[  909.858522]  #4: ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.867576]
               stack backtrace:
[  909.871940] CPU: 95 UID: 0 PID: 8673 Comm: test_cpuset_prs Kdump: loaded Tainted: G S                  7.0.0-rc1-test+ #3 PREEMPT(full)
[  909.871946] Tainted: [S]=CPU_OUT_OF_SPEC
[  909.871948] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.043020191705 04/30/2019
[  909.871950] Call Trace:
[  909.871952]  <TASK>
[  909.871955]  dump_stack_lvl+0x6f/0xb0
[  909.871961]  print_circular_bug.cold+0x38/0x45
[  909.871968]  check_noncircular+0x146/0x160
[  909.871975]  check_prev_add+0xf1/0xc80
[  909.871978]  ? alloc_chain_hlocks+0x13e/0x1d0
[  909.871982]  ? add_chain_cache+0x11c/0x300
[  909.871986]  validate_chain+0x481/0x560
[  909.871991]  __lock_acquire+0x58c/0xbd0
[  909.871995]  ? lockdep_init_map_type+0x66/0x250
[  909.872000]  lock_acquire.part.0+0xbd/0x260
[  909.872004]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872009]  ? rcu_is_watching+0x15/0xb0
[  909.872013]  ? trace_rcu_sr_normal+0x1d5/0x2e0
[  909.872018]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872021]  ? lock_acquire+0x159/0x180
[  909.872026]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872030]  touch_wq_lockdep_map+0x93/0x180
[  909.872034]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872038]  __flush_workqueue+0x111/0x10b0
[  909.872042]  ? local_clock_noinstr+0xd/0xe0
[  909.872049]  ? __pfx___flush_workqueue+0x10/0x10
[  909.872059]  housekeeping_update+0x12d/0x2d0
[  909.872063]  update_parent_effective_cpumask+0x595/0x2440
[  909.872070]  update_prstate+0x89d/0xce0
[  909.872076]  ? __pfx_update_prstate+0x10/0x10
[  909.872085]  cpuset_partition_write+0xc5/0x130
[  909.872089]  cgroup_file_write+0x1a5/0x680
[  909.872093]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872097]  ? kernfs_fop_write_iter+0x2b6/0x5f0
[  909.872102]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872105]  kernfs_fop_write_iter+0x3df/0x5f0
[  909.872109]  vfs_write+0x525/0xfd0
[  909.872113]  ? __pfx_vfs_write+0x10/0x10
[  909.872118]  ? __lock_acquire+0x58c/0xbd0
[  909.872124]  ? find_held_lock+0x32/0x90
[  909.872130]  ksys_write+0xf9/0x1d0
[  909.872133]  ? __pfx_ksys_write+0x10/0x10
[  909.872136]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872141]  ? do_syscall_64+0xde/0x1520
[  909.872146]  do_syscall_64+0x13a/0x1520
[  909.872151]  ? rcu_is_watching+0x15/0xb0
[  909.872154]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872157]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872161]  ? do_syscall_64+0x212/0x1520
[  909.872166]  ? find_held_lock+0x32/0x90
[  909.872170]  ? local_clock_noinstr+0xd/0xe0
[  909.872174]  ? __lock_release.isra.0+0x1a2/0x2c0
[  909.872178]  ? exc_page_fault+0x78/0xf0
[  909.872183]  ? rcu_is_watching+0x15/0xb0
[  909.872186]  ? trace_irq_enable.constprop.0+0x194/0x200
[  909.872191]  ? lockdep_hardirqs_on_prepare.part.0+0x8e/0x170
[  909.872196]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872199] RIP: 0033:0x7f877d3e9544
[  909.872203] Code: 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 80 3d a5 cb 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[  909.872206] RSP: 002b:00007ffd6ff21b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  909.872210] RAX: ffffffffffffffda RBX: 00007f877d4bf5c0 RCX: 00007f877d3e9544
[  909.872213] RDX: 0000000000000009 RSI: 0000557ff7ec2320 RDI: 0000000000000001
[  909.872215] RBP: 0000000000000009 R08: 0000000000000073 R09: 00000000ffffffff
[  909.872217] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000009
[  909.872219] R13: 0000557ff7ec2320 R14: 0000000000000009 R15: 00007f877d4bcf00
[  909.872226]  </TASK>
