On 3/2/26 9:15 AM, Waiman Long wrote:
On 3/2/26 7:14 AM, Frederic Weisbecker wrote:
On Sat, Feb 21, 2026 at 01:54:18PM -0500, Waiman Long wrote:
The current cpuset partition code is able to dynamically update
the sched domains of a running system and the corresponding
HK_TYPE_DOMAIN housekeeping cpumask to perform what is essentially the
"isolcpus=domain,..." boot command line feature at run time.

The housekeeping cpumask update requires flushing a number of different
workqueues, which may not be safe with cpus_read_lock() held, as the
workqueue flushing code may acquire cpus_read_lock() or acquire locks
that have a locking dependency on cpus_read_lock() further down the
chain. Below is an example of such a circular locking problem.

   ======================================================
   WARNING: possible circular locking dependency detected
   6.18.0-test+ #2 Tainted: G S
   ------------------------------------------------------
   test_cpuset_prs/10971 is trying to acquire lock:
   ffff888112ba4958 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180

   but task is already holding lock:
   ffffffffae47f450 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130

   which lock already depends on the new lock.

   the existing dependency chain (in reverse order) is:
   -> #4 (cpuset_mutex){+.+.}-{4:4}:
   -> #3 (cpu_hotplug_lock){++++}-{0:0}:
   -> #2 (rtnl_mutex){+.+.}-{4:4}:
   -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
   -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:

   Chain exists of:
     (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex
Which workqueue is involved here that holds rtnl_mutex?
Is this an existing problem or added test code?

A circular locking dependency here does not necessarily mean that rtnl_mutex is directly used in a work function. However, it can appear in a locking chain involving multiple parties that results in a deadlock if the acquisitions happen in the right order. So it is better safe than sorry even if the chance of this occurring is minimal.

Below is the full lockdep splat; I didn't include the individual stack traces to make the commit log less verbose.

The rtnl_mutex is indeed involved in local_pci_probe().

Cheers,
Longman

[  909.360022] ======================================================
[  909.366208] WARNING: possible circular locking dependency detected
[  909.372387] 7.0.0-rc1-test+ #3 Tainted: G S
[  909.378044] ------------------------------------------------------
[  909.384225] test_cpuset_prs/8673 is trying to acquire lock:
[  909.389798] ffff8890b0fd6558 ((wq_completion)sync_wq){+.+.}-{0:0}, at: touch_wq_lockdep_map+0x7a/0x180
[  909.399114]
               but task is already holding lock:
[  909.404946] ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.413562]
               which lock already depends on the new lock.

[  909.421733]
               the existing dependency chain (in reverse order) is:
[  909.429213]
               -> #4 (cpuset_mutex){+.+.}-{4:4}:
[  909.435056]        __lock_acquire+0x58c/0xbd0
[  909.439421]        lock_acquire.part.0+0xbd/0x260
[  909.444129]        __mutex_lock+0x1a7/0x1ba0
[  909.448411]        cpuset_css_online+0x59/0x410
[  909.452948]        online_css+0x9b/0x2d0
[  909.456877]        css_create+0x3c6/0x610
[  909.460895]        cgroup_apply_control_enable+0x2ff/0x460
[  909.466384]        cgroup_subtree_control_write+0x79a/0xc70
[  909.471963]        cgroup_file_write+0x1a5/0x680
[  909.476582]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.481550]        vfs_write+0x525/0xfd0
[  909.485482]        ksys_write+0xf9/0x1d0
[  909.489410]        do_syscall_64+0x13a/0x1520
[  909.493778]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.499361]
               -> #3 (cpu_hotplug_lock){++++}-{0:0}:
[  909.505547]        __lock_acquire+0x58c/0xbd0
[  909.509914]        lock_acquire.part.0+0xbd/0x260
[  909.514630]        cpus_read_lock+0x40/0xe0
[  909.518824]        flush_all_backlogs+0x83/0x4b0
[  909.529465]        unregister_netdevice_many_notify+0x7e8/0x1fa0
[  909.529465]        default_device_exit_batch+0x356/0x490
[  909.534788]        ops_undo_list+0x2f4/0x930
[  909.539067]        cleanup_net+0x40a/0x8f0
[  909.543168]        process_one_work+0xd8b/0x1320
[  909.547795]        worker_thread+0x5f3/0xfe0
[  909.552068]        kthread+0x36c/0x470
[  909.555830]        ret_from_fork+0x5dc/0x8e0
[  909.560109]        ret_from_fork_asm+0x1a/0x30
[  909.564557]
               -> #2 (rtnl_mutex){+.+.}-{4:4}:
[  909.570224]        __lock_acquire+0x58c/0xbd0
[  909.574592]        lock_acquire.part.0+0xbd/0x260
[  909.579304]        __mutex_lock+0x1a7/0x1ba0
[  909.583580]        rtnl_net_lock_killable+0x1e/0x70
[  909.588465]        register_netdev+0x40/0x70
[  909.592738]        i40e_vsi_setup+0x892/0x14b0 [i40e]
[  909.597854]        i40e_setup_pf_switch+0xaa1/0xe80 [i40e]
[  909.603392]        i40e_probe.cold+0xdb0/0x1d1b [i40e]
[  909.608582]        local_pci_probe+0xdb/0x180
[  909.612951]        local_pci_probe_callback+0x35/0x80
[  909.618008]        process_one_work+0xd8b/0x1320
[  909.622631]        worker_thread+0x5f3/0xfe0
[  909.626912]        kthread+0x36c/0x470
[  909.630673]        ret_from_fork+0x5dc/0x8e0
[  909.634951]        ret_from_fork_asm+0x1a/0x30
[  909.639399]
               -> #1 ((work_completion)(&arg.work)){+.+.}-{0:0}:
[  909.646627]        __lock_acquire+0x58c/0xbd0
[  909.650994]        lock_acquire.part.0+0xbd/0x260
[  909.655699]        process_one_work+0xd58/0x1320
[  909.660321]        worker_thread+0x5f3/0xfe0
[  909.664602]        kthread+0x36c/0x470
[  909.668363]        ret_from_fork+0x5dc/0x8e0
[  909.672641]        ret_from_fork_asm+0x1a/0x30
[  909.677089]
               -> #0 ((wq_completion)sync_wq){+.+.}-{0:0}:
[  909.683795]        check_prev_add+0xf1/0xc80
[  909.688068]        validate_chain+0x481/0x560
[  909.692431]        __lock_acquire+0x58c/0xbd0
[  909.696797]        lock_acquire.part.0+0xbd/0x260
[  909.701511]        touch_wq_lockdep_map+0x93/0x180
[  909.706314]        __flush_workqueue+0x111/0x10b0
[  909.711026]        housekeeping_update+0x12d/0x2d0
[  909.715819]        update_parent_effective_cpumask+0x595/0x2440
[  909.721747]        update_prstate+0x89d/0xce0
[  909.726105]        cpuset_partition_write+0xc5/0x130
[  909.731073]        cgroup_file_write+0x1a5/0x680
[  909.735701]        kernfs_fop_write_iter+0x3df/0x5f0
[  909.740664]        vfs_write+0x525/0xfd0
[  909.744592]        ksys_write+0xf9/0x1d0
[  909.748520]        do_syscall_64+0x13a/0x1520
[  909.752887]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.758465]
               other info that might help us debug this:

[  909.766466] Chain exists of:
                 (wq_completion)sync_wq --> cpu_hotplug_lock --> cpuset_mutex

[  909.777679]  Possible unsafe locking scenario:

[  909.783599]        CPU0                    CPU1
[  909.788130]        ----                    ----
[  909.792666]   lock(cpuset_mutex);
[  909.795991]                                lock(cpu_hotplug_lock);
[  909.802171]                                lock(cpuset_mutex);
[  909.808013]   lock((wq_completion)sync_wq);
[  909.812207]
                *** DEADLOCK ***

[  909.818127] 5 locks held by test_cpuset_prs/8673:
[  909.822830]  #0: ffff888140592440 (sb_writers#7){.+.+}-{0:0}, at: ksys_write+0xf9/0x1d0
[  909.830839]  #1: ffff889100a49890 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x260/0x5f0
[  909.839890]  #2: ffff8890fbfa5368 (kn->active#353){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x2b6/0x5f0
[  909.849118]  #3: ffffffffb9134d00 (cpu_hotplug_lock){++++}-{0:0}, at: cpuset_partition_write+0x77/0x130
[  909.858522]  #4: ffffffffb9741c10 (cpuset_mutex){+.+.}-{4:4}, at: cpuset_partition_write+0x85/0x130
[  909.867576]
               stack backtrace:
[  909.871940] CPU: 95 UID: 0 PID: 8673 Comm: test_cpuset_prs Kdump: loaded Tainted: G S                  7.0.0-rc1-test+ #3 PREEMPT(full)
[  909.871946] Tainted: [S]=CPU_OUT_OF_SPEC
[  909.871948] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.043020191705 04/30/2019
[  909.871950] Call Trace:
[  909.871952]  <TASK>
[  909.871955]  dump_stack_lvl+0x6f/0xb0
[  909.871961]  print_circular_bug.cold+0x38/0x45
[  909.871968]  check_noncircular+0x146/0x160
[  909.871975]  check_prev_add+0xf1/0xc80
[  909.871978]  ? alloc_chain_hlocks+0x13e/0x1d0
[  909.871982]  ? add_chain_cache+0x11c/0x300
[  909.871986]  validate_chain+0x481/0x560
[  909.871991]  __lock_acquire+0x58c/0xbd0
[  909.871995]  ? lockdep_init_map_type+0x66/0x250
[  909.872000]  lock_acquire.part.0+0xbd/0x260
[  909.872004]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872009]  ? rcu_is_watching+0x15/0xb0
[  909.872013]  ? trace_rcu_sr_normal+0x1d5/0x2e0
[  909.872018]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872021]  ? lock_acquire+0x159/0x180
[  909.872026]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872030]  touch_wq_lockdep_map+0x93/0x180
[  909.872034]  ? touch_wq_lockdep_map+0x7a/0x180
[  909.872038]  __flush_workqueue+0x111/0x10b0
[  909.872042]  ? local_clock_noinstr+0xd/0xe0
[  909.872049]  ? __pfx___flush_workqueue+0x10/0x10
[  909.872059]  housekeeping_update+0x12d/0x2d0
[  909.872063]  update_parent_effective_cpumask+0x595/0x2440
[  909.872070]  update_prstate+0x89d/0xce0
[  909.872076]  ? __pfx_update_prstate+0x10/0x10
[  909.872085]  cpuset_partition_write+0xc5/0x130
[  909.872089]  cgroup_file_write+0x1a5/0x680
[  909.872093]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872097]  ? kernfs_fop_write_iter+0x2b6/0x5f0
[  909.872102]  ? __pfx_cgroup_file_write+0x10/0x10
[  909.872105]  kernfs_fop_write_iter+0x3df/0x5f0
[  909.872109]  vfs_write+0x525/0xfd0
[  909.872113]  ? __pfx_vfs_write+0x10/0x10
[  909.872118]  ? __lock_acquire+0x58c/0xbd0
[  909.872124]  ? find_held_lock+0x32/0x90
[  909.872130]  ksys_write+0xf9/0x1d0
[  909.872133]  ? __pfx_ksys_write+0x10/0x10
[  909.872136]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872141]  ? do_syscall_64+0xde/0x1520
[  909.872146]  do_syscall_64+0x13a/0x1520
[  909.872151]  ? rcu_is_watching+0x15/0xb0
[  909.872154]  ? entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872157]  ? lockdep_hardirqs_on+0x78/0x100
[  909.872161]  ? do_syscall_64+0x212/0x1520
[  909.872166]  ? find_held_lock+0x32/0x90
[  909.872170]  ? local_clock_noinstr+0xd/0xe0
[  909.872174]  ? __lock_release.isra.0+0x1a2/0x2c0
[  909.872178]  ? exc_page_fault+0x78/0xf0
[  909.872183]  ? rcu_is_watching+0x15/0xb0
[  909.872186]  ? trace_irq_enable.constprop.0+0x194/0x200
[  909.872191]  ? lockdep_hardirqs_on_prepare.part.0+0x8e/0x170
[  909.872196]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  909.872199] RIP: 0033:0x7f877d3e9544
[  909.872203] Code: 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 80 3d a5 cb 0d 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48
[  909.872206] RSP: 002b:00007ffd6ff21b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  909.872210] RAX: ffffffffffffffda RBX: 00007f877d4bf5c0 RCX: 00007f877d3e9544
[  909.872213] RDX: 0000000000000009 RSI: 0000557ff7ec2320 RDI: 0000000000000001
[  909.872215] RBP: 0000000000000009 R08: 0000000000000073 R09: 00000000ffffffff
[  909.872217] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000009
[  909.872219] R13: 0000557ff7ec2320 R14: 0000000000000009 R15: 00007f877d4bcf00
[  909.872226]  </TASK>
