vector: Move pr_warn() out of vector_lock

Thomas Gleixner Mon, 29 Mar 2021 05:43:35 -0700

Waiman,

On Sun, Mar 28 2021 at 20:52, Waiman Long wrote:
> It was found that the following circular locking dependency warning
> could happen in some systems:
>
> [  218.097878] ======================================================
> [  218.097879] WARNING: possible circular locking dependency detected
> [  218.097880] 4.18.0-228.el8.x86_64+debug #1 Not tainted


Reports have to be against latest mainline and not against the random
distro frankenkernel of the day. That's nothing new.

Plus I was asking you to provide a full splat to look at so this can be
discussed _upfront_. Oh well...

> [  218.097914] -> #2 (&irq_desc_lock_class){-.-.}:
> [  218.097917]        _raw_spin_lock_irqsave+0x48/0x81
> [  218.097918]        __irq_get_desc_lock+0xcf/0x140
> [  218.097919]        __dble_irq_nosync+0x6e/0x110

This function does not even exist in mainline and never existed...

> [  218.097967]
> [  218.097967] Chain exists of:
> [  218.097968]   console_oc_lock_class --> vector_lock
> [  218.097972]
> [  218.097973]  Possible unsafe locking scenario:
> [  218.097973]
> [  218.097974]        CPU0                    CPU1
> [  218.097975]        ----                    ----
> [  218.097975]   lock(vector_lock);
> [  218.097977]                                lock(&irq_desc_lock_class);
> [  218.097980]                                lock(vector_lock);
> [  218.097981]   lock(console_owner);
> [  218.097983]
> [  218.097984]  *** DEADLOCK ***
> [  218.097984]
> [  218.097985] 6 locks held by systemd/1:
> [  218.097986]  #0: ffff88822b5cc1e8 (&tty->legacy_mutex){+.+.}, at: 
> tty_init_dev+0x79/0x440
> [  218.097989]  #1: ffff88832ee00770 (&port->mutex){+.+.}, at: 
> tty_port_open+0x85/0x190
> [  218.097993]  #2: ffff88813be85a88 (&desc->request_mutex){+.+.}, at: 
> __setup_irq+0x249/0x1e60
> [  218.097996]  #3: ffff88813be858c0 (&irq_desc_lock_class){-.-.}, at: 
> __setup_irq+0x2d9/0x1e60
> [  218.098000]  #4: ffffffff84afca78 (vector_lock){-.-.}, at: 
> x86_vector_activate+0xca/0xab0
> [  218.098003]  #5: ffffffff84c27e20 (console_lock){+.+.}, at: 
> vprintk_emit+0x13a/0x450

This is a more fundamental problem than just vector lock and the same
problem exists with any other printk over serial which is nested in the
interrupt activation chain not only on X86.

> -static int activate_reserved(struct irq_data *irqd)
> +static int activate_reserved(struct irq_data *irqd, char *wbuf, size_t wsize)
>  {

...

>       if (!cpumask_subset(irq_data_get_effective_affinity_mask(irqd),
>                           irq_data_get_affinity_mask(irqd))) {
> -             pr_warn("irq %u: Affinity broken due to vector space 
> exhaustion.\n",
> -                     irqd->irq);
> +             snprintf(wbuf, wsize, KERN_WARNING
> +                      "irq %u: Affinity broken due to vector space 
> exhaustion.\n",
> +                      irqd->irq);

This is not really any more tasteful than the previous one and it does
not fix the fundamental underlying problem.

But, because I'm curious and printk is a constant source of trouble, I
just added unconditional pr_warns into those functions under vector_lock
on 5.12-rc5.

Still waiting for the lockdep splat to show up while enjoying the
trickle of printks over serial.

If you really think this is an upstream problem then please provide a
corresponding lockdep splat on plain 5.12-rc5 along with a .config and
the scenario which triggers this. Not less, not more.

Thanks,

        tglx

Re: [PATCH v2] x86/apic/vector: Move pr_warn() out of vector_lock

Reply via email to