Thomas Gleixner <t...@linutronix.de> writes: > On Tue, 5 Sep 2017, Huang, Ying wrote: > >> From: Huang Ying <ying.hu...@intel.com> >> >> When developing code to bootup some APs (Application CPUs) >> asynchronously, the following kernel panic is encountered. After >> checking the code, it is found that the irq_to_desc() may return NULL >> during CPU hotplug. So the NULL pointer checking is added to fix >> this. > > You forgot to describe why this can happen. "After checking the code" is > not really helpful for someone who looks at that commit. > > for_each_active_irq() is iterated with the sparse lock held in both cases > (cpu up and down). So if there is an active bit in the sparse map and the > the radix tree entry is empty then there is an inconsistency. The > inconsistency originates from the way the irq descriptor allocation/free is > implemented. The bitmap is set/cleared seperately from the actual pointer > store/remove in the radix tree: > > The allocation side: > > irq_sparse_lock(); > bitmap_set(); > irq_sparse_unlock(); > > desc = alloc(); > irq_sparse_lock(); > store_in_radix_tree(irq, desc); > irq_sparse_unlock(); > > The deallocation side: > > irq_sparse_lock(); > store_in_radix_tree(irq, NULL); > irq_sparse_unlock(); > > irq_sparse_lock(); > bitmap_clear(); > irq_sparse_unlock(); > > So the real question is, whether we keep it that way and have the extra > checks all over the place or simply extend the protected sections in the > alloc/free path. > > Untested patch below.
The below patch fixed the issue on my test system, Tested-by: "Huang, Ying" <ying.hu...@intel.com> Best Regards, Huang, Ying > Thanks, > > tglx > 8<---------------- > > kernel/irq/irqdesc.c | 17 ++++------------- > 1 file changed, 4 insertions(+), 13 deletions(-) > > Index: b/kernel/irq/irqdesc.c > =================================================================== > --- a/kernel/irq/irqdesc.c > +++ b/kernel/irq/irqdesc.c > @@ -421,10 +421,8 @@ static void free_desc(unsigned int irq) > * The sysfs entry must be serialized against a concurrent > * irq_sysfs_init() as well. > */ > - mutex_lock(&sparse_irq_lock); > kobject_del(&desc->kobj); > delete_irq_desc(irq); > - mutex_unlock(&sparse_irq_lock); > > /* > * We free the descriptor, masks and stat fields via RCU. That > @@ -462,20 +460,14 @@ static int alloc_descs(unsigned int star > desc = alloc_desc(start + i, node, flags, mask, owner); > if (!desc) > goto err; > - mutex_lock(&sparse_irq_lock); > irq_insert_desc(start + i, desc); > irq_sysfs_add(start + i, desc); > - mutex_unlock(&sparse_irq_lock); > } > return start; > > err: > for (i--; i >= 0; i--) > free_desc(start + i); > - > - mutex_lock(&sparse_irq_lock); > - bitmap_clear(allocated_irqs, start, cnt); > - mutex_unlock(&sparse_irq_lock); > return -ENOMEM; > } > > @@ -670,10 +662,10 @@ void irq_free_descs(unsigned int from, u > if (from >= nr_irqs || (from + cnt) > nr_irqs) > return; > > + mutex_lock(&sparse_irq_lock); > for (i = 0; i < cnt; i++) > free_desc(from + i); > > - mutex_lock(&sparse_irq_lock); > bitmap_clear(allocated_irqs, from, cnt); > mutex_unlock(&sparse_irq_lock); > } > @@ -727,10 +719,9 @@ int __ref > if (ret) > goto err; > } > - > - bitmap_set(allocated_irqs, start, cnt); > - mutex_unlock(&sparse_irq_lock); > - return alloc_descs(start, cnt, node, affinity, owner); > + ret = alloc_descs(start, cnt, node, affinity, owner); > + if (ret >= 0) > + bitmap_set(allocated_irqs, start, cnt); > > err: > mutex_unlock(&sparse_irq_lock);