On Wed, 2015-06-17 at 13:52 +0200, Joerg Roedel wrote:
> On Wed, Jun 17, 2015 at 10:42:49AM +0000, 范冬冬 wrote:
> > Hi maintainer,
> >
> > We found a problem where a panic happens when a CPU is hot-removed. We
> > also traced the problem using the call trace information.
> > An endless loop occurs because the value of head never becomes equal
> > to the value of tail in the function qi_check_fault().
> > The code in question is as follows:
> >
> >
> >     do {
> >             if (qi->desc_status[head] == QI_IN_USE)
> >                     qi->desc_status[head] = QI_ABORT;
> >             head = (head - 2 + QI_LENGTH) % QI_LENGTH;
> >     } while (head != tail);
>
> Hmm, this code iterates only over every second QI descriptor, and tail
> probably points to a descriptor that is not iterated over.
>
> Jiang, can you please have a look?

I think that part is normal. The way we use the queue is to always submit
a work operation followed by a wait operation, so that we can determine
when the work operation is complete. That's done via qi_submit_sync(). We
have had spurious reports of the queue getting impossibly out of sync,
though. I saw one that was somehow linked to the I/O AT DMA engine. Roland
Dreier saw something similar[1]. I'm not sure if they're related to this,
but it may be worth comparing. Thanks,

Alex

[1] http://lists.linuxfoundation.org/pipermail/iommu/2015-January/011502.html