On 2019/1/31 23:12, Marc Zyngier wrote: > Hi Zeng, > > On 31/01/2019 14:47, Zheng Xiang wrote: >> Hi Marc, >> >> On 2019/1/29 13:42, Zheng Xiang wrote: >>> On 2019/1/28 21:51, Marc Zyngier wrote: >>>> On 28/01/2019 07:13, Zheng Xiang wrote: >>>>> Hi Marc, >>>>> >>>>> Thanks for your review. >>>>> >>>>> On 2019/1/26 19:38, Marc Zyngier wrote: >>>>>> Hi Zheng, >>>>>> >>>>>> On Sat, 26 Jan 2019 06:16:24 +0000, >>>>>> Zheng Xiang <zhengxia...@huawei.com> wrote: >>>>>>> >>>>>>> Currently each PCI device under a PCI Bridge shares the same device id >>>>>>> and ITS device. Assume there are two PCI devices call its_msi_prepare >>>>>>> concurrently and they are both going to find and create their ITS >>>>>>> device. There is a chance that the later one couldn't find ITS device >>>>>>> before the other one creating the ITS device. It will cause the later >>>>>>> one to create a different ITS device even if they have the same >>>>>>> device_id. >>>>>> >>>>>> Interesting finding. Is this something you've actually seen in practice >>>>>> with two devices being probed in parallel? Or something that you found >>>>>> by inspection? >>>>> >>>>> Yes, I find this problem after analyzing the reason of VM hung. At last, I >>>>> find that the virtio-gpu cannot receive the MSI interrupts due to sharing >>>>> a same event_id as virtio-serial. >>>>> >>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report. >>>>> >>>>> This problem can be reproducted with high probability by booting a >>>>> Qemu/KVM >>>>> VM with a virtio-serial controller and a virtio-gpu adding to a PCI Bridge >>>>> and also adding some delay before creating ITS device. >>>> >>>> Fair enough. Do you mind sharing your QEMU command line? It'd be useful >>>> if I could reproduce it here (and would give me a way to check that it >>>> doesn't regress). >>> >> >> Have you reproduced it with my QEMU command line? >> >> If so, should I send a V2 patch with your suggestion? > > I've queued the following, much more complete patch: > > https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9 > > Can you check that it works for you? I didn't manage to get the right > timing conditions, but I also had issues getting virtio-gpu running on > my TX2, so one might explain the other. >
It works for my case, but I worried about the below lines which may cause memory leak. @@ -2627,8 +2640,14 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq, irq_domain_reset_irq_data(data); } - /* If all interrupts have been freed, start mopping the floor */ - if (bitmap_empty(its_dev->event_map.lpi_map, + mutex_lock(&its->dev_alloc_lock); + + /* + * If all interrupts have been freed, start mopping the + * floor. This is conditionned on the device not being shared. + */ + if (!its_dev->shared && + bitmap_empty(its_dev->event_map.lpi_map, its_dev->event_map.nr_lpis)) { its_lpi_free(its_dev->event_map.lpi_map, its_dev->event_map.lpi_base, It seems that the shared its_dev would never be freed since the value of its_dev->shared is always *true*. > Thanks, > > M. > -- Thanks, Xiang