On 2019/2/1 17:28, Marc Zyngier wrote:
> On 01/02/2019 06:41, Zheng Xiang wrote:
>>
>> On 2019/1/31 23:12, Marc Zyngier wrote:
>>> Hi Zheng,
>>>
>>> On 31/01/2019 14:47, Zheng Xiang wrote:
>>>> Hi Marc,
>>>>
>>>> On 2019/1/29 13:42, Zheng Xiang wrote:
>>>>> On 2019/1/28 21:51, Marc Zyngier wrote:
>>>>>> On 28/01/2019 07:13, Zheng Xiang wrote:
>>>>>>> Hi Marc,
>>>>>>>
>>>>>>> Thanks for your review.
>>>>>>>
>>>>>>> On 2019/1/26 19:38, Marc Zyngier wrote:
>>>>>>>> Hi Zheng,
>>>>>>>>
>>>>>>>> On Sat, 26 Jan 2019 06:16:24 +0000,
>>>>>>>> Zheng Xiang <zhengxia...@huawei.com> wrote:
>>>>>>>>>
>>>>>>>>> Currently each PCI device under a PCI Bridge shares the same
>>>>>>>>> device id and ITS device. Assume two PCI devices call
>>>>>>>>> its_msi_prepare concurrently and both go to find and create
>>>>>>>>> their ITS device. There is a chance that the later one cannot
>>>>>>>>> find the ITS device before the other one has created it, which
>>>>>>>>> causes the later one to create a different ITS device even
>>>>>>>>> though they have the same device_id.
>>>>>>>>
>>>>>>>> Interesting finding. Is this something you've actually seen in
>>>>>>>> practice with two devices being probed in parallel? Or something
>>>>>>>> that you found by inspection?
>>>>>>>
>>>>>>> Yes, I found this problem while analyzing why a VM hung. In the
>>>>>>> end I found that the virtio-gpu cannot receive MSI interrupts
>>>>>>> because it shares the same event_id as virtio-serial.
>>>>>>>
>>>>>>> See https://lkml.org/lkml/2019/1/10/299 for the bug report.
>>>>>>>
>>>>>>> This problem can be reproduced with high probability by booting a
>>>>>>> QEMU/KVM VM with a virtio-serial controller and a virtio-gpu
>>>>>>> added to a PCI Bridge, and also adding some delay before creating
>>>>>>> the ITS device.
>>>>>>
>>>>>> Fair enough. Do you mind sharing your QEMU command line? It'd be
>>>>>> useful if I could reproduce it here (and would give me a way to
>>>>>> check that it doesn't regress).
>>>>>
>>>>
>>>> Have you reproduced it with my QEMU command line?
>>>>
>>>> If so, should I send a V2 patch with your suggestion?
>>>
>>> I've queued the following, much more complete patch:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/commit/?h=irq/irqchip-next&id=9791ec7df0e7b4d80706ccea8f24b6542f6059e9
>>>
>>> Can you check that it works for you? I didn't manage to get the right
>>> timing conditions, but I also had issues getting virtio-gpu running
>>> on my TX2, so one might explain the other.
>>>
>>
>> It works for my case, but I am worried that the lines below may cause
>> a memory leak.
>>
>> @@ -2627,8 +2640,14 @@ static void its_irq_domain_free(struct irq_domain *domain, unsigned int virq,
>>  			irq_domain_reset_irq_data(data);
>>  	}
>>
>> -	/* If all interrupts have been freed, start mopping the floor */
>> -	if (bitmap_empty(its_dev->event_map.lpi_map,
>> +	mutex_lock(&its->dev_alloc_lock);
>> +
>> +	/*
>> +	 * If all interrupts have been freed, start mopping the
>> +	 * floor. This is conditionned on the device not being shared.
>> +	 */
>> +	if (!its_dev->shared &&
>> +	    bitmap_empty(its_dev->event_map.lpi_map,
>>  			 its_dev->event_map.nr_lpis)) {
>>  		its_lpi_free(its_dev->event_map.lpi_map,
>>  			     its_dev->event_map.lpi_base,
>>
>> It seems that the shared its_dev would never be freed since the value
>> of its_dev->shared is always *true*.
>
> Yes, and that is on purpose.
> As we don't refcount the number of interrupts that have been requested
> in the prepare phase, there is a race between free and alloc. We can
> have the following situation:
>
> CPU0:				CPU1:
>
> msi_prepare:
>    mutex_lock()
>    find device()
>      -> found
>    store its_dev
>    mutex_unlock()
>
> 				its_irq_domain_free:
> 				   mutex_lock()
> 				   free_device()
> 				   mutex_unlock()
>
> its_irq_domain_alloc:
>    use its_dev -> boom.
>
> So the trick is not to free the its_dev structure if it shares a devid.
> It is not really a leak, as the next device sharing the same devid will
> pick up the same structure.
>
> Does it make sense?
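If I read the patch correctly, that is exactly what the dev_alloc_lock
and the shared flag give us. To convince myself, I reduced the scheme to
the small userspace model below; the dev_alloc_lock / shared /
msi_prepare names are only borrowed from your patch, so this is a sketch
of the locking and lifetime idea, not the driver code itself:

/*
 * Reduced userspace model of the scheme, not the driver code: the
 * dev_alloc_lock / shared names are borrowed from the patch above.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

struct its_dev_model {
	struct its_dev_model	*next;
	unsigned int		devid;
	bool			shared;
};

static pthread_mutex_t dev_alloc_lock = PTHREAD_MUTEX_INITIALIZER;
static struct its_dev_model *dev_list;

static struct its_dev_model *find_dev(unsigned int devid)
{
	struct its_dev_model *d;

	for (d = dev_list; d; d = d->next)
		if (d->devid == devid)
			return d;
	return NULL;
}

/* Models its_msi_prepare(): find-or-create is serialised by the mutex. */
static struct its_dev_model *msi_prepare(unsigned int devid)
{
	struct its_dev_model *d;

	pthread_mutex_lock(&dev_alloc_lock);
	d = find_dev(devid);
	if (d) {
		/* A later device behind the bridge: reuse and mark shared. */
		d->shared = true;
	} else {
		d = calloc(1, sizeof(*d));
		if (d) {
			d->devid = devid;
			d->next = dev_list;
			dev_list = d;
		}
	}
	pthread_mutex_unlock(&dev_alloc_lock);
	return d;
}

/*
 * Models its_irq_domain_free(): the real code also requires the event
 * bitmap to be empty; a shared device is deliberately left in place.
 */
static void domain_free(struct its_dev_model *d)
{
	struct its_dev_model **p;

	pthread_mutex_lock(&dev_alloc_lock);
	if (!d->shared) {
		for (p = &dev_list; *p; p = &(*p)->next) {
			if (*p == d) {
				*p = d->next;
				free(d);
				break;
			}
		}
	}
	pthread_mutex_unlock(&dev_alloc_lock);
}

int main(void)
{
	/* Two devices behind the same PCI bridge end up with one devid. */
	struct its_dev_model *a = msi_prepare(0x10);
	struct its_dev_model *b = msi_prepare(0x10);

	printf("same its_dev: %s, shared: %s\n",
	       a == b ? "yes" : "no", b->shared ? "yes" : "no");

	domain_free(b);
	domain_free(a);
	printf("still present after free: %s\n",
	       find_dev(0x10) ? "yes" : "no");
	return 0;
}

So the price of not refcounting is only that a shared its_dev outlives
its last user, and as you say that is not a real leak, since the next
device with the same devid picks the same structure up again.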
Yes, thanks a lot!

-- 
Thanks,
Xiang