Hi Dmitry, David,

Please find responses inline.

> -----Original Message-----
> From: David Marchand <david.march...@redhat.com>
> Sent: Thursday, October 21, 2021 7:03 PM
> To: Dmitry Kozlyuk <dmitry.kozl...@gmail.com>; Harman Kalra
> <hka...@marvell.com>
> Cc: Stephen Hemminger <step...@networkplumber.org>; Thomas
> Monjalon <tho...@monjalon.net>; dev@dpdk.org; Ray Kinsella
> <m...@ashroe.eu>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 2/7] eal/interrupts: implement
> get set APIs
>
> On Thu, Oct 21, 2021 at 2:33 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com>
> wrote:
> > > Hi All,
> > >
> > > I came across 2 issues introduced with auto detection mechanism.
> > > 1. In case of primary secondary model. Primary application is
> > > started which makes lots of allocations via rte_malloc*
> > >
> > > Secondary side:
> > > a. Secondary starts, in its "rte_eal_init()" it makes some
> > > allocation via rte_*, and in one of the allocation request for heap
> > > expand is made as current memseg got exhausted.
> > > (malloc_heap_alloc_on_heap_id()->alloc_more_mem_on_socket()->try_expand_heap())
> > > b. A request to primary for heap expand is sent. Please note
> > > secondary holds the spinlock while making the request.
> > > (malloc_heap_alloc_on_heap_id()->rte_spinlock_lock(&(heap->lock));)
> > >
> > > Primary side:
> > > a. Primary receives the request, install a new hugepage and setups up
> > > the heap (handle_alloc_request())
> > > b. To inform all the secondaries about the new memseg, primary
> > > sends a sync notice where it sets up an alarm
> > > (rte_mp_request_async()->mp_request_async()).
> > > c. Inside alarm setup API, we register an interrupt callback.
> > > d. Inside rte_intr_callback_register(), a new interrupt instance
> > > allocation is requested for "src->intr_handle"
> > > e. 
Since memory management is detected as up, inside
> > > "rte_intr_instance_alloc()", call to "rte_zmalloc" for allocating
> > > memory and further inside "malloc_heap_alloc_on_heap_id()", primary
> > > will experience a deadlock while taking up the spinlock because this
> > > spinlock is already hold by secondary.
> > >
> > >
> > > 2. "eal_flags_file_prefix_autotest" is failing because the spawned
> > > process by this tests are expected to cleanup their hugepage traces
> > > from respective directories (eg /dev/hugepage).
> > > a. Inside eal_cleanup, rte_free()->malloc_heap_free(), where element
> > > to be freed is added to the free list and checked if nearby elements
> > > can be joined together and form a big free chunk (malloc_elem_free()).
> > > b. If this free chunk is big enough than the hugepage size,
> > > respective hugepage can be uninstalled after making sure no
> > > allocation from this hugepage exists.
> > > (malloc_heap_free()->malloc_heap_free_pages()->eal_memalloc_free_seg())
> > >
> > > But because of interrupt allocations made for pci intr handles (used
> > > for VFIO) and other driver specific interrupt handles are not cleaned
> > > up in "rte_eal_cleanup()", these hugepage files are not removed and
> > > test fails.
> >
> > Sad to hear. But it's a great and thorough analysis.

Sad, but a good learning: at least we have identified the areas that
need work.

> > > There could be more such issues, I think we should firstly fix the DPDK.
> > > 1. Memory management should be made independent and should be the
> > > first thing to come up in rte_eal_init()
> >
> > As I have explained, buses must be able to report IOVA requirement at
> > this point (`get_iommu_class()` bus method).
> > Either `scan()` must complete before that or `get_iommu_class()` must
> > be able to work before `scan()` is called.
> >
> > > 2. 
rte_eal_cleanup() should be exactly opposite to rte_eal_init(),
> > > just like bus_probe, we should have bus_remove to clean up all the
> > > memory allocations.
> >
> > Yes. For most buses it will be just "unplug each device".
> > In fact, EAL could do it with `unplug()`, but it is not mandatory.

I implemented a rough bus_remove, similar to unplug, but faced some
issues. I am not sure, but some drivers might not support hotplug, and
for them unplug might be a challenge.

> > >
> > > Regarding this IRQ series, I would like to fall back to our original
> > > design i.e. rte_intr_instance_alloc() should take an argument whether
> > > its memory should be allocated using glibc malloc or rte_malloc*.
> >
> > Seems there's no other option to make it on time.
>
> - Sorry, my memory is too short, did we describe where we need to share
> rte_intr_handle objects?

Intr handle objects are shared in very few drivers.

> I spent some time looking at uses of rte_intr_handle objects.
>
> In many cases intr_handle objects are referenced in malloc() objects.
> The cases where rte_intr_handle are shared is in per device private bits in
> drivers.

Yes, in the V2 design I allocated memory using glibc malloc for such
instances by passing the respective flag.

> A intr_handle often contains fds.
> For them to be used in mp setups, there needs to be a big machinery with
> SCM_RIGHTS but I see only 3 drivers which actually reference this.
> So if intr_handle fds are accessed by multiple processes, their content
> probably makes no sense wrt fds.

Those drivers will allocate using the SHARED flag.

> From these two hints, I think we are going backwards, and the main usecase
> is that those rte_intr_instance objects are not used in mp.
> I even think they are never accessed from other processes.
> But I am not sure.
>
>
> - Seeing how time it short for rc1, I am ok with
> rte_intr_instance_alloc() taking a flag argument.
> And we can still go back on this API later.

Sure, I will revert to the original design and send V5 by tomorrow.

> Can we agree on the flag name?
> rte_malloc() interest is that it makes objects shared for mp, so how about
> RTE_INTR_INSTANCE_F_SHARED ?

Yeah, it sounds good:
RTE_INTR_INSTANCE_F_SHARED  - rte_malloc
RTE_INTR_INSTANCE_F_PRIVATE - malloc

Thanks David, Dmitry, Thomas, Stephen for reviewing the series
thoroughly and providing inputs to improve it.

Thanks
Harman

>
>
> --
> David Marchand
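
P.S. To make the flag proposal concrete, here is a minimal sketch of
how the allocator could be selected from the flag. It is only an
illustration and not the code that will be in V5: the two flag names
are the ones agreed above, but their numeric values, the struct
layout, and every *_sketch name are assumptions, and shared_zalloc() /
shared_free() are stand-ins for rte_zmalloc() / rte_free() so that the
snippet builds without DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Flag names come from this thread; the numeric values are assumptions. */
#define RTE_INTR_INSTANCE_F_PRIVATE UINT32_C(0)
#define RTE_INTR_INSTANCE_F_SHARED  UINT32_C(1)

/* Hypothetical, heavily simplified stand-in for the interrupt instance. */
struct intr_instance_sketch {
	int fd;               /* -1 until an fd is attached */
	uint32_t alloc_flags; /* remembered so free uses the matching allocator */
};

/* Stand-ins for rte_zmalloc()/rte_free(); plain libc here (assumption). */
static void *shared_zalloc(size_t size) { return calloc(1, size); }
static void shared_free(void *ptr) { free(ptr); }

/* Allocate an instance from the heap selected by the flag. */
struct intr_instance_sketch *
intr_instance_alloc_sketch(uint32_t flags)
{
	struct intr_instance_sketch *inst;

	/* Reject any bit other than the SHARED bit. */
	if (flags & ~RTE_INTR_INSTANCE_F_SHARED)
		return NULL;

	if (flags & RTE_INTR_INSTANCE_F_SHARED)
		inst = shared_zalloc(sizeof(*inst)); /* shared (hugepage) heap */
	else
		inst = calloc(1, sizeof(*inst));     /* process-private heap */
	if (inst == NULL)
		return NULL;

	inst->fd = -1;
	inst->alloc_flags = flags;
	return inst;
}

/* Release an instance with the allocator it was created from. */
void
intr_instance_free_sketch(struct intr_instance_sketch *inst)
{
	if (inst == NULL)
		return;

	/* Sanity check: only the SHARED bit may ever be set. */
	assert((inst->alloc_flags & ~RTE_INTR_INSTANCE_F_SHARED) == 0);

	if (inst->alloc_flags & RTE_INTR_INSTANCE_F_SHARED)
		shared_free(inst);
	else
		free(inst);
}
```

Most callers would pass RTE_INTR_INSTANCE_F_PRIVATE, so nothing is
taken from the hugepage heap during init or from the alarm path. Only
the few drivers that really share the handle across processes would
pass RTE_INTR_INSTANCE_F_SHARED. Storing the flags in the instance
lets the free path pick the matching allocator without the caller
having to repeat the flag.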