Hi Dmitry, David

Please find responses inline.

> -----Original Message-----
> From: David Marchand <david.march...@redhat.com>
> Sent: Thursday, October 21, 2021 7:03 PM
> To: Dmitry Kozlyuk <dmitry.kozl...@gmail.com>; Harman Kalra
> <hka...@marvell.com>
> Cc: Stephen Hemminger <step...@networkplumber.org>; Thomas
> Monjalon <tho...@monjalon.net>; dev@dpdk.org; Ray Kinsella
> <m...@ashroe.eu>
> Subject: Re: [EXT] Re: [dpdk-dev] [PATCH v3 2/7] eal/interrupts: implement
> get set APIs
> 
> On Thu, Oct 21, 2021 at 2:33 PM Dmitry Kozlyuk <dmitry.kozl...@gmail.com>
> wrote:
> > > Hi All,
> > >
> > > I came across 2 issues introduced with auto detection mechanism.
> > > 1. In case of primary secondary model.  Primary application is
> > > started which makes lots of allocations via
> > > rte_malloc*
> > >
> > >     Secondary side:
> > >     a. Secondary starts, in its "rte_eal_init()" it makes some
> > > allocation via rte_*, and in one of the allocation request for heap
> > > expand is made as current memseg got exhausted.
> > > (malloc_heap_alloc_on_heap_id()->alloc_more_mem_on_socket()->
> > > try_expand_heap())
> > >    b. A request to primary for heap expand is sent. Please note
> > > secondary holds the spinlock while making the request.
> > > (malloc_heap_alloc_on_heap_id ()->rte_spinlock_lock(&(heap->lock));)
> > >
> > >    Primary side:
> > >    a. Primary receives the request, install a new hugepage and setups
> > > up the heap (handle_alloc_request())
> > >    b. To inform all the secondaries about the new memseg, primary
> > > sends a sync notice where it sets up an alarm
> > > (rte_mp_request_async()->mp_request_async()).
> > >    c. Inside alarm setup API, we register an interrupt callback.
> > >    d. Inside rte_intr_callback_register(), a new interrupt instance
> > > allocation is requested for "src->intr_handle"
> > >    e. Since memory management is detected as up, inside
> > > "rte_intr_instance_alloc()", call to "rte_zmalloc" for allocating
> > > memory and further inside "malloc_heap_alloc_on_heap_id()", primary
> > > will experience a deadlock while taking up the spinlock because this
> > > spinlock is already hold by secondary.
> > >
> > >
> > > 2. "eal_flags_file_prefix_autotest" is failing because the spawned
> > > process by this tests are expected to cleanup their hugepage traces
> > > from respective directories (eg /dev/hugepage).
> > > a. Inside eal_cleanup, rte_free()->malloc_heap_free(), where element
> > > to be freed is added to the free list and checked if nearby elements
> > > can be joined together and form a big free chunk (malloc_elem_free()).
> > > b. If this free chunk is big enough than the hugepage size,
> > > respective hugepage can be uninstalled after making sure no
> > > allocation from this hugepage exists.
> > > (malloc_heap_free()->malloc_heap_free_pages()->
> > > eal_memalloc_free_seg())
> > >
> > > But because of interrupt allocations made for pci intr handles (used
> > > for VFIO) and other driver specific interrupt handles are not cleaned
> > > up in "rte_eal_cleanup()", these hugepage files are not removed and
> > > test fails.
> >
> > Sad to hear. But it's a great and thorough analysis.

Sad, but a good learning experience; at least we identified areas that need work.

> >
> > > There could be more such issues, I think we should firstly fix the DPDK.
> > > 1. Memory management should be made independent and should be the
> > > first thing to come up in rte_eal_init()
> >
> > As I have explained, buses must be able to report IOVA requirement at
> > this point (`get_iommu_class()` bus method).
> > Either `scan()` must complete before that or `get_iommu_class()` must
> > be able to work before `scan()` is called.
> >
> > > > 2. rte_eal_cleanup() should be exactly opposite to rte_eal_init(),
> > > > just like bus_probe, we should have bus_remove to clean up all the
> > > > memory allocations.
> >
> > Yes. For most buses it will be just "unplug each device".
> > In fact, EAL could do it with `unplug()`, but it is not mandatory.

I implemented a rough bus_remove, similar to unplug, but ran into some
issues. I am not sure, but some drivers might not support hotplug, and for
them unplug might be a challenge.


> >
> > >
> > > > Regarding this IRQ series, I would like to fall back to our original
> > > > design i.e. rte_intr_instance_alloc() should take an argument whether
> > > > its memory should be allocated using glibc malloc or rte_malloc*.
> >
> > Seems there's no other option to make it on time.
> 
> - Sorry, my memory is too short, did we describe where we need to share
> rte_intr_handle objects?

Intr handle objects are shared in very few drivers.

> 
> I spent some time looking at uses of rte_intr_handle objects.
> 
> In many cases intr_handle objects are referenced in malloc() objects.
> The cases where rte_intr_handle are shared is in per device private bits in
> drivers.
> 

Yes, in the V2 design I allocated memory for such instances using glibc
malloc, by passing the respective flag.

> A intr_handle often contains fds.
> For them to be used in mp setups, there needs to be a big machinery with
> SCM_RIGHTS but I see only 3 drivers which actually reference this.
> So if intr_handle fds are accessed by multiple processes, their content
> probably makes no sense wrt fds.

Those drivers will allocate using the SHARED flag.

> 
> 
> From these two hints, I think we are going backwards, and the main usecase
> is that those rte_intr_instance objects are not used in mp.
> I even think they are never accessed from other processes.
> But I am not sure.
> 
> 
> - Seeing how time is short for rc1, I am ok with
> rte_intr_instance_alloc() taking a flag argument.
> And we can still go back on this API later.


Sure, I will revert to the original design and send V5 by tomorrow.

> 
> Can we agree on the flag name?
> rte_malloc() interest is that it makes objects shared for mp, so how about
> RTE_INTR_INSTANCE_F_SHARED ?

Yeah, it sounds good:
RTE_INTR_INSTANCE_F_SHARED  - rte_malloc
RTE_INTR_INSTANCE_F_PRIVATE - malloc
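
A rough sketch of how the flag could select the allocator, just to
illustrate the idea. The flag values, the struct layout and the helper
names are assumptions for illustration and not the final API;
shared_zmalloc()/shared_free() are libc stand-ins for rte_zmalloc()/
rte_free() so that the sketch builds without DPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed flag values, for illustration only. */
#define RTE_INTR_INSTANCE_F_PRIVATE UINT32_C(0)
#define RTE_INTR_INSTANCE_F_SHARED  (UINT32_C(1) << 0)

/* Hypothetical stand-in for the interrupt instance. */
struct intr_instance {
	int fd;
	uint32_t alloc_flags; /* remembered so free uses the same allocator */
};

/* Stand-ins for rte_zmalloc()/rte_free() (assumption: plain libc here). */
static void *shared_zmalloc(size_t size)
{
	return calloc(1, size);
}

static void shared_free(void *ptr)
{
	free(ptr);
}

static struct intr_instance *intr_instance_alloc(uint32_t flags)
{
	struct intr_instance *h;

	/* Reject unknown flag bits. */
	if (flags & ~RTE_INTR_INSTANCE_F_SHARED)
		return NULL;

	if (flags & RTE_INTR_INSTANCE_F_SHARED)
		h = shared_zmalloc(sizeof(*h)); /* shared: hugepage heap */
	else
		h = calloc(1, sizeof(*h));      /* private: glibc heap */
	if (h == NULL)
		return NULL;

	h->fd = -1;
	h->alloc_flags = flags;
	return h;
}

static void intr_instance_free(struct intr_instance *h)
{
	if (h == NULL)
		return;
	assert((h->alloc_flags & ~RTE_INTR_INSTANCE_F_SHARED) == 0);
	if (h->alloc_flags & RTE_INTR_INSTANCE_F_SHARED)
		shared_free(h);
	else
		free(h);
}
```

Callers on the alarm/interrupt path would pass
RTE_INTR_INSTANCE_F_PRIVATE, so they never touch the hugepage heap lock;
only the few drivers that share handles across processes would pass
RTE_INTR_INSTANCE_F_SHARED.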


Thanks David, Dmitry, Thomas and Stephen for reviewing the series thoroughly
and providing inputs to improve it.


Thanks
Harman

> 
> 
> --
> David Marchand
