From: Shradha Gupta <shradhagu...@linux.microsoft.com> Sent: Thursday, May 1, 
2025 7:24 AM
> 
> On Thu, May 01, 2025 at 05:27:49AM +0000, Michael Kelley wrote:
> > From: Shradha Gupta <shradhagu...@linux.microsoft.com> Sent: Friday, April 
> > 25,
> 2025 3:55 AM
> > >
> > > Currently, the MANA driver allocates MSI-X vectors statically based on
> > > MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
> > > up allocating more vectors than it needs. This is because, by this time
> > > we do not have a HW channel and do not know how many IRQs should be
> > > allocated.
> > >
> > > To avoid this, we allocate 1 MSI-X vector during the creation of HWC and
> > > after getting the value supported by hardware, dynamically add the
> > > remaining MSI-X vectors.
> >
> > I have a top-level thought about the data structures used to manage a
> > dynamic number of MSI-X vectors. The current code allocates a fixed size
> > array of struct gdma_irq_context, with one entry in the array for each
> > MSI-X vector. To find the entry for a particular msi_index, the code can
> > just index into the array, which is nice and simple.
> >
> > The new code uses a linked list of struct gdma_irq_context entries, with
> > one entry in the list for each MSI-X vector.  In the dynamic case, you can
> > start with one entry in the list, and then add to the list however many
> > additional entries the hardware will support.
> >
> > But this additional linked list adds significant complexity to the code
> > because it must be linearly searched to find the entry for a particular
> > msi_index, and there's the messiness of putting entries onto the list
> > and taking them off.  A spin lock is required.  Etc., etc.
> >
> > Here's an intermediate approach that would be simpler. Allocate a fixed
> > size array of pointers to struct gdma_irq_context. The fixed size is the
> > maximum number of possible MSI-X vectors for the device, which I
> > think is MANA_MAX_NUM_QUEUES, or 64 (correct me if I'm wrong
> > about that). Allocate a new struct gdma_irq_context when needed,
> > but store the address in the array rather than adding it onto a list.
> > Code can then directly index into the array to access the entry.
> >
> > Some entries in the array will be unused (and "wasted") if the device
> > uses fewer MSI-X vector, but each unused entry is only 8 bytes. The
> > max space unused is fewer than 512 bytes (assuming 64 entries in
> > the array), which is neglible in the grand scheme of things. With the
> > simpler code, and not having the additional list entry embedded in
> > each struct gmda_irq_context, you'll get some of that space back
> > anyway.
> >
> > Maybe there's a reason for the list that I missed in my initial
> > review of the code. But if not, it sure seems like the code could
> > be simpler, and having some unused 8 bytes entries in the array
> > is worth the tradeoff for the simplicity.
> >
> > Michael
> 
> Hey  Michael,
> 
> Thanks for your inputs. We did think of this approach and in fact that
> was how this patch was implemented(fixed size array) in the v1 of our
> internal reviews.
> 
> However, it came up in those reviews that we want to move away
> from the 64(MANA_MAX_NUM_QUEUES) as a hard limit for some new
> requirements, atleast for the dynamic IRQ allocation path. And now the
> new limit for all hardening purposes would be num_online_cpus().
> 
> Using this limit and the fixed array size approach creates problems,
> especially in machines with high number of vCPUs. It would lead to
> quite a bit of memory/resource wastage.
> 
> Hence, we decided to go ahead with this design.
> 
> Regards,
> Shradha.

One other thought:  Did you look at using an xarray? See
https://www.kernel.org/doc/html/latest/core-api/xarray.html.
It has most of or all the properties you need to deal with
a variable number of entries, while handling all the locking
automatically. Entries can be accessed with just a simple
index value.

I don't have first-hand experience writing code using xarrays,
so I can't be sure that it would simplify things for MANA IRQ
allocation, but it seems to be a very appropriate abstraction
for this use case.

Michael



Reply via email to