On 7/19/2024 10:57 AM, Bruce Richardson wrote:
> On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
>> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
>>> When allocating memory for an ethdev, the rte_malloc_socket call used
>>> only allocates memory on the NUMA node/socket local to the device. This
>>> means that even if the user wanted to, they could never use a remote NIC
>>> without also having memory on that NIC's socket.
>>>
>>> For example, if we change examples/skeleton/basicfwd.c to have
>>> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
>>> be able to run the app cross-numa e.g. as below, where the two PCI
>>> devices are on socket 1, and core 1 is on socket 0:
>>>
>>>  ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
>>>             -a a8:00.0 -a b8:00.0
>>>
>>> This fails however, with the error:
>>>
>>>   ETHDEV: failed to allocate private data
>>>   PCI_BUS: Requested device 0000:a8:00.0 cannot be used
>>>
>>> We can remove this restriction by doing a fallback call to general
>>> rte_malloc after a call to rte_malloc_socket fails. This should be safe
>>> to do because the later ethdev calls to setup Rx/Tx queues all take a
>>> socket_id parameter, which can be used by applications to enforce the
>>> requirement for local-only memory for a device, if so desired. [If
>>> device-local memory is present it will be used as before, while if not
>>> present the rte_eth_dev_configure call will now pass, but the subsequent
>>> queue setup calls requesting local memory will fail].
>>>
>>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
>>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richard...@intel.com>
>>> Signed-off-by: Padraig Connolly <padraig.j.conno...@intel.com>
>>>
>>
>> Hi Bruce,
>>
>> If device-local memory is present, behavior will be same, so I agree
>> this is low impact.
>>
>> But for the case device-local memory is NOT present, should we enforce
>> the HW setup is the question. This can be beneficial for users new to DPDK.
>>
> 
> No we should not do so, because if we do, there is no way for the user to
> allow using remote memory - the probe/init and even configure call has NO
> socket_id parameter in it, so the enforcement of local memory is an
> internal assumption on the part of the API which is not documented
> anywhere, and is not possible for the user to override.
> 
>> Probably 'dev_private' on its own has small impact on the performance
>> (although it may depend if these fields used in datapath), but it may be
>> vehicle to enforce local memory.
>>
> 
> As I explain above in the cover letter, there are already other ways to
> enforce local memory - we don't need another one. If the user only wants to
> use local memory for a port, they can do so by setting the socket_id
> parameter of the rx and tx queues. Enforcing local memory in probe
> doesn't add anything to that, and just prevents other use-cases.
> 
>> What is enabled by enabling app to run on cross-numa memory, since on a
>> production I expect users would like to use device-local memory for
>> performance reasons anyway?
>>
> 
> Mostly users want socket-local memory, but with increasing speeds of NICs
> we are already seeing cases where users want to run cross-NUMA. For
> example, a multi-port NIC may have some ports in use on each socket.
> 

Ack.

>>
>> Also I am not sure if this is a fix, or change of a intentional behavior.
>>
> 
> I suppose it can be viewed either way. However, for me this is a fix,
> because right now it is impossible for many users to run their applications
> with memory on a different socket to the ports. Nowhere is it documented in
> DPDK that there is a hard restriction that ports must have local memory, so
> any enforcement of such a policy is wrong.
> 

Although it seems this is done intentionally in the API, I agree that
this is not documented in the API, or this restriction is not part of
the API definition.

> Turning things the other way around - I can't see how anything will break
> or even slow down with this patch applied. As I point out above, the user
> can already enforce local memory by passing the required socket id when
> allocating rx and tx rings - this patch only pushed the failure for
> non-local memory a bit later in the initialization sequence, where the user
> can actually specify the desired NUMA behaviour. Is there some
> case I'm missing where you forsee this causing problems?
> 

A new user may not know that allocating memory from cross-numa impacts
performance negatively and have this configuration unintentionally,
current failure enforces the ideal configuration.

One option can be adding a warning log to the fallback case, saying that
memory allocated from non-local socket and performance will be less.
Although this message may not mean much to a new user, it may still help
via a support engineer or internet search...

Reply via email to