On 7/19/2024 10:57 AM, Bruce Richardson wrote:
> On Fri, Jul 19, 2024 at 09:59:50AM +0100, Ferruh Yigit wrote:
>> On 7/11/2024 1:35 PM, Bruce Richardson wrote:
>>> When allocating memory for an ethdev, the rte_malloc_socket call used
>>> only allocates memory on the NUMA node/socket local to the device. This
>>> means that even if the user wanted to, they could never use a remote NIC
>>> without also having memory on that NIC's socket.
>>>
>>> For example, if we change examples/skeleton/basicfwd.c to have
>>> SOCKET_ID_ANY as the socket_id parameter for Rx and Tx rings, we should
>>> be able to run the app cross-numa e.g. as below, where the two PCI
>>> devices are on socket 1, and core 1 is on socket 0:
>>>
>>>   ./build/examples/dpdk-skeleton -l 1 --legacy-mem --socket-mem=1024,0 \
>>>       -a a8:00.0 -a b8:00.0
>>>
>>> This fails, however, with the error:
>>>
>>>   ETHDEV: failed to allocate private data
>>>   PCI_BUS: Requested device 0000:a8:00.0 cannot be used
>>>
>>> We can remove this restriction by doing a fallback call to general
>>> rte_malloc after a call to rte_malloc_socket fails. This should be safe
>>> to do because the later ethdev calls to set up Rx/Tx queues all take a
>>> socket_id parameter, which can be used by applications to enforce the
>>> requirement for local-only memory for a device, if so desired. [If
>>> device-local memory is present it will be used as before, while if not
>>> present the rte_eth_dev_configure call will now pass, but the subsequent
>>> queue setup calls requesting local memory will fail.]
>>>
>>> Fixes: e489007a411c ("ethdev: add generic create/destroy ethdev APIs")
>>> Fixes: dcd5c8112bc3 ("ethdev: add PCI driver helpers")
>>> Cc: sta...@dpdk.org
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richard...@intel.com>
>>> Signed-off-by: Padraig Connolly <padraig.j.conno...@intel.com>
>>>
>>
>> Hi Bruce,
>>
>> If device-local memory is present, the behavior will be the same, so I
>> agree this is low impact.
>>
>> But for the case where device-local memory is NOT present, the question
>> is whether we should enforce the HW setup. This can be beneficial for
>> users new to DPDK.
>>
>
> No, we should not do so, because if we do, there is no way for the user
> to allow using remote memory - the probe/init and even the configure call
> have NO socket_id parameter, so the enforcement of local memory is an
> internal assumption on the part of the API which is not documented
> anywhere, and which the user cannot override.
>
>> Probably 'dev_private' on its own has a small impact on performance
>> (although that may depend on whether these fields are used in the
>> datapath), but it may be a vehicle to enforce local memory.
>>
>
> As I explain above in the cover letter, there are already other ways to
> enforce local memory - we don't need another one. If the user only wants
> to use local memory for a port, they can do so by setting the socket_id
> parameter of the rx and tx queues. Enforcing local memory in probe
> doesn't add anything to that, and just prevents other use-cases.
>
>> What is enabled by allowing the app to run with cross-numa memory, since
>> in production I expect users would want to use device-local memory for
>> performance reasons anyway?
>>
>
> Mostly users want socket-local memory, but with increasing NIC speeds we
> are already seeing cases where users want to run cross-NUMA. For example,
> a multi-port NIC may have some ports in use on each socket.
>
Ack.

>>
>> Also, I am not sure if this is a fix or a change of intentional behavior.
>>
>
> I suppose it can be viewed either way. However, for me this is a fix,
> because right now it is impossible for many users to run their
> applications with memory on a different socket to the ports. Nowhere is
> it documented in DPDK that there is a hard restriction that ports must
> have local memory, so any enforcement of such a policy is wrong.
>

Although it seems this is done intentionally in the API, I agree that it is
not documented, nor is this restriction part of the API definition.

> Turning things the other way around - I can't see how anything will break
> or even slow down with this patch applied. As I point out above, the user
> can already enforce local memory by passing the required socket id when
> allocating rx and tx rings - this patch only pushes the failure for
> non-local memory a bit later in the initialization sequence, to the point
> where the user can actually specify the desired NUMA behaviour. Is there
> some case I'm missing where you foresee this causing problems?
>

A new user may not know that allocating memory cross-numa impacts
performance negatively and may end up with this configuration
unintentionally; the current failure enforces the ideal configuration.

One option could be to add a warning log to the fallback case, saying that
memory was allocated from a non-local socket and performance will be lower.
Although this message may not mean much to a new user, it may still help
via a support engineer or an internet search...
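
For illustration, a minimal sketch of such a fallback-plus-warning
allocation could look like the below. The function name and the log type
used here are made up for the example; this is not the actual patch code.

#include <rte_common.h>
#include <rte_log.h>
#include <rte_malloc.h>
#include <rte_memory.h>

/* Illustrative only: try device-local memory first, then fall back to
 * any socket, emitting a warning so the user knows the layout is not
 * NUMA-local. */
static void *
alloc_dev_private(size_t priv_size, int socket_id)
{
	void *priv;

	priv = rte_zmalloc_socket("ethdev private structure", priv_size,
			RTE_CACHE_LINE_SIZE, socket_id);
	if (priv == NULL && socket_id != SOCKET_ID_ANY) {
		/* the warning suggested above; log type chosen arbitrarily */
		RTE_LOG(WARNING, EAL,
			"No memory on socket %d for private data, "
			"falling back to any socket; performance may be reduced\n",
			socket_id);
		priv = rte_zmalloc_socket("ethdev private structure",
				priv_size, RTE_CACHE_LINE_SIZE,
				SOCKET_ID_ANY);
	}
	return priv;
}

And as Bruce notes, an application that wants strictly device-local memory
can still enforce it by passing an explicit socket_id to
rte_eth_rx_queue_setup()/rte_eth_tx_queue_setup(), so queue setup will
fail if no local memory is available.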