> From: Bruce Richardson [mailto:bruce.richard...@intel.com] > Sent: Tuesday, 17 January 2023 14.59 > > On Tue, Jan 17, 2023 at 02:36:21PM +0100, Morten Brørup wrote: > > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > > Sent: Tuesday, 17 January 2023 14.04 > > > > > > On Tue, Jan 17, 2023 at 12:32:14PM +0100, Morten Brørup wrote: > > > > > From: Didier Pallard [mailto:didier.pall...@6wind.com] > > > > > Sent: Tuesday, 17 January 2023 11.17 > > > > > > > > > > Since DPDK 22.11 and below commit: > > > > > > > > > https://git.dpdk.org/dpdk/commit/?id=7dcd73e37965ba0bfa430efeac362fe183 > > > > > ed0ae2 > > > > > rte_cryptodev_socket_id() could return an incorrect value of > 255. > > > > > Problem has been seen during configuration of the qat device > > > > > on an Atom C3000 architecture. On this arch, PCI is not > depending > > > on > > > > > any numa socket, causing device numa_node to be equal to > > > SOCKET_ID_ANY. > > > > > > > > Disclaimer: I'm not up to speed with this topic or patch, so feel > > > free to ignore my comments here! I'm only speaking up because I > fear we > > > are increasing the risk of bugs here. But again, please bear with > me, > > > if I have totally misunderstood this! > > > > > > > > I was under the impression that single-socket systems used > socket_id > > > 0 as default. How can the PCI bus (or QAT device) not depend on any > > > socket? It must be connected somewhere. > > > > > > > > Doesn't assigning socket_id = -1 for devices (QAT or anything > else) > > > introduce a big risk of bugs, e.g. in comparisons? The special > > > socket_id value -1 should have only two meanings: 1) return value > > > "error", or 2) input value "any". Now it also can mean 3) > "unknown"? > > > How do comparison functions work for that... is "any" == "unknown"? > And > > > does searching for "0" match "unknown"? It might, or might not, but > > > searching for "any" does match "0". And how about searching for > > > "unknown", if such a value is propagate around in the system. > > > > > > > > And if we started considering socket_id == -1 valid with that > patch, > > > should the return type of rte_socket_id(void) be signed instead of > > > unsigned? > > > > > > > The issue here is that not all PCI endpoints connect directly to a > > > socket, > > > some connect to the chipset instead, and so do not have any numa > > > affinity. > > > That was the original meaning of the "-1" value, and it came about > from > > > an > > > era before we had on-die PCI endpoints. > > > > Thank you for elaborating, Bruce. Now I get it! > > > > Then it does make sense instantiating devices with socket_id = -1. > > > > A minor detail: SOCKET_ID_ANY is defined in > lib/eal/include/rte_memory.h [1]. Since it is being used for other > purposes than memory allocation, it could move to a more central > location. No good ideas from me, but perhaps memory and devices have an > appropriate header file in common. > > > > [1]: > https://elixir.bootlin.com/dpdk/latest/source/lib/eal/include/rte_memor > y.h#L38 > > > > And the multiple purposes of SOCKET_ID_ANY (e.g. what you just > described) could be mentioned as a comment with its definition. > > > > Agreed. > > > And I'm still worried about the risk of comparison bugs, e.g. when > requesting allocation of a device resource, you cannot specify a > preference for a device that is connected to the chipset, because the > SOCKET_ID_ANY in the allocation request would be interpreted as "any > socket" instead of "no socket". > > > > Although that might just be me worrying too much. ;-) > > > No, I think you may be on to something. I wonder if it would break a > lot of > things to define another magic constant for SOCKET_NONE (or > SOCKET_ID_NONE) > to cover the case where a device is not connected to a socket. That > woudl > allow SOCKET_ID_ANY to have a single meaning.
I hate polluting an API with workarounds like reusing SOCKET_ID_ANY for alternative purposes, so a roadmap for introducing SOCKET_ID_NONE as a pseudo numa node should be defined before the current workaround causes too much damage. E.g. here's a discussion [1] referring to an application using rte_eth_dev_socket_id() to improve performance. [1]: http://inbox.dpdk.org/dev/sj0pr11mb4783be2c718d03e3c447b7dd80...@sj0pr11mb4783.namprd11.prod.outlook.com/ PS: High level changes like this (support for devices connected to the chipset instead of a specific cpu socket) should go on the roadmap page.