On Tue, Jan 17, 2023 at 02:36:21PM +0100, Morten Brørup wrote: > > From: Bruce Richardson [mailto:bruce.richard...@intel.com] > > Sent: Tuesday, 17 January 2023 14.04 > > > > On Tue, Jan 17, 2023 at 12:32:14PM +0100, Morten Brørup wrote: > > > > From: Didier Pallard [mailto:didier.pall...@6wind.com] > > > > Sent: Tuesday, 17 January 2023 11.17 > > > > > > > > Since DPDK 22.11 and below commit: > > > > > > https://git.dpdk.org/dpdk/commit/?id=7dcd73e37965ba0bfa430efeac362fe183 > > > > ed0ae2 > > > > rte_cryptodev_socket_id() could return an incorrect value of 255. > > > > Problem has been seen during configuration of the qat device > > > > on an Atom C3000 architecture. On this arch, PCI is not depending > > on > > > > any numa socket, causing device numa_node to be equal to > > SOCKET_ID_ANY. > > > > > > Disclaimer: I'm not up to speed with this topic or patch, so feel > > free to ignore my comments here! I'm only speaking up because I fear we > > are increasing the risk of bugs here. But again, please bear with me, > > if I have totally misunderstood this! > > > > > > I was under the impression that single-socket systems used socket_id > > 0 as default. How can the PCI bus (or QAT device) not depend on any > > socket? It must be connected somewhere. > > > > > > Doesn't assigning socket_id = -1 for devices (QAT or anything else) > > introduce a big risk of bugs, e.g. in comparisons? The special > > socket_id value -1 should have only two meanings: 1) return value > > "error", or 2) input value "any". Now it also can mean 3) "unknown"? > > How do comparison functions work for that... is "any" == "unknown"? And > > does searching for "0" match "unknown"? It might, or might not, but > > searching for "any" does match "0". And how about searching for > > "unknown", if such a value is propagate around in the system. > > > > > > And if we started considering socket_id == -1 valid with that patch, > > should the return type of rte_socket_id(void) be signed instead of > > unsigned? > > > > > The issue here is that not all PCI endpoints connect directly to a > > socket, > > some connect to the chipset instead, and so do not have any numa > > affinity. > > That was the original meaning of the "-1" value, and it came about from > > an > > era before we had on-die PCI endpoints. > > Thank you for elaborating, Bruce. Now I get it! > > Then it does make sense instantiating devices with socket_id = -1. > > A minor detail: SOCKET_ID_ANY is defined in lib/eal/include/rte_memory.h [1]. > Since it is being used for other purposes than memory allocation, it could > move to a more central location. No good ideas from me, but perhaps memory > and devices have an appropriate header file in common. > > [1]: > https://elixir.bootlin.com/dpdk/latest/source/lib/eal/include/rte_memory.h#L38 > > And the multiple purposes of SOCKET_ID_ANY (e.g. what you just described) > could be mentioned as a comment with its definition. >
Agreed. > And I'm still worried about the risk of comparison bugs, e.g. when requesting > allocation of a device resource, you cannot specify a preference for a device > that is connected to the chipset, because the SOCKET_ID_ANY in the allocation > request would be interpreted as "any socket" instead of "no socket". > > Although that might just be me worrying too much. ;-) > No, I think you may be on to something. I wonder if it would break a lot of things to define another magic constant for SOCKET_NONE (or SOCKET_ID_NONE) to cover the case where a device is not connected to a socket. That woudl allow SOCKET_ID_ANY to have a single meaning. /Bruce