On Sun, Jul 21, 2024 at 11:56:08PM +0100, Ferruh Yigit wrote: > On 7/19/2024 5:10 PM, Bruce Richardson wrote: > > On Fri, Jul 19, 2024 at 04:31:11PM +0100, Ferruh Yigit wrote: > >> On 7/19/2024 2:22 PM, Bruce Richardson wrote: > >>> On Fri, Jul 19, 2024 at 12:10:24PM +0100, Ferruh Yigit wrote: > >>>> One option can be adding a warning log to the fallback case, saying that > >>>> memory allocated from non-local socket and performance will be less. > >>>> Although this message may not mean much to a new user, it may still help > >>>> via a support engineer or internet search... > >>>> > >>> > >>> Yes, we could add a warning, but that is probably better in the app > >>> itself. > >>> Thinking about where we get issues, they primarily stem from running the > >>> cores on a different numa node Since the private_data struct is accessed > >>> by cores not devices, any perf degradation will come from having it remote > >>> to the cores. Because of that, I actually think the original > >>> implementation > >>> should really have used "rte_socket_id()" rather than the device socket id > >>> for the allocation. > >>> > >> > >> Yes I guess private_data is not accessed by device, but it may be > >> accessed by cores that is running the datapath. > >> > >> This API may be called by core that is not involved to the datapath, so > >> it may not correct to allocate memory from its numa. > >> > >> Will it be wrong to assume that cores used for datapath will be same > >> numa with device, if so allocating memory from that numa (which device > >> is in) makes more sense. Am I missing something? > >> > > > > It depends on which you think is more likely for the polling cores: > > - they will be on the same socket as the device, but different socket to > > the main lcore. > > - they will be on the same socket as the main lcore, but different socket > > to the device. > > > > Personally, I'd suspect both to be unlikely, but also both to be possible. > > For the first scenario, I don't see anything being broken or affected by > > the proposed fix here, since priority is still being given to memory on the > > same socket as the device. It just opens up the possibility of scenario > > two. > > > > My comment was on suggestion to use "rte_socket_id()" rather than the > device socket id, > if both nodes have memory, memory should be allocated from the one where > device is in, because although device doesn't use private_data, polling > cores will and polling cores will be most likely in the same node with > device and memory, but we don't know main core is in. > So I think first try for memory allocation should be node where device > is in, which is the existing code. > > If node that has device doesn't have any memory attached, your change > enables this case, as already there is memory only in one node, it > doesn't matter if we check device node or main core node anyway. > > > Briefly, I am OK to current patch with a warning log in fallback, but > not to "rte_socket_id()" change. > Ack, makes sense. Done in V2 patch.
Thanks for review. /Bruce