On Fri, 17 May 2024 19:55:21 +0530, Nilay Shroff wrote:
> On NUMA aware system, we make a numa-node online only if that node is
> attached to cpu/memory. However it's possible that we have some PCI/IO
> device affinitized to a numa-node which is not currently online. In such
> case we set the numa-node id of the corresponding PCI device to -1
> (NUMA_NO_NODE). Not assigning the correct numa-node id to PCI device may
> impact the performance of such device. For instance, we have a multi
> controller NVMe disk where each controller of the disk is attached to
> different PHB (PCI host bridge). Each of these PHBs has numa-node id
> assigned during PCI enumeration. During PCI enumeration if we find that
> the numa-node is not online then we set the numa-node id of the PHB to -1.
> If we create shared namespace and attach to multi controller NVMe disk
> then that namespace could be accessed through each controller and as each
> controller is connected to different PHBs, it's possible to access the
> same namespace using multiple PCI channel. While sending IO to a shared
> namespace, NVMe driver would calculate the optimal IO path using numa-node
> distance. However if the numa-node id is not correctly assigned to NVMe
> PCIe controller then it's possible that driver would calculate incorrect
> NUMA distance and hence select the non-optimal path for sending IO. If
> this happens then we could potentially observe the degraded IO performance.
>
> [...]

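For readers following along, the path-selection problem described above can
be modelled with a small user-space sketch. This is not the kernel's NVMe
multipath code. The distance table, path_distance(), pick_path() and the
rule that an unknown node is scored as local are all assumptions made up
for illustration.

```c
#include <assert.h>
#include <limits.h>

#define NUMA_NO_NODE   (-1)
#define NR_NODES       4
#define LOCAL_DISTANCE 10

/* Hypothetical NUMA distance table: nodes 0/1 are close, 2/3 are close. */
static const int distance_table[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 40 },
	{ 20, 10, 40, 40 },
	{ 40, 40, 10, 20 },
	{ 40, 40, 20, 10 },
};

/*
 * Distance from the submitting CPU's node to a controller's node.
 * Assumption for this sketch: a controller whose node is NUMA_NO_NODE
 * is scored as if it were local, because its real node is unknown.
 */
static int path_distance(int cpu_node, int ctrl_node)
{
	assert(cpu_node >= 0 && cpu_node < NR_NODES);
	if (ctrl_node == NUMA_NO_NODE)
		return LOCAL_DISTANCE;
	assert(ctrl_node < NR_NODES);
	return distance_table[cpu_node][ctrl_node];
}

/* Return the index of the controller with the smallest distance. */
static int pick_path(int cpu_node, const int *ctrl_nodes, int nr_ctrls)
{
	int best = -1;
	int best_dist = INT_MAX;

	for (int i = 0; i < nr_ctrls; i++) {
		int d = path_distance(cpu_node, ctrl_nodes[i]);

		if (d < best_dist) {
			best_dist = d;
			best = i;
		}
	}
	return best;
}
```

With controllers really sitting on nodes 2 and 0, a CPU on node 0 picks
index 1 (the local controller) when the ids are known. If both ids were
recorded as NUMA_NO_NODE, every path scores the same in this model and
index 0 wins the tie, even though that controller is on the far node.
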
Applied to powerpc/next.

[1/1] powerpc/numa: Online a node if PHB is attached.
      https://git.kernel.org/powerpc/c/11981816e3614156a1fe14a1e8e77094ea46c7d5

cheers