Hi,

On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> 2) I'm a bit unsure what "NUMA nodes" actually means. The patch mostly
> assumes each core / piece of RAM is assigned to a particular NUMA node.
There are systems in which some NUMA nodes do *not* contain any CPUs, e.g.
if memory is attached via a CXL/PCIe add-in card rather than via the CPU's
memory controller. In that case numactl -H (and obviously also the libnuma
APIs) will report that the numa node is not associated with any CPU.

I don't currently have live access to such a system, but this PR piece
happens to have numactl -H output:
https://lenovopress.lenovo.com/lp2184-implementing-cxl-memory-on-linux-on-thinksystem-v4-servers

> numactl -H
> available: 4 nodes (0-3)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 96 97 98
> 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117
> 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
> 137 138 139 140 141 142 143
> node 0 size: 1031904 MB
> node 0 free: 1025554 MB
> node 1 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
> 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94
> 95 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
> 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180
> 181 182 183 184 185 186 187 188 189 190 191
> node 1 size: 1032105 MB
> node 1 free: 1024244 MB
> node 2 cpus:
> node 2 size: 262144 MB
> node 2 free: 262143 MB
> node 3 cpus:
> node 3 size: 262144 MB
> node 3 free: 262142 MB
> node distances:
> node   0   1   2   3
>   0:  10  21  14  24
>   1:  21  10  24  14
>   2:  14  24  10  26
>   3:  24  14  26  10

Note that nodes 2 & 3 don't have associated CPUs (and higher access costs).

I don't think this is common enough to worry about from a performance POV,
but we probably shouldn't crash if we encounter it...

> But it also cares about the cores (and the node for each core), because
> it uses that to pick the right partition for a backend. And here the
> situation is less clear, because the CPUs don't need to be assigned to a
> particular node, even on a NUMA system. Consider the rpi5 NUMA layout:
>
> $ numactl --hardware
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3
> node 0 size: 992 MB
> node 0 free: 274 MB
> node 1 cpus: 0 1 2 3
> node 1 size: 1019 MB
> node 1 free: 327 MB
> ...
> node   0   1   2   3   4   5   6   7
>   0:  10  10  10  10  10  10  10  10
>   1:  10  10  10  10  10  10  10  10
>   2:  10  10  10  10  10  10  10  10
>   3:  10  10  10  10  10  10  10  10
>   4:  10  10  10  10  10  10  10  10
>   5:  10  10  10  10  10  10  10  10
>   6:  10  10  10  10  10  10  10  10
>   7:  10  10  10  10  10  10  10  10
> This says there are 8 NUMA nodes, each with ~1GB of RAM. But the 4 cores
> are not assigned to particular nodes - each core is mapped to all 8 NUMA
> nodes.

FWIW, you can get a different version of this with AMD Epyc too, if
"L3 LLC as NUMA" is enabled.

> I'm not sure what to do about this (or how getcpu() or libnuma handle
> this).

I don't immediately see any libnuma functions that would care? I also am
somewhat curious about what getcpu() returns for the current node...

Greetings,

Andres Freund
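For reference, a minimal standalone probe along the lines of the following
sketch (not part of the patch; the file name is made up, and it assumes
glibc 2.29+ for the getcpu() wrapper plus a linkable libnuma) would show
both things at once on such a box: which nodes report an empty CPU mask,
and what getcpu() says the current CPU / node is.

/*
 * numa_probe.c - quick sketch, not part of the patch.
 *
 * Prints the CPU count libnuma reports for each node (CXL-only nodes
 * should come back with 0 cpus) and what getcpu() reports for the
 * current CPU / node.
 *
 * Build: cc numa_probe.c -lnuma
 */
#define _GNU_SOURCE
#include <sched.h>              /* getcpu(), glibc 2.29+ */
#include <stdio.h>
#include <numa.h>

int
main(void)
{
    unsigned int cpu, node;

    if (numa_available() < 0)
    {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    /* which nodes actually have CPUs attached? */
    for (int n = 0; n <= numa_max_node(); n++)
    {
        struct bitmask *cpus = numa_allocate_cpumask();

        if (numa_node_to_cpus(n, cpus) == 0)
            printf("node %d: %u cpus\n", n, numa_bitmask_weight(cpus));
        else
            printf("node %d: numa_node_to_cpus() failed\n", n);
        numa_bitmask_free(cpus);
    }

    /* where does the kernel think we're currently running? */
    if (getcpu(&cpu, &node) == 0)
        printf("getcpu(): cpu %u node %u (numa_node_of_cpu: %d)\n",
               cpu, node, numa_node_of_cpu((int) cpu));

    return 0;
}

On the Lenovo system above I'd expect nodes 2 and 3 to show 0 cpus, which
is the case any node-to-partition mapping needs to survive.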