Hi,

VPP crashes on the CSIT Taishan server because vlib_get_thread_core_numa () does not derive the correct NUMA node from cpu_id on that machine. The function uses physical_package_id as the NUMA node.
However, the Taishan server has 2 physical sockets but 4 NUMA nodes, as the lscpu output below shows. sysfs also shows that physical_package_id is not sequential on this server:

taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu37/topology/physical_package_id
3002
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu3/topology/physical_package_id
36
taishan-d05-08:~$ cat /sys/devices/system/cpu/cpu15/topology/physical_package_id
36

How about using the sysfs entries below to get the NUMA node from cpu_id? The code change is attached at the end.

$ cat /sys/devices/system/node/online
0-3
$ cat /sys/devices/system/node/node0/cpulist
0-15
$ cat /sys/devices/system/node/node1/cpulist
16-31
$ cat /sys/devices/system/node/node2/cpulist
32-47
$ cat /sys/devices/system/node/node3/cpulist
48-63

taishan-d05-08:~$ lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              64
On-line CPU(s) list: 0-63
Thread(s) per core:  1
Core(s) per socket:  32
Socket(s):           2
NUMA node(s):        4
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A72
Stepping:            r0p2
BogoMIPS:            100.00
L1d cache:           32K
L1i cache:           48K
L2 cache:            1024K
L3 cache:            16384K
NUMA node0 CPU(s):   0-15
NUMA node1 CPU(s):   16-31
NUMA node2 CPU(s):   32-47
NUMA node3 CPU(s):   48-63
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

diff --git a/src/vlib/threads.c b/src/vlib/threads.c
index 1ce4dc156..3f0905421 100644
--- a/src/vlib/threads.c
+++ b/src/vlib/threads.c
@@ -598,15 +598,30 @@ void
 vlib_get_thread_core_numa (vlib_worker_thread_t * w, unsigned cpu_id)
 {
   const char *sys_cpu_path = "/sys/devices/system/cpu/cpu";
+  const char *sys_node_path = "/sys/devices/system/node/node";
+  clib_bitmap_t *nbmp = 0, *cbmp = 0;
+  u32 node;
   u8 *p = 0;
   int core_id = -1, numa_id = -1;
 
   p = format (p, "%s%u/topology/core_id%c", sys_cpu_path, cpu_id, 0);
   clib_sysfs_read ((char *) p, "%d", &core_id);
   vec_reset_length (p);
-  p = format (p, "%s%u/topology/physical_package_id%c", sys_cpu_path,
-              cpu_id, 0);
-  clib_sysfs_read ((char *) p, "%d", &numa_id);
+
+  /* *INDENT-OFF* */
+  clib_sysfs_read ("/sys/devices/system/node/online", "%U",
+        unformat_bitmap_list, &nbmp);
+  clib_bitmap_foreach (node, nbmp, ({
+    p = format (p, "%s%u/cpulist%c", sys_node_path, node, 0);
+    clib_sysfs_read ((char *) p, "%U", unformat_bitmap_list, &cbmp);
+    if (clib_bitmap_get (cbmp, cpu_id))
+      numa_id = node;
+    vec_reset_length (cbmp);
+    vec_reset_length (p);
+  }));
+  /* *INDENT-ON* */
+  vec_free (nbmp);
+  vec_free (cbmp);
   vec_free (p);
 
   w->core_id = core_id;

Thanks.
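P.S. For anyone who wants to check the node lookup outside VPP, here is a rough standalone sketch of the same idea in plain C. It is not the VPP code: cpulist_contains () and numa_node_of_cpu () are hypothetical names made up for illustration, it uses libc instead of clib_bitmap, and it simply probes node0..max_nodes-1 rather than reading /sys/devices/system/node/online. It assumes the cpulist files hold comma-separated ranges such as "0-15".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of VPP): return 1 if cpu appears in a
   sysfs-style CPU list such as "0-15" or "0-3,8-11", otherwise 0. */
static int
cpulist_contains (const char *list, unsigned cpu)
{
  const char *s = list;
  char *end;

  assert (list != NULL);

  while (*s && *s != '\n')
    {
      unsigned long lo = strtoul (s, &end, 10);
      unsigned long hi = lo;

      if (end == s)
        return 0;               /* no digits where a number was expected */
      if (*end == '-')
        {
          /* "lo-hi" range: parse the upper bound */
          s = end + 1;
          hi = strtoul (s, &end, 10);
          if (end == s)
            return 0;
        }
      if (cpu >= lo && cpu <= hi)
        return 1;
      if (*end != ',')
        break;                  /* end of list (or unexpected character) */
      s = end + 1;
    }
  return 0;
}

/* Hypothetical helper (not part of VPP): probe node0..max_nodes-1 under
   sysfs and return the first node whose cpulist contains cpu, or -1 if
   no node claims it. */
static int
numa_node_of_cpu (unsigned cpu, unsigned max_nodes)
{
  char path[64], buf[4096];
  unsigned node;

  for (node = 0; node < max_nodes; node++)
    {
      FILE *f;
      int hit;

      snprintf (path, sizeof (path),
                "/sys/devices/system/node/node%u/cpulist", node);
      f = fopen (path, "r");
      if (!f)
        continue;               /* node directory missing or unreadable */
      hit = fgets (buf, sizeof (buf), f) && cpulist_contains (buf, cpu);
      fclose (f);
      if (hit)
        return (int) node;
    }
  return -1;
}
```

Given the cpulist files shown above, I would expect numa_node_of_cpu (37, 4) to return 2 on the Taishan box, since node2 covers CPUs 32-47, but I have not run this sketch there.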