Changelog v2 -> v3:
v2: https://lore.kernel.org/linuxppc-dev/20210821102535.169643-1-sri...@linux.vnet.ibm.com/t/#u
	Add patch 1: to drop dbg and numa=debug (Suggested by Michael Ellerman)
	Add patch 2: to convert printk to pr_xxx (Suggested by Michael Ellerman)
	Use pr_warn instead of pr_debug(WARNING) (Suggested by Laurent)

Changelog v1 -> v2:
	Moved patch to this series: powerpc/numa: Fill distance_lookup_table for offline nodes
	Fixed a missing prototype warning

The scheduler expects the number of unique node distances to be available
at boot. It uses the node distances to calculate this number of unique
node distances. On Power Servers, node distances for offline nodes are not
available. However, Power Servers already know the unique possible node
distances. Fake the offline nodes' distance_lookup_table entries so that
all possible node distances are updated.

For example, distance info from numactl on a fully populated 8 node system
at boot may look like this:

node distances:
node   0   1   2   3   4   5   6   7
  0:  10  20  40  40  40  40  40  40
  1:  20  10  40  40  40  40  40  40
  2:  40  40  10  20  40  40  40  40
  3:  40  40  20  10  40  40  40  40
  4:  40  40  40  40  10  20  40  40
  5:  40  40  40  40  20  10  40  40
  6:  40  40  40  40  40  40  10  20
  7:  40  40  40  40  40  40  20  10

However, when only two nodes of the same system are online at boot, the
distance info from numactl will look like:

node distances:
node   0   1
  0:  10  20
  1:  20  10

With the faked numa distance at boot, the node distance table will look
like:

node   0   1   2
  0:  10  20  40
  1:  20  10  40
  2:  40  40  10

The actual distance will be populated once the nodes are onlined.

Also, when simultaneously running CPU online/offline with CPU add/remove in
a loop, we see WARNING messages.

WARNING: CPU: 13 PID: 1142 at kernel/sched/topology.c:898 build_sched_domains+0xd48/0x1720
Modules linked in: rpadlpar_io rpaphp mptcp_diag xsk_diag tcp_diag udp_diag
 raw_diag inet_diag unix_diag af_packet_diag netlink_diag bonding tls
 nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet
 nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat
 nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink
 pseries_rng xts vmx_crypto uio_pdrv_genirq uio binfmt_misc ip_tables xfs
 libcrc32c dm_service_time sd_mod t10_pi sg ibmvfc scsi_transport_fc ibmveth
 dm_multipath dm_mirror dm_region_hash dm_log dm_mod fuse
CPU: 13 PID: 1142 Comm: kworker/13:2 Not tainted 5.13.0-rc6+ #28
Workqueue: events cpuset_hotplug_workfn
NIP:  c0000000001caac8 LR: c0000000001caac4 CTR: 00000000007088ec
REGS: c00000005596f220 TRAP: 0700   Not tainted  (5.13.0-rc6+)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48828222  XER: 00000009
CFAR: c0000000001ea698 IRQMASK: 0
GPR00: c0000000001caac4 c00000005596f4c0 c000000001c4a400 0000000000000036
GPR04: 00000000fffdffff c00000005596f1d0 0000000000000027 c0000018cfd07f90
GPR08: 0000000000000023 0000000000000001 0000000000000027 c0000018fe68ffe8
GPR12: 0000000000008000 c00000001e9d1880 c00000013a047200 0000000000000800
GPR16: c000000001d3c7d0 0000000000000240 0000000000000048 c000000010aacd18
GPR20: 0000000000000001 c000000010aacc18 c00000013a047c00 c000000139ec2400
GPR24: 0000000000000280 c000000139ec2520 c000000136c1b400 c000000001c93060
GPR28: c00000013a047c20 c000000001d3c6c0 c000000001c978a0 000000000000000d
NIP [c0000000001caac8] build_sched_domains+0xd48/0x1720
LR [c0000000001caac4] build_sched_domains+0xd44/0x1720
Call Trace:
[c00000005596f4c0] [c0000000001caac4] build_sched_domains+0xd44/0x1720 (unreliable)
[c00000005596f670] [c0000000001cc5ec] partition_sched_domains_locked+0x3ac/0x4b0
[c00000005596f710] [c0000000002804e4] rebuild_sched_domains_locked+0x404/0x9e0
[c00000005596f810] [c000000000283e60] rebuild_sched_domains+0x40/0x70
[c00000005596f840] [c000000000284124] cpuset_hotplug_workfn+0x294/0xf10
[c00000005596fc60] [c000000000175040] process_one_work+0x290/0x590
[c00000005596fd00] [c0000000001753c8] worker_thread+0x88/0x620
[c00000005596fda0] [c000000000181704] kthread+0x194/0x1a0
[c00000005596fe10] [c00000000000ccec] ret_from_kernel_thread+0x5c/0x70
Instruction dump:
485af049 60000000 2fa30800 409e0028 80fe0000 e89a00f8 e86100e8 38da0120
7f88e378 7ce53b78 4801fb91 60000000 <0fe00000> 39000000 38e00000 38c00000

This was because cpu_cpu_mask() was not getting updated on CPU
online/offline; it was only updated on CPU add/remove. Other cpumasks get
updated both on CPU online/offline and on CPU add/remove. Update
cpu_cpu_mask() on CPU online/offline too.

Cc: linuxppc-dev@lists.ozlabs.org
Cc: Nathan Lynch <nath...@linux.ibm.com>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Valentin Schneider <valentin.schnei...@arm.com>
Cc: Gautham R Shenoy <e...@linux.vnet.ibm.com>
Cc: Vincent Guittot <vincent.guit...@linaro.org>
Cc: Geetika Moolchandani <geetika.moolchanda...@ibm.com>
Cc: Laurent Dufour <lduf...@linux.ibm.com>

Srikar Dronamraju (5):
  powerpc/numa: Drop dbg in favour of pr_debug
  powerpc/numa: convert printk to pr_xxx
  powerpc/numa: Print debug statements only when required
  powerpc/numa: Update cpu_cpu_map on CPU online/offline
  powerpc/numa: Fill distance_lookup_table for offline nodes

 arch/powerpc/include/asm/topology.h |  12 +++
 arch/powerpc/kernel/smp.c           |   3 +
 arch/powerpc/mm/numa.c              | 120 ++++++++++++++++++++--------
 3 files changed, 103 insertions(+), 32 deletions(-)

-- 
2.18.2