On 2014/11/28 1:20, Paolo Bonzini wrote:
>
> On 27/11/2014 14:00, Gonglei (Arei) wrote:
>>
>> Running a redhat-6.4-64bit (kernel 2.6.32-358.el6.x86_64) or elder guest on
>> qemu-2.1, with kvm enabled and -cpu host, non default cpu-topology and guest
>> numa I'm seeing a reliable kernel panic from the guest shortly after boot.
>> It is happening in find_busiest_group().
>>
>> We also found it happend since commit
>> 787aaf5703a702094f395db6795e74230282cd62 by git bisect.
>>
>> The reproducer:
>>
>> (1) full qemu cmd line:
>> qemu-system-x86_64 -machine pc-i440fx-2.1,accel=kvm,usb=off \
>> -cpu host -m 16384 \
>> -smp 16,sockets=2,cores=4,threads=2 \
>> -object memory-backend-ram,size=8192M,id=ram-node0 \
>> -numa node,nodeid=0,cpus=0-7,memdev=ram-node0 \
>> -object memory-backend-ram,size=8192M,id=ram-node1 \
>> -numa node,nodeid=1,cpus=8-15,memdev=ram-node1 \
>> -boot c -drive file=/data/wxin/vm/redhat_6.4_64 \
>> -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=8,bus=pci.0,addr=0x1.0x4 \
>> -msg timestamp=on
>>
>> (2)the guest kernel messages:
>
> Can you find what line of kernel/sched.c it is?

Yes, of course. See below please:

"sgs->avg_load = (sgs->group_load * SCHED_LOAD_SCALE) / group->cpu_power;"

in update_sg_lb_stats(), file sched.c, line 4094.

And I can share the cause we found. After commit 787aaf57 (target-i386:
forward CPUID cache leaves when -cpu host is used), the guest gets its cpu
cache information from the host when -cpu host is used. But if we configure
guest numa:
  node 0 cpus 0~7
  node 1 cpus 8~15
then both numa nodes lie within the same host cpu cache (cpus 0~16).
When the guest OS boots, it calculates group->cpu_power, but the guest finds
that those two different nodes share the same cache, so node1's
group->cpu_power is never assigned and keeps its initial value '0'. Then, when
a vcpu is scheduled, the division by 0 causes the kernel panic.

Regards,
-Gonglei
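
P.S. To illustrate the division described above, here is a minimal user-space sketch. It is not the kernel code: struct demo_group and demo_avg_load() are hypothetical names invented for this example, and SCHED_LOAD_SCALE is assumed to be 1024 as in 2.6.32. It only shows that the quoted expression has no defined result when cpu_power is still 0, and what a guarded variant of the same arithmetic would look like.

```c
#include <stdbool.h>

/* Assumed value of the scheduler's fixed-point load unit (1 << 10). */
#define SCHED_LOAD_SCALE 1024UL

/* Hypothetical, stripped-down stand-in for the kernel's struct sched_group. */
struct demo_group {
    unsigned long cpu_power;    /* 0 models "never initialised" */
};

/*
 * Same arithmetic as the quoted line from update_sg_lb_stats(), but with an
 * explicit check on the divisor. Returns false and leaves *avg_load untouched
 * when cpu_power is 0, which is the case that panics in the guest.
 */
static bool demo_avg_load(const struct demo_group *group,
                          unsigned long group_load,
                          unsigned long *avg_load)
{
    if (group->cpu_power == 0)
        return false;

    *avg_load = (group_load * SCHED_LOAD_SCALE) / group->cpu_power;
    return true;
}
```

With cpu_power = 1024 the call succeeds; with cpu_power = 0 (the node1 case above) it returns false instead of dividing. This is only an illustration of the failure, not a proposed fix for either the guest kernel or qemu.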