Hi all,
I have reported this issue on gitlab at
https://gitlab.com/qemu-project/qemu/-/issues/2984.
Steps to reproduce:
1) Start a guest with virt-type as qemu
<domain type='qemu'>
<name>linux</name>
<uuid>cba9037f-2a62-41f9-98c1-0780b2ff49b9</uuid>
<maxMemory slots='16' unit='KiB'>419430400</maxMemory>
<memory unit='KiB'>20971520</memory>
<currentMemory unit='KiB'>10485760</currentMemory>
<memoryBacking>
<locked/>
</memoryBacking>
<vcpu placement='static' current='4'>1024</vcpu>
2) lscpu on host:
lscpu
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Model name: POWER10 (architected), altivec supported
Model: 2.0 (pvr 0080 0200)
Thread(s) per core: 8
Core(s) per socket: 5
Socket(s): 1
Physical sockets: 4
Physical chips: 1
Physical cores/chip: 12
3) [On host] virsh setvcpus linux 800
error: Unable to read from monitor: Connection reset by peer
4) Guest is getting into shutoff state
5) I am seeing this issue on upstream qemu also
Tried reproducing while attaching gdb shows below backtrace which
happened after hotplugging 249 CPUs in TCG mode:
Thread 261 "CPU 249/TCG" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ff97c00ea20 (LWP 51567)]
0x00007fff84cac3e8 in __pthread_kill_implementation () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb) bt
#0 0x00007fff84cac3e8 in __pthread_kill_implementation () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
#1 0x00007fff84c46ba0 in raise () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
#2 0x00007fff84c29354 in abort () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
#3 0x00007fff850f1e30 in g_assertion_message () from
target:/lib64/libglib-2.0.so.0
#4 0x00007fff850f1ebc in g_assertion_message_expr () from
target:/lib64/libglib-2.0.so.0
#5 0x00000001376c6f00 in tcg_region_initial_alloc__locked
(s=0x7fff7c000f30) at ../tcg/region.c:396
#6 0x00000001376c6fa8 in tcg_region_initial_alloc (s=0x7fff7c000f30) at
../tcg/region.c:402
#7 0x00000001376dae08 in tcg_register_thread () at ../tcg/tcg.c:1011
#8 0x000000013768b7e4 in mttcg_cpu_thread_fn (arg=0x143e884f0) at
../accel/tcg/tcg-accel-ops-mttcg.c:77
#9 0x0000000137bbb2d0 in qemu_thread_start (args=0x143b4aff0) at
../util/qemu-thread-posix.c:542
#10 0x00007fff84ca9be0 in start_thread () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
#11 0x00007fff84d4ef3c in __clone3 () from
target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb)
which points to below code:
/*
* Perform a context's first region allocation.
* This function does _not_ increment region.agg_size_full.
*/
static void tcg_region_initial_alloc__locked(TCGContext *s)
{
bool err = tcg_region_alloc__locked(s);
g_assert(!err);
}
Here, tcg_region_alloc__locked is defined as below:
static bool tcg_region_alloc__locked(TCGContext *s)
{
if (region.current == region.n) {
return true;
}
tcg_region_assign(s, region.current);
region.current++;
return false;
}
Thanks,
Anushree-Mathur