Hi all,
I have reported this issue on GitLab at https://gitlab.com/qemu-project/qemu/-/issues/2984.

Steps to reproduce:

1) Start a guest with the domain type set to qemu (TCG). Excerpt of the domain XML:
<domain type='qemu'>
  <name>linux</name>
  <uuid>cba9037f-2a62-41f9-98c1-0780b2ff49b9</uuid>
  <maxMemory slots='16' unit='KiB'>419430400</maxMemory>
  <memory unit='KiB'>20971520</memory>
  <currentMemory unit='KiB'>10485760</currentMemory>
  <memoryBacking>
    <locked/>
  </memoryBacking>
  <vcpu placement='static' current='4'>1024</vcpu>


2) lscpu on host:
lscpu
Architecture:             ppc64le
  Byte Order:             Little Endian
CPU(s):                   40
  On-line CPU(s) list:    0-39
Model name:               POWER10 (architected), altivec supported
  Model:                  2.0 (pvr 0080 0200)
  Thread(s) per core:     8
  Core(s) per socket:     5
  Socket(s):              1
  Physical sockets:       4
  Physical chips:         1
  Physical cores/chip:    12

3) [On host] virsh setvcpus linux 800
error: Unable to read from monitor: Connection reset by peer

4) The guest goes into the shutoff state.

5) I see the same issue with upstream QEMU as well.



I reproduced the issue with gdb attached. The backtrace below was captured after hotplugging 249 CPUs in TCG mode:

Thread 261 "CPU 249/TCG" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ff97c00ea20 (LWP 51567)]
0x00007fff84cac3e8 in __pthread_kill_implementation () from target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb) bt
#0  0x00007fff84cac3e8 in __pthread_kill_implementation () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#1  0x00007fff84c46ba0 in raise () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#2  0x00007fff84c29354 in abort () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#3  0x00007fff850f1e30 in g_assertion_message () from target:/lib64/libglib-2.0.so.0
#4  0x00007fff850f1ebc in g_assertion_message_expr () from target:/lib64/libglib-2.0.so.0
#5  0x00000001376c6f00 in tcg_region_initial_alloc__locked (s=0x7fff7c000f30) at ../tcg/region.c:396
#6  0x00000001376c6fa8 in tcg_region_initial_alloc (s=0x7fff7c000f30) at ../tcg/region.c:402
#7  0x00000001376dae08 in tcg_register_thread () at ../tcg/tcg.c:1011
#8  0x000000013768b7e4 in mttcg_cpu_thread_fn (arg=0x143e884f0) at ../accel/tcg/tcg-accel-ops-mttcg.c:77
#9  0x0000000137bbb2d0 in qemu_thread_start (args=0x143b4aff0) at ../util/qemu-thread-posix.c:542
#10 0x00007fff84ca9be0 in start_thread () from target:/lib64/glibc-hwcaps/power10/libc.so.6
#11 0x00007fff84d4ef3c in __clone3 () from target:/lib64/glibc-hwcaps/power10/libc.so.6
(gdb)


The assertion in frame #5 points to the following code in tcg/region.c:

/*
 * Perform a context's first region allocation.
 * This function does _not_ increment region.agg_size_full.
 */
static void tcg_region_initial_alloc__locked(TCGContext *s)
{
    bool err = tcg_region_alloc__locked(s);
    g_assert(!err);
}

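tcg_region_alloc__locked() is defined as follows. It returns true (an error) once region.current reaches region.n, that is, when no unassigned regions remain: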


static bool tcg_region_alloc__locked(TCGContext *s)
{
    if (region.current == region.n) {
        return true;
    }
    tcg_region_assign(s, region.current);
    region.current++;
    return false;
}
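
To illustrate my reading of the two functions above, here is a small standalone model. It is not QEMU code: struct region_model and the model_* functions are made-up names, and it assumes (I have not verified this) that the region count is fixed before the vCPU threads register. Under that assumption, each newly registered thread consumes one region, and the first thread that arrives after the pool is exhausted trips the assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global region state in tcg/region.c. */
struct region_model {
    size_t current; /* index of the next unassigned region */
    size_t n;       /* total number of regions (assumed fixed) */
};

/* Same shape as the quoted tcg_region_alloc__locked(): true means failure. */
static bool model_region_alloc(struct region_model *r)
{
    if (r->current == r->n) {
        return true;
    }
    r->current++;
    return false;
}

/* Same shape as tcg_region_initial_alloc__locked(): aborts on failure. */
static void model_region_initial_alloc(struct region_model *r)
{
    bool err = model_region_alloc(r);
    assert(!err);
}

/* Count how many of 'threads' registrations succeed before the pool runs out. */
static size_t model_threads_served(struct region_model *r, size_t threads)
{
    size_t served = 0;

    for (size_t i = 0; i < threads; i++) {
        if (model_region_alloc(r)) {
            break;
        }
        served++;
    }
    return served;
}
```

With a pool of 4 regions, for example, model_threads_served() stops at 4 no matter how many threads are requested, and a further model_region_initial_alloc() call would abort in the same way as the backtrace above.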

Thanks,
Anushree-Mathur
