On 16.09.2021 06:06, osstest service owner wrote:
> flight 164996 xen-unstable real [real]
> flight 165002 xen-unstable real-retest [real]
> http://logs.test-lab.xenproject.org/osstest/logs/164996/
> http://logs.test-lab.xenproject.org/osstest/logs/165002/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-arm64-arm64-libvirt-raw 17 guest-start/debian.repeat fail REGR. vs. 164945

Since no-one has given any sign so far of looking into this failure, I
took a look, despite having little hope of actually figuring anything
out. I'm pretty sure the randomness of the "when" of this failure
correlates with

Sep 15 14:44:48.518439 [ 1613.227909] rpc-worker: page allocation failure: order:4, mode:0x40dc0(GFP_KERNEL|__GFP_COMP|__GFP_ZERO), nodemask=(null),cpuset=/,mems_allowed=0
Sep 15 14:44:55.418534 [ 1613.240888] CPU: 48 PID: 2029 Comm: rpc-worker Not tainted 5.4.17+ #1
Sep 15 14:44:55.430511 [ 1613.247370] Hardware name: Cavium ThunderX CN88XX board (DT)
Sep 15 14:44:55.430576 [ 1613.253099] Call trace:
Sep 15 14:44:55.442497 [ 1613.255620]  dump_backtrace+0x0/0x140
Sep 15 14:44:55.442558 [ 1613.259348]  show_stack+0x14/0x20
Sep 15 14:44:55.442606 [ 1613.262734]  dump_stack+0xbc/0x100
Sep 15 14:44:55.442651 [ 1613.266206]  warn_alloc+0xf8/0x160
Sep 15 14:44:55.454512 [ 1613.269677]  __alloc_pages_slowpath+0x9c4/0x9f0
Sep 15 14:44:55.454574 [ 1613.274277]  __alloc_pages_nodemask+0x1cc/0x248
Sep 15 14:44:55.466498 [ 1613.278878]  kmalloc_order+0x24/0xa8
Sep 15 14:44:55.466559 [ 1613.282523]  __kmalloc+0x244/0x270
Sep 15 14:44:55.466607 [ 1613.285995]  alloc_empty_pages.isra.17+0x34/0xb0
Sep 15 14:44:55.478495 [ 1613.290681]  privcmd_ioctl_mmap_batch.isra.20+0x414/0x428
Sep 15 14:44:55.478560 [ 1613.296149]  privcmd_ioctl+0xbc/0xb7c
Sep 15 14:44:55.478608 [ 1613.299883]  do_vfs_ioctl+0xb8/0xae0
Sep 15 14:44:55.490475 [ 1613.303527]  ksys_ioctl+0x78/0xa8
Sep 15 14:44:55.490536 [ 1613.306911]  __arm64_sys_ioctl+0x1c/0x28
Sep 15 14:44:55.490584 [ 1613.310906]  el0_svc_common.constprop.2+0x88/0x150
Sep 15 14:44:55.502489 [ 1613.315765]  el0_svc_handler+0x20/0x80
Sep 15 14:44:55.502551 [ 1613.319583]  el0_svc+0x8/0xc
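
The order-4 request here looks to be the kcalloc() of the page-pointer
array in privcmd's alloc_empty_pages(), sitting right above __kmalloc
in the trace. As a back-of-the-envelope illustration (my arithmetic,
not anything taken from the log), an order-4 chunk full of 8-byte
pointers would correspond to a single mmap-batch ioctl covering 32MiB
of guest memory:

    #include <stdio.h>

    /* Hypothetical size check: an order-4 (16-page, 64KiB) kmalloc full
     * of 8-byte struct page pointers describes 8192 guest pages, i.e.
     * one privcmd mmap-batch mapping a 32MiB chunk of the domain. */
    int main(void)
    {
        unsigned long ptr_size  = 8;    /* sizeof(struct page *) on arm64 */
        unsigned long page_size = 4096;
        unsigned long alloc     = 16 * page_size;   /* order-4 block */
        unsigned long numpgs    = alloc / ptr_size;

        printf("%lu page pointers -> %lu MiB mapped per ioctl\n",
               numpgs, (numpgs * page_size) >> 20);
        return 0;
    }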

As per

Sep 15 14:44:55.502598 [ 1613.322585] Mem-Info:
Sep 15 14:44:55.502643 [ 1613.324918] active_anon:5639 inactive_anon:15857 isolated_anon:0
Sep 15 14:44:55.514480 [ 1613.324918]  active_file:13286 inactive_file:11182 isolated_file:0
Sep 15 14:44:55.514545 [ 1613.324918]  unevictable:0 dirty:30 writeback:0 unstable:0
Sep 15 14:44:55.526477 [ 1613.324918]  slab_reclaimable:10922 slab_unreclaimable:30234
Sep 15 14:44:55.526540 [ 1613.324918]  mapped:11277 shmem:10975 pagetables:401 bounce:0
Sep 15 14:44:55.538474 [ 1613.324918]  free:8364 free_pcp:100 free_cma:1650

the system doesn't really look to be out of memory; as per

Sep 15 14:44:55.598538 [ 1613.419061] DMA32: 2788*4kB (UMEC) 890*8kB (UMEC) 497*16kB (UMEC) 36*32kB (UMC) 1*64kB (C) 1*128kB (C) 9*256kB (C) 7*512kB (C) 0*1024kB 0*2048kB 0*4096kB = 33456kB

there even appear to be a number of higher-order pages available
(albeit without digging I can't tell what "(C)" means). Nevertheless,
order-4 allocations aren't really nice.
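
If the letters in that line are the migrate types from
show_free_areas() ('C' would then be CMA), every order>=4 block in the
DMA32 zone sits in a CMA pageblock, which a plain GFP_KERNEL request
can't be satisfied from - which would fit the failure despite the
apparent free space. A quick cross-check of the numbers (counts copied
from the log; the CMA reading is an assumption on my part):

    #include <stdio.h>

    /* Re-add the DMA32 buddy counts from the log and count how many
     * order >= 4 blocks a GFP_KERNEL (non-movable) request could use,
     * assuming "(C)" marks CMA-only pageblocks. */
    struct buddy { int order, count, cma_only; };

    int main(void)
    {
        struct buddy b[] = {
            { 0, 2788, 0 }, { 1, 890, 0 }, { 2, 497, 0 }, { 3, 36, 0 },
            { 4, 1, 1 },    { 5, 1, 1 },   { 6, 9, 1 },   { 7, 7, 1 },
        };
        unsigned long free_kb = 0, usable = 0;

        for (unsigned int i = 0; i < sizeof(b) / sizeof(b[0]); i++) {
            free_kb += (unsigned long)b[i].count * (4UL << b[i].order);
            if (b[i].order >= 4 && !b[i].cma_only)
                usable += b[i].count;
        }
        /* prints "33456kB free, 0 usable order>=4 blocks" */
        printf("%lukB free, %lu usable order>=4 blocks\n", free_kb, usable);
        return 0;
    }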

What I can't see is why this has only started triggering recently. Was
the kernel updated in osstest? Is 512MiB of memory perhaps a bit too
small for a Dom0 on this system (with 96 CPUs)? Going through the log
I haven't been able to find crucial information like how much memory
the host has or what the hypervisor command line was.
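
For reference, a hypervisor command line fragment of roughly this shape
is what I'd expect to govern that (purely illustrative; the actual
osstest command line is exactly what I couldn't find in the log):

    dom0_mem=512M dom0_max_vcpus=4

On Arm, dom0_mem is what bounds Dom0's memory, so whether 512M leaves
enough headroom on a 96-CPU host is the open question.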

Jan

