------- Comment From anushree.math...@ibm.com 2024-10-16 13:44 EDT-------
I had mistakenly posted logs belonging to a different bugzilla here 
earlier. I am now updating this bugzilla with the proper logs to confirm 
that it is working fine!

HOST[SOURCE](MDC):
OS : ubuntu
Kernel: 6.8.0-48-generic
qemu : QEMU emulator version 8.2.2 (Debian 1:8.2.2+ds-0ubuntu1.2)
libvirt : libvirtd (libvirt) 10.0.0

HOST[DESTINATION](NON-MDC):
OS: ubuntu
Kernel: 6.8.0-48-generic
qemu : QEMU emulator version 8.2.2 (Debian 1:8.2.2+ds-0ubuntu1.2)
libvirt : libvirtd (libvirt) 10.0.0

GUEST:
OS : ubuntu
kernel: 6.8.0-47-generic

I started the LTP memcg stress test and then started migrating the guest
from the MDC system to the non-MDC system; after 2 hours I saw that the
guest was still up!

Steps I tried:
1) Start the LTP memstress test
root@ubuntu:/opt/ltp# ./runltp -f controllers -s memcg_stress -t 120m
-------------------------------------------
INFO: runltp script is deprecated, try kirk
https://github.com/linux-test-project/kirk
-------------------------------------------
Checking for required user/group ids

'root' user id and group found.
'nobody' user id and group found.
'bin' user id and group found.
'daemon' user id and group found.
Users group found.
Sys group found.
Required users/groups exist.
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.

/etc/lsb-release
/etc/os-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.1 LTS"
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

uname:
Linux ubuntu 6.8.0-47-generic #47-Ubuntu SMP Fri Sep 27 21:38:55 UTC 2024 
ppc64le ppc64le ppc64le GNU/Linux

/proc/cmdline
BOOT_IMAGE=/vmlinux-6.8.0-47-generic root=/dev/mapper/ubuntu--vg-ubuntu--lv ro 
crashkernel=4096M

Gnu C                  gcc (Ubuntu 13.2.0-23ubuntu4) 13.2.0
Clang
Gnu make               4.3
util-linux             2.39.3
mount                  linux 2.39.3 (libmount 2.39.3: selinux, smack, btrfs, 
verity, namespaces, idmapping, statx, assert, debug)
modutils               31
e2fsprogs              1.47.0
Linux C Library        gnu/libc.so.6
Dynamic linker (ldd)   2.39
Procps                 4.0.4
iproute2               1.3.0
iputils                20240117
ethtool                6.7
Sh-utils               9.4
Modules Loaded         xt_tcpudp nft_compat nf_tables qrtr cfg80211 binfmt_misc 
uio_pdrv_genirq vmx_crypto uio sch_fq_codel dm_multipath nfnetlink ip_tables 
x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 
ibmvscsi poly1305_p10_crypto crct10dif_vpmsum xhci_pci crc32c_vpmsum 
xhci_pci_renesas aes_gcm_p10_crypto

cpuinfo:
Architecture:             ppc64le
Byte Order:             Little Endian
CPU(s):                   16
On-line CPU(s) list:    0-15
Model name:               POWER10 (architected), altivec supported
Model:                  2.0 (pvr 0080 0200)
Thread(s) per core:     2
Core(s) per socket:     8
Socket(s):              1
Virtualization features:
Hypervisor vendor:      KVM
Virtualization type:    para
Caches (sum of all):
L1d:                    256 KiB (8 instances)
L1i:                    384 KiB (8 instances)
NUMA:
NUMA node(s):           1
NUMA node0 CPU(s):      0-15
Vulnerabilities:
Gather data sampling:   Not affected
Itlb multihit:          Not affected
L1tf:                   Mitigation; RFI Flush, L1D private per thread
Mds:                    Not affected
Meltdown:               Mitigation; RFI Flush, L1D private per thread
Mmio stale data:        Not affected
Reg file data sampling: Not affected
Retbleed:               Not affected
Spec rstack overflow:   Not affected
Spec store bypass:      Mitigation; Kernel entry/exit barrier (eieio)
Spectre v1:             Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Spectre v2:             Mitigation; Software count cache flush (hardware accelerated), Software link stack flush
Srbds:                  Not affected
Tsx async abort:        Not affected

free reports:
               total        used        free      shared  buff/cache   available
Mem:        26067968     1099520    24681664       25024      465600    24968448
Swap:              0           0           0

memory (/proc/meminfo):
MemTotal:       26067968 kB
MemFree:        24681664 kB
MemAvailable:   24968448 kB
Buffers:           16448 kB
Cached:           415488 kB
SwapCached:            0 kB
Active:           454848 kB
Inactive:         108800 kB
Active(anon):     118336 kB
Inactive(anon):    50304 kB
Active(file):     336512 kB
Inactive(file):    58496 kB
Unevictable:       33856 kB
Mlocked:           33856 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Zswap:                 0 kB
Zswapped:              0 kB
Dirty:             93568 kB
Writeback:             0 kB
AnonPages:        165440 kB
Mapped:            57472 kB
Shmem:             25024 kB
KReclaimable:      33664 kB
Slab:             449920 kB
SReclaimable:      33664 kB
SUnreclaim:       416256 kB
KernelStack:        5952 kB
PageTables:         4928 kB
SecPageTables:         0 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    13033984 kB
Committed_AS:     511168 kB
VmallocTotal:   549755813888 kB
VmallocUsed:       23744 kB
VmallocChunk:          0 kB
Percpu:            86016 kB
HardwareCorrupted:     0 kB
AnonHugePages:      2048 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
DirectMap4k:           0 kB
DirectMap64k:    31457280 kB
DirectMap2M:           0 kB
DirectMap1G:           0 kB

available filesystems:
autofs bdev binfmt_misc bpf btrfs cgroup cgroup2 configfs cpuset debugfs devpts 
devtmpfs ecryptfs ext2 ext3 ext4 fuse fuseblk fusectl hugetlbfs mqueue pipefs 
proc pstore ramfs securityfs sockfs squashfs sysfs tmpfs tracefs vfat

mounted filesystems (/proc/mounts):
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs 
rw,nosuid,relatime,size=12999168k,nr_inodes=203112,mode=755,inode64 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=2606848k,mode=755,inode64 
0 0
/dev/mapper/ubuntu--vg-ubuntu--lv / ext4 rw,relatime 0 0
securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev/shm tmpfs rw,nosuid,nodev,inode64 0 0
tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k,inode64 0 0
cgroup2 /sys/fs/cgroup cgroup2 
rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
bpf /sys/fs/bpf bpf rw,nosuid,nodev,noexec,relatime,mode=700 0 0
systemd-1 /proc/sys/fs/binfmt_misc autofs 
rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=19477 
0 0
debugfs /sys/kernel/debug debugfs rw,nosuid,nodev,noexec,relatime 0 0
hugetlbfs /dev/hugepages hugetlbfs rw,nosuid,nodev,relatime,pagesize=2M 0 0
mqueue /dev/mqueue mqueue rw,nosuid,nodev,noexec,relatime 0 0
tracefs /sys/kernel/tracing tracefs rw,nosuid,nodev,noexec,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,nosuid,nodev,noexec,relatime 0 0
configfs /sys/kernel/config configfs rw,nosuid,nodev,noexec,relatime 0 0
/dev/sda2 /boot ext4 rw,relatime 0 0
binfmt_misc /proc/sys/fs/binfmt_misc binfmt_misc 
rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /run/user/0 tmpfs 
rw,nosuid,nodev,relatime,size=2606784k,nr_inodes=651696,mode=700,inode64 0 0

mounted filesystems (df):
Filesystem                        Type   Size  Used Avail Use% Mounted on
tmpfs                             tmpfs  2.5G   13M  2.5G   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv ext4   9.8G  5.2G  4.1G  57% /
tmpfs                             tmpfs   13G     0   13G   0% /dev/shm
tmpfs                             tmpfs  5.0M     0  5.0M   0% /run/lock
/dev/sda2                         ext4   1.8G  255M  1.4G  16% /boot
tmpfs                             tmpfs  2.5G  192K  2.5G   1% /run/user/0

tainted (/proc/sys/kernel/tainted):
0

AppArmor enabled
apparmor module is loaded.
119 profiles are loaded.
24 profiles are in enforce mode.
/usr/bin/man
/usr/lib/snapd/snap-confine
/usr/lib/snapd/snap-confine//mount-namespace-capture-helper
lsb_release
man_filter
man_groff
nvidia_modprobe
nvidia_modprobe//kmod
plasmashell
plasmashell//QtWebEngineProcess
rsyslogd
tcpdump
ubuntu_pro_apt_news
ubuntu_pro_esm_cache
ubuntu_pro_esm_cache//apt_methods
ubuntu_pro_esm_cache//apt_methods_gpgv
ubuntu_pro_esm_cache//cloud_id
ubuntu_pro_esm_cache//dpkg
ubuntu_pro_esm_cache//ps
ubuntu_pro_esm_cache//ubuntu_distro_info
ubuntu_pro_esm_cache_systemctl
ubuntu_pro_esm_cache_systemd_detect_virt
unix-chkpwd
unprivileged_userns
4 profiles are in complain mode.
transmission-cli
transmission-daemon
transmission-gtk
transmission-qt
0 profiles are in prompt mode.
0 profiles are in kill mode.
91 profiles are in unconfined mode.
1password
Discord
MongoDB Compass
QtWebEngineProcess
balena-etcher
brave
buildah
busybox
cam
ch-checkns
ch-run
chrome
crun
devhelp
element-desktop
epiphany
evolution
firefox
flatpak
foliate
geary
github-desktop
goldendict
ipa_verify
kchmviewer
keybase
lc-compliance
libcamerify
linux-sandbox
loupe
lxc-attach
lxc-create
lxc-destroy
lxc-execute
lxc-stop
lxc-unshare
lxc-usernsexec
mmdebstrap
msedge
nautilus
notepadqq
obsidian
opam
opera
pageedit
podman
polypane
privacybrowser
qcam
qmapshack
qutebrowser
rootlesskit
rpm
rssguard
runc
sbuild
sbuild-abort
sbuild-adduser
sbuild-apt
sbuild-checkpackages
sbuild-clean
sbuild-createchroot
sbuild-destroychroot
sbuild-distupgrade
sbuild-hold
sbuild-shell
sbuild-unhold
sbuild-update
sbuild-upgrade
scide
signal-desktop
slack
slirp4netns
steam
stress-ng
surfshark
systemd-coredump
thunderbird
toybox
trinity
tup
tuxedo-control-center
userbindmount
uwsgi-core
vdens
virtiofsd
vivaldi-bin
vpnns
vscode
wike
wpcom
0 processes have profiles defined.
0 processes are in enforce mode.
0 processes are in complain mode.
0 processes are in prompt mode.
0 processes are in kill mode.
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.

SELinux mode: unknown
no big block device was specified on commandline.
Tests which require a big block device are disabled.
You can specify it with option -z
COMMAND:    /opt/ltp/bin/ltp-pan   -e -S  -t 60m -a 2579     -n 2579 -p -f 
/tmp/ltp-ELgJBKb6OQ/alltests -l 
/opt/ltp/results/LTP_RUN_ON-2024_10_16-13h_50m_13s.log  -C 
/opt/ltp/output/LTP_RUN_ON-2024_10_16-13h_50m_13s.failed -T 
/opt/ltp/output/LTP_RUN_ON-2024_10_16-13h_50m_13s.tconf
INFO: Restricted to memcg_stress
LOG File: /opt/ltp/results/LTP_RUN_ON-2024_10_16-13h_50m_13s.log
FAILED COMMAND File: /opt/ltp/output/LTP_RUN_ON-2024_10_16-13h_50m_13s.failed
TCONF COMMAND File: /opt/ltp/output/LTP_RUN_ON-2024_10_16-13h_50m_13s.tconf
Running tests.......
PAN will run for 3600 seconds
<<<test_start>>>
tag=memcg_stress stime=1729086613
cmdline="memcg_stress_test.sh"
contacts=""
analysis=exit
<<<test_output>>>
memcg_stress_test 1 TINFO: Running: memcg_stress_test.sh
memcg_stress_test 1 TINFO: Tested kernel: Linux ubuntu 6.8.0-47-generic 
#47-Ubuntu SMP Fri Sep 27 21:38:55 UTC 2024 ppc64le ppc64le ppc64le GNU/Linux
memcg_stress_test 1 TINFO: trying to disable AppArmor (requires super/root)

/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
/lib/apparmor/apparmor.systemd: 273: printf: printf: I/O error
memcg_stress_test 1 TINFO: timeout per run is 23h 20m 0s
memcg_stress_test 1 TINFO: test starts with cgroup version 2
memcg_stress_test 1 TINFO: Calculated available memory 24361 MB
memcg_stress_test 1 TINFO: Testing 150 cgroups, using 162 MB, interval 5
memcg_stress_test 1 TINFO: Starting cgroups
memcg_stress_test 1 TINFO: Testing cgroups for 900s
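
For reference, the core of what memcg_stress_test.sh exercises can be
reproduced by hand through the cgroup v2 interface (a minimal sketch,
assuming cgroup2 is mounted at /sys/fs/cgroup as in the mount list above;
the group name and sizes are illustrative, the test itself creates 150
such groups):

# create one memory cgroup with a 162 MB limit
mkdir /sys/fs/cgroup/memcg_demo
echo $((162*1024*1024)) > /sys/fs/cgroup/memcg_demo/memory.max
# move the current shell into the group and generate memory pressure
# that exceeds the limit on purpose
echo $$ > /sys/fs/cgroup/memcg_demo/cgroup.procs
dd if=/dev/zero of=/dev/shm/memcg_demo bs=1M count=200
# clean up
rm -f /dev/shm/memcg_demo
echo $$ > /sys/fs/cgroup/cgroup.procs
rmdir /sys/fs/cgroup/memcg_demo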

2) Start migration from the MDC system to the non-MDC system. The first
attempt, with --postcopy, failed because userfaultfd was not available,
so the migration was retried without it:

root@ubuntu:/home# date;virsh migrate --live --domain ubuntu 
qemu+ssh://dest/system --verbose --undefinesource --persistent --auto-converge 
--postcopy ;date
Wed Oct 16 03:24:45 PM UTC 2024
error: internal error: unable to execute QEMU command 
'migrate-set-capabilities': Postcopy is not supported: Userfaultfd not 
available: Operation not permitted

Wed Oct 16 03:24:51 PM UTC 2024
root@ubuntu:/home# date;virsh migrate --live --domain ubuntu 
qemu+ssh://dest/system --verbose --undefinesource --persistent 
--auto-converge;date
Wed Oct 16 03:25:02 PM UTC 2024
Migration: [100.00 %]
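
A side note on the postcopy failure above: postcopy relies on
userfaultfd, and one plausible cause for "Operation not permitted" is
the vm.unprivileged_userfaultfd sysctl being 0 while QEMU runs as an
unprivileged user (an assumption; other restrictions such as seccomp can
yield the same error). A quick check on both hosts would be:

# 0 means only privileged processes may create userfaultfd handles
cat /proc/sys/vm/unprivileged_userfaultfd
# temporarily allow it before retrying with --postcopy
sysctl -w vm.unprivileged_userfaultfd=1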

3) After 2 hours I checked the non-MDC system, and the guest was still
up and running.

4) I tried migrating back to the MDC system; that also worked fine:

date;virsh migrate --live --domain ubuntu qemu+ssh://dest/system 
--verbose --undefinesource --persistent --auto-converge;date
Wed Oct 16 05:35:30 PM UTC 2024
Migration: [100.00 %]
Wed Oct 16 05:40:46 PM UTC 2024

I had mistakenly posted the results for
https://bugzilla.linux.ibm.com/show_bug.cgi?id=208511 in this
bugzilla; sorry for the confusion.

Conclusion: It is working fine!

Thanks,
Anushree Mathur

** Bug watch added: bugzilla.linux.ibm.com/ #208511
   https://bugzilla.linux.ibm.com/show_bug.cgi?id=208511

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2076866

Title:
  Guest crashes post migration with migrate_misplaced_folio+0x4cc/0x5d0

Status in The Ubuntu-power-systems project:
  Fix Committed
Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Noble:
  Fix Committed
Status in linux source package in Oracular:
  Fix Released

Bug description:
  SRU Justification:

  [ Impact ]

   * A KVM guest (VM) that is live migrated between two Power10 systems
     (using nested virtualization, i.e. KVM on top of PowerVM) will very
     likely crash after about an hour.

   * At that point the live migration itself looked successful, but it
     wasn't, and the crash is caused by it.

  [ Test Plan ]

   * Set up two Power10 systems (with firmware level FW1060 or newer,
     which supports nested KVM) running Ubuntu Server 24.04 for ppc64el.

   * Set up a qemu/KVM environment that allows live migrating a KVM
     guest from one P10 system to the other.

   * (The disk type does not seem to matter; NFS-based disk storage can
      be used, for example.)

   * After about an hour the live-migrated guest is likely to crash.
     Hence wait for 2 hours (which increases the likelihood) until a
     crash due to:
     "migrate_misplaced_folio+0x540/0x5d0"
     occurs. (A sketch of this test loop is shown below.)
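
   * A minimal sketch of this plan, reusing the virsh invocation and the
     LTP workload shown in the comment above (the host alias 'dest' and
     the 2-hour wait are illustrative):

     # on the guest: start the memory stressor
     cd /opt/ltp && ./runltp -f controllers -s memcg_stress -t 120m &
     # on the source host: live migrate the guest
     virsh migrate --live --domain ubuntu qemu+ssh://dest/system \
           --verbose --undefinesource --persistent --auto-converge
     # wait ~2 hours, then check whether the guest is still running
     sleep 7200
     virsh --connect qemu+ssh://dest/system domstate ubuntu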

  [ Where problems could occur ]

   * The 'fix' avoids calling folio_likely_mapped_shared in cases where
     the folio might already have been unmapped. Moving these checks
     might have an impact on page table locks if done wrong,
     which may lead to wrong locks, blocked memory and finally crashes.

   * The direct folio calls in mm/huge_memory.c and mm/memory.c are now
     'in-directed', which may lead to different behaviour and side effects.
     However, isolation is still done, just slightly differently:
     instead of using numamigrate_isolate_folio, it now happens in (the
     renamed) migrate_misplaced_folio_prepare.

   * Further upstream conversations:
     https://lkml.kernel.org/r/8f85c31a-e603-4578-bf49-136dae0d4...@redhat.com
     https://lkml.kernel.org/r/20240626191129.658cfc32...@smtp.kernel.org
     https://lkml.kernel.org/r/20240620212935.656243-3-da...@redhat.com

   * Fixing the confusing return code (now simply returning 0 on
     success) clarifies the return-code handling and usage, and was
     mainly done in preparation for further changes,
     but it can have bad side effects if the old return code was already
     used as-is elsewhere in the code.

   * Further upstream conversations:
     https://lkml.kernel.org/r/20240620212935.656243-1-da...@redhat.com
     https://lkml.kernel.org/r/20240620212935.656243-2-da...@redhat.com

   * NUMA balancing used to prohibit mTHP
     (multi-size Transparent Hugepage Support), which seems unreasonable
     since such mappings are exclusive.
     Allowing it brings significant performance improvements
     (see the commit message of d2136d749d76), but it introduces
     significant changes to PTE mapping and modification and even relies
     on further commits:
     859d4adc3415 ("mm: numa: do not trap faults on shared data section pages")
     80d47f5de5e3 ("mm: don't try to NUMA-migrate COW pages that have other 
uses")
     This can cause issues on systems configured for THP,
     may confuse the ordering, and may even lead to memory corruption.
     It may especially hit (NUMA) systems with high core counts, where
     balancing is needed more often (a sketch for inspecting the mTHP
     controls follows the links below).

   * Further upstream conversations:
     
https://lore.kernel.org/all/20231117100745.fnpijbk4xgmal...@techsingularity.net/
     
https://lkml.kernel.org/r/c33a5c0b0a0323b1f8ed53772f50501f4b196e25.1712132950.git.baolin.w...@linux.alibaba.com
     
https://lkml.kernel.org/r/d28d276d599c26df7f38c9de8446f60e22dd1950.1711683069.git.baolin.w...@linux.alibaba.com
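
   * For context on d2136d749d76: kernels with mTHP support expose
     per-size switches in sysfs next to the global THP policy (a sketch;
     the available size directories depend on kernel version and base
     page size):

     # global THP policy
     cat /sys/kernel/mm/transparent_hugepage/enabled
     # per-size mTHP controls (one directory per supported folio size)
     for f in /sys/kernel/mm/transparent_hugepage/hugepages-*/enabled; do
         echo "$f: $(cat "$f")"
     done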

   * The refactoring of the code for NUMA mapping rebuilding, moving it
     into a new helper, seems straightforward, since the active code
     stays unchanged; the new function merely needs to be callable,
     which is the case since it all lives in mm/memory.c.

   * Further upstream conversations:
     
https://lkml.kernel.org/r/cover.1712132950.git.baolin.w...@linux.alibaba.com
     
https://lkml.kernel.org/r/cover.1711683069.git.baolin.w...@linux.alibaba.com
     
https://lkml.kernel.org/r/8bc2586bdd8dbbe6d83c09b77b360ec8fcac3736.1711683069.git.baolin.w...@linux.alibaba.com

   * The refactoring of folio_estimated_sharers to folio_likely_mapped_shared
     is more significant, since the logic changed from
     (folio_estimated_sharers) 'estimate the number of sharers of a folio' to
     (folio_likely_mapped_shared) 'estimate if the folio is mapped into the page
     tables of more than one MM'.

   * Since this is an estimation, the results may be unpredictable
     (especially for bigger folios), and not as expected or assumed
     (there are quite a few side notes in the code comments of
     ebb34f78d72c that mention potentially fuzzy results), hence this
     may lead to unforeseen behavior.

   * The condition statements became clearer since they are now based on
     (more or less obvious) number counts, but can still be erroneous in
     case folio_estimated_sharers does incorrect calculations.

   * Further upstream conversations:
     https://lkml.kernel.org/r/dd0ad9f2-2d7a-45f3-9ba3-979488c7d...@redhat.com
     https://lkml.kernel.org/r/20240227201548.857831-1-da...@redhat.com

   * Commit 133d04b1eee9 extends commit bda420b98505 ("numa balancing:
     migrate on fault among multiple bound nodes") from allowing NUMA
     fault migrations when the executing node is part of the policy mask
     for MPOL_BIND, to also supporting the MPOL_PREFERRED_MANY policy.
     Both cases (MPOL_BIND and MPOL_PREFERRED_MANY) are treated the same
     way. If the NUMA topology is not correctly considered, changes here
     may lead to decreased memory performance.
     However, the code changes themselves are relatively traceable
     (see the sketch after the links below).

   * Further upstream conversations:
     
https://lkml.kernel.org/r/158acc57319129aa46d50fd64c9330f3e7c7b4bf.1711373653.git.donet...@linux.ibm.com
     
https://lkml.kernel.org/r/369d6a58758396335fd1176d97bbca4e7730d75a.1709909210.git.donet...@linux.ibm.com
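
   * To observe the policy side, a workload can be started with an
     MPOL_PREFERRED_MANY policy while NUMA balancing is active (a
     sketch; it assumes a multi-node system, a placeholder ./workload,
     and a numactl new enough to support --preferred-many):

     # 1 means NUMA-fault balancing (and hence fault migration) is on
     cat /proc/sys/kernel/numa_balancing
     # prefer allocation on nodes 0 and 1 (MPOL_PREFERRED_MANY)
     numactl --preferred-many=0,1 ./workload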

   * Finally, commit f8fd525ba3a2 ("mm/mempolicy: use numa_node_id()
     instead of cpu_to_node()") is a patch set that further optimizes
     cross-socket memory access with the MPOL_PREFERRED_MANY policy.
     The mpol_misplaced changes mainly move from cpu_to_node to
     numa_node_id, and thereby make the code more NUMA-aware.
     Based on that, vm_fault/vmf needs to be considered instead of
     vm_area_struct/vma.
     This may have consequences for the memory policy itself.

   * Further upstream conversations:
     https://lkml.kernel.org/r/cover.1711373653.git.donet...@linux.ibm.com
     
https://lkml.kernel.org/r/6059f034f436734b472d066db69676fb3a459864.1711373653.git.donet...@linux.ibm.com
     https://lkml.kernel.org/r/cover.1709909210.git.donet...@linux.ibm.com
     
https://lkml.kernel.org/r/744646531af02cc687cde8ae788fb1779e99d02c.1709909210.git.donet...@linux.ibm.com

   * The overall patch set touches quite a bit of common code,
     but the modifications were intensely discussed with many experts
     in the various mailing-list threads that are referenced above.

  [ Other Info ]

   * The first two "mm/migrate" commits are the newest and were accepted
     upstream with kernel v6.11(-rc1); all others have been upstream
     since v6.10(-rc1).

   * Hence oracular (with a planned target kernel of 6.11) is not
     affected, and the SRU is for noble only (a sketch for checking a
     given noble kernel follows below).

   * And since (nested) KVM virtualization on ppc64el was (re-)introduced
     just with noble, no Ubuntu releases older than noble are affected.
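
   * A quick way to check whether a given noble kernel already carries
     the fix is to search the kernel package changelog for this bug
     number (a sketch; it assumes the changelog references LP: #2076866):

     uname -r
     apt-get changelog linux-image-$(uname -r) | grep -m1 2076866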

  __________

  == Comment: #0 - SEETEENA THOUFEEK <sthou...@in.ibm.com> - 2024-08-12 
23:50:17 ==
  +++ This bug was initially created as a clone of Bug #207985 +++

  ---Problem Description---
  Post-migration, the non-MDC L1 system eralp1 crashed with
  migrate_misplaced_folio+0x4cc/0x5d0

  Machine Type = na

  Contact Information = sthou...@in.ibm.com

  ---Steps to Reproduce---
  After 1 hour of successful migration from doodlp1 [MDC mode] to
  eralp1 [non-MDC mode], the eralp1 guest crashed and a dump was
  collected.

  ---uname output---
  na

  ---Debugger---
  A debugger is not configured

  [281827.975244] NIP [c0000000005f0620] migrate_misplaced_folio+0x4f0/0x5d0
  [281827.975251] LR [c0000000005f067c] migrate_misplaced_folio+0x54c/0x5d0
  [281827.975258] Call Trace:
  [281827.975260] [c000001e19ff7140] [c0000000005f0670] 
migrate_misplaced_folio+0x540/0x5d0 (unreliable)
  [281827.975268] [c000001e19ff71d0] [c00000000054c9f0] 
__handle_mm_fault+0xf70/0x28e0
  [281827.975276] [c000001e19ff7310] [c00000000054e478] 
handle_mm_fault+0x118/0x400
  [281827.975284] [c000001e19ff7360] [c00000000053598c] 
__get_user_pages+0x1ec/0x5b0
  [281827.975291] [c000001e19ff7420] [c000000000536920] 
get_user_pages_unlocked+0x120/0x4f0
  [281827.975298] [c000001e19ff74c0] [c00800001894ea9c] hva_to_pfn+0xf4/0x630 
[kvm]
  [281827.975316] [c000001e19ff7550] [c008000018b4efc4] 
kvmppc_book3s_instantiate_page+0xec/0x790 [kvm_hv]
  [281827.975326] [c000001e19ff7660] [c008000018b4f750] 
kvmppc_book3s_radix_page_fault+0xe8/0x380 [kvm_hv]
  [281827.975335] [c000001e19ff7700] [c008000018b488fc] 
kvmppc_book3s_hv_page_fault+0x294/0xd60 [kvm_hv]
  [281827.975344] [c000001e19ff77e0] [c008000018b43f5c] 
kvmppc_vcpu_run_hv+0xf94/0x11d0 [kvm_hv]
  [281827.975352] [c000001e19ff78a0] [c00800001896131c] 
kvmppc_vcpu_run+0x34/0x48 [kvm]
  [281827.975365] [c000001e19ff78c0] [c00800001895c164] 
kvm_arch_vcpu_ioctl_run+0x39c/0x570 [kvm]
  [281827.975379] [c000001e19ff7950] [c00800001894a104] 
kvm_vcpu_ioctl+0x20c/0x9a8 [kvm]
  [281827.975391] [c000001e19ff7b30] [c000000000683974] sys_ioctl+0x574/0x16a0
  [281827.975395] [c000001e19ff7c30] [c000000000030838] 
system_call_exception+0x168/0x310
  [281827.975400] [c000001e19ff7e50] [c00000000000d05c] 
system_call_vectored_common+0x15c/0x2ec
  [281827.975406] --- interrupt: 3000 at 0x7fffb7d4d2bc

  Mirroring to distro as per message in group channel

  Please pick these patches for this bug (a sketch for verifying where
  they landed upstream follows the list):

  ee86814b0562 ("mm/migrate: move NUMA hinting fault folio isolation + checks 
under PTL")
  4b88c23ab8c9 ("mm/migrate: make migrate_misplaced_folio() return 0 on 
success")
  d2136d749d76 ("mm: support multi-size THP numa balancing")
  6b0ed7b3c775 ("mm: factor out the numa mapping rebuilding into a new helper")
  ebb34f78d72c ("mm: convert folio_estimated_sharers() to 
folio_likely_mapped_shared()")
  133d04b1eee9 ("mm/numa_balancing: allow migrate on protnone reference with 
MPOL_PREFERRED_MANY policy")
  f8fd525ba3a2 ("mm/mempolicy: use numa_node_id() instead of cpu_to_node()")
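
  One way to verify in which upstream release each of these commits
  landed is to query a mainline git tree (a sketch, assuming a clone of
  torvalds/linux):

  # lists all tags containing the commit; per the SRU notes, v6.11-rc1
  # and later should show up for the first one
  git tag --contains ee86814b0562 | head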

  Thanks,
  Amit

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-power-systems/+bug/2076866/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
