Thank you Zhijian for your feedback.
So I'll try to push this change today.
Cheers,
William.
On 9/20/23 12:04, Zhijian Li (Fujitsu) wrote:
On 15/09/2023 19:31, William Roche wrote:
On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote:
I'm okay with "RDMA isn't touched".
BTW, could you share your reproducer program/hack to poison the page, so
that I can take a look at the RDMA part later when I'm free?
Not sure it's suitable to acknowledge an untouched part. Anyway,
Acked-by: Li Zhijian <lizhij...@fujitsu.com> # RDMA
Thanks.
As you asked for a procedure to inject memory errors into a running VM,
I've attached to this email the source code (mce_process_react.c) of a
program that will help to target the error injection in the VM.
I just tried your hwpoison program and did an RDMA migration. The migration
failed, but fortunately the source side is still alive :).
(qemu) Failed to register chunk!: Bad address
Chunk details: block: 0 chunk index 671 start 139955096518656 end 139955097567232 host 139955096518656 local 139954392924160 registrations: 636
qemu-system-x86_64: cannot get lkey
qemu-system-x86_64: rdma migration: write error! -22
qemu-system-x86_64: RDMA is in an error state waiting migration to abort!
qemu-system-x86_64: failed to save SaveStateEntry with id(name): 2(ram): -22
qemu-system-x86_64: Early error. Sending error.
Since the current RDMA migration transfers guest memory in chunks (1M by
default), a single poisoned page makes the registration of its whole chunk
fail. We may need to do one of the following:
option 1: reduce the chunk size to 1 page
option 2: handle the hwpoison chunk specially
However, since there is the option of using another protocol for migration,
it's also acceptable to leave this issue unfixed for now.
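
For reference, the numbers in the "Chunk details" log line are consistent
with 1M chunks: end - start is exactly 1048576 bytes, and (host - local)
shifted right by 20 gives the reported chunk index 671. The sketch below only
redoes that arithmetic. The names CHUNK_SHIFT, ASSUMED_PAGE_SIZE, chunk_index()
and pages_per_chunk() are made up for illustration and are not taken from the
QEMU sources, and the 4K page size is an assumption.

```c
#include <stdint.h>

/* Hypothetical constants for illustration only (not QEMU identifiers). */
#define CHUNK_SHIFT        20      /* 1M chunks, as reported in the log */
#define ASSUMED_PAGE_SIZE  4096ULL /* assumption: 4K host pages */

/* Index of the chunk containing host_addr, counted from the block start. */
static inline uint64_t chunk_index(uint64_t block_start, uint64_t host_addr)
{
    return (host_addr - block_start) >> CHUNK_SHIFT;
}

/* Number of pages registered together in one chunk. */
static inline uint64_t pages_per_chunk(void)
{
    return (1ULL << CHUNK_SHIFT) / ASSUMED_PAGE_SIZE;
}
```

With the values from the log, chunk_index(139954392924160, 139955096518656)
evaluates to 671, and under the 4K assumption one chunk covers 256 pages, so
one poisoned page blocks the registration of the other 255 pages in the same
chunk as well. That is the trade-off behind option 1 versus option 2.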
Tested-by: Li Zhijian <lizhij...@fujitsu.com>
Thanks
Zhijian