On 19.01.2023 07:40, Hyeonggon Yoo wrote:
On Wed, Jan 18, 2023 at 12:39:16PM +0300, Pavel Dovgalyuk wrote:
Sometimes replay (or reverse debugging) has problems due to an incomplete or
incorrect virtual device save/load implementation.
Can you try removing -cpu from your command line?
Or you can provide the files you load and I'll debug this case.
Ah, sorry to bother you. I had set the breakpoint _after_ the kernel panic;
setting it before boot works fine. Everything seems great!
Glad to hear that.
Just a side question: is there a reason QEMU record/replay
does not support -smp N (N > 1)? Is this feature planned, or should I use
other tools to debug SMP bugs?
Deterministic emulation of parallel SMP is very hard.
However, I think deterministic emulation of multiple cores on a single core
will be supported someday.
On 18.01.2023 11:47, Hyeonggon Yoo wrote:
On Wed, Jan 18, 2023 at 10:12:48AM +0300, Pavel Dovgalyuk wrote:
Since replay works well, reverse debugging should be OK too.
But for "going back" it needs a VM snapshot that can be reloaded.
Snapshots are saved in qcow2 images attached to QEMU.
Therefore you need to add an empty qcow2 image to your command line with the
following option: -drive file=empty.qcow2,if=none,id=rr
Oh, I assumed it was possible to reverse-debug without a snapshot.
Your comments definitely helped! Adding an empty disk and snapshotting solved
it.
But I faced another problem:
(gdb) b __list_del_entry_valid
(gdb) reverse-continue
(it gets stuck forever)
^C
(gdb) info registers
eax 0xefe19f74 -270426252
ecx 0x0 0
edx 0xefe19f74 -270426252
ebx 0xf6ff4620 -151042528
esp 0xc02e9a34 0xc02e9a34
ebp 0xc02e9a6c 0xc02e9a6c
esi 0xc4fffb20 -989856992
edi 0xefe19f70 -270426256
eip 0xc1f38400 0xc1f38400 <__list_del_entry_valid>
eflags 0x6 [ IOPL=0 PF ]
cs 0x60 96
ss 0x68 104
ds 0x7b 123
es 0x7b 123
fs 0xd8 216
gs 0x0 0
fs_base 0x31cb4000 835403776
gs_base 0x0 0
k_gs_base 0x0 0
cr0 0x80050033 [ PG AM WP NE ET MP PE ]
cr2 0xffcb1000 -3469312
cr3 0x534e000 [ PDBR=0 PCID=0 ]
cr4 0x406d0 [ PSE MCE PGE OSFXSR OSXMMEXCPT OSXSAVE ]
cr8 0x1 1
efer 0x0 [ ]
It gets stuck here, and this is not the last breakpoint hit before the panic
(it's early in boot). The stepi, nexti and continue commands do not work and
there's no forward progress (eip doesn't change).
Did I miss something or do something wrong?
Thank you so much for your help.
--
Best regards,
Hyeonggon
You also need to add rrsnapshot to -icount so that a snapshot is created at
the start of VM execution:
-icount shift=auto,rr=record,rrfile=$REPLAY_FILE,rrsnapshot=start
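Putting the two pieces of advice together (an empty qcow2 disk for snapshots plus rrsnapshot on -icount), a minimal sketch of the workflow might look like this. It reuses the guest options from the original command lines. The run and qemu_rr helper names, the DRY_RUN switch, the default replay.bin file name and the 1G image size are hypothetical choices. Passing the same -drive and rrsnapshot=start options to the replay run as well (so that replay starts from the recorded snapshot) is an assumption worth checking against the QEMU record/replay documentation.

```shell
#!/bin/sh
# Sketch only: record/replay with a VM snapshot so reverse debugging can "go back".
# Hypothetical: REPLAY_FILE default, image size, helper names, DRY_RUN switch.

REPLAY_FILE=${REPLAY_FILE:-replay.bin}

# Print the command instead of executing it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Empty qcow2 image whose only job is to hold the VM snapshots.
make_snapshot_disk() {
    run qemu-img create -f qcow2 empty.qcow2 1G
}

# $1 is "record" or "replay"; any remaining arguments (e.g. -s) are appended.
# Assumption: the snapshot disk and rrsnapshot=start are used in both modes.
qemu_rr() {
    mode=$1
    shift
    run qemu-system-i386 \
        -icount "shift=auto,rr=$mode,rrfile=$REPLAY_FILE,rrsnapshot=start" \
        -drive file=empty.qcow2,if=none,id=rr \
        -kernel arch/x86/boot/bzImage \
        -cpu SandyBridge \
        -initrd debian-i386.cgz \
        -smp 1 \
        -m 1024 \
        -nographic \
        -net none \
        "$@" \
        -append "page_owner=on console=ttyS0"
}
```

After sourcing the script, run make_snapshot_disk once, then qemu_rr record, then qemu_rr replay -s to replay with the gdb stub enabled.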
On 18.01.2023 09:14, Hyeonggon Yoo wrote:
Hello QEMU folks.
I was struggling to fix a recent heisenbug in the Linux kernel,
and fortunately the bug was reproducible with TCG and -smp 1.
I'm using QEMU version 7.2.0, and the guest architecture is i386.
I tried to inspect the bug using the record/replay and reverse-debugging
features in QEMU.
recorded with:
qemu-system-i386 \
-icount shift=auto,rr=record,rrfile=$REPLAY_FILE \
-kernel arch/x86/boot/bzImage \
-cpu SandyBridge \
-initrd debian-i386.cgz \
-smp 1 \
-m 1024 \
-nographic \
-net none \
-append "page_owner=on console=ttyS0"
and replayed with:
qemu-system-i386 \
-icount shift=auto,rr=replay,rrfile=$REPLAY_FILE \
-kernel arch/x86/boot/bzImage \
-cpu SandyBridge \
-initrd debian-i386.cgz \
-smp 1 \
-m 1024 \
-nographic \
-net none \
-s \
-append "page_owner=on console=ttyS0"
(I'm using an initrd image instead of a disk file.)
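The replay command above passes -s, which is shorthand for -gdb tcp::1234, so gdb attaches over TCP port 1234. A minimal sketch of attaching gdb with the kernel symbols and a breakpoint set up front follows. It assumes vmlinux with debug info sits in the kernel build directory; the run and attach_gdb helper names and the DRY_RUN switch are hypothetical.

```shell
#!/bin/sh
# Sketch only: attach gdb to the replaying guest started with -s (tcp::1234).
# Hypothetical: helper names, DRY_RUN switch, vmlinux in the current directory.

# Print the command instead of executing it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set the breakpoint before the guest reaches the panic; reverse-stepi and
# reverse-continue additionally need the replay to start from a VM snapshot.
attach_gdb() {
    bp=${1:-__list_del_entry_valid}
    run gdb vmlinux \
        -ex 'target remote localhost:1234' \
        -ex "break $bp"
}
```

Calling attach_gdb with another symbol name sets the initial breakpoint there instead.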
Record and replay work well, and the bug is reliably reproduced
when replaying. But when I try reverse-continue or reverse-stepi after
the kernel panic, gdb only says:
"remote failure reply 'E14'"
Is there something I'm missing, or does record/replay not work with
QEMU v7.2.0 or i386?
--
Best regards,
Hyeonggon