Re: gdbstub and gdb segfaults on different instructions in user space emulation

Libo Zhou Tue, 08 Oct 2019 02:51:42 -0700

Is there any follow-up on this, guys? Any help would be appreciated.

------------------ Original ------------------
From:  "Libo Zhou";<zhl...@foxmail.com>;
Date:  Oct 6, 2019
To:  "Peter Maydell"<peter.mayd...@linaro.org>; 
Cc:  "qemu-devel"<qemu-devel@nongnu.org>; 
Subject:  Re: gdbstub and gdb segfaults on different instructions in user space emulation

Hi Peter,

I finally have a chance to reply. Thanks for your explanation; I have now 
learned the important concepts behind a JIT.

I have been playing with the logging options under -d, but I found something 
odd that makes it hard to pin down the cause of the segfault. As you mentioned, 
I need to know whether the segfault is caused by the guest program making a bad 
memory access or by QEMU itself crashing.

To recap, I only changed the [31:26] opcode field of the LW and SW instructions 
in translate.c, and I used the following command line to diagnose:

$ ./qemu-mipsel -cpu maotu -d in_asm,nochain -D debug.log -singlestep test

Below is my in_asm log. It looks very odd: the instructions are not the ones I 
saw in my objdump output. The dmult.g instruction, as you pointed out before, 
is a Loongson instruction. I also noticed that in_asm should have given me a 
longer log, with other parts besides main, but this log only contains main.

Do you have any idea what else I can try? This segfault has bugged me for 2 
weeks, but I still believe there is a solution, even if the logs are tricky to 
interpret. I just don't understand how changing the 6-bit opcode field could 
lead to so many issues.
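
For reference, here is a minimal Python sketch of what "the [31:26] opcode 
field" means for a 32-bit MIPS I-type instruction word. It is not QEMU code. 
The helper names (encode_i_type, opcode_field, with_opcode) are made up for 
illustration. The values 0x23 for LW and 0x2B for SW are the standard MIPS32 
primary opcodes; the replacement values used by the custom ISA are not given 
in this thread, so none are assumed here.

```python
# Standard MIPS32 primary opcodes (bits [31:26]) for the two instructions
# discussed in this thread.
LW_OPCODE = 0x23
SW_OPCODE = 0x2B


def encode_i_type(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack an I-type word: opcode[31:26] rs[25:21] rt[20:16] imm[15:0]."""
    return (
        ((opcode & 0x3F) << 26)
        | ((rs & 0x1F) << 21)
        | ((rt & 0x1F) << 16)
        | (imm & 0xFFFF)
    )


def opcode_field(insn: int) -> int:
    """Return bits [31:26] of a 32-bit instruction word."""
    return (insn >> 26) & 0x3F


def with_opcode(insn: int, new_opcode: int) -> int:
    """Replace only bits [31:26], leaving the low 26 bits untouched."""
    return (insn & 0x03FFFFFF) | ((new_opcode & 0x3F) << 26)


# lw v0,8(s8): rs = 30 (s8), rt = 2 (v0), imm = 8
LW_V0_8_S8 = encode_i_type(LW_OPCODE, rs=30, rt=2, imm=8)
```

Any decoder that still maps bits [31:26] with the stock table will read a word 
with a changed opcode field as whatever instruction normally owns that value, 
so a raw word can be checked by hand with opcode_field before trusting a 
mnemonic.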

----------------
IN: main
0x00400090:  bovc sp,sp,0x400014

----------------
IN: main
0x00400094:  dmult.g zero,sp,s8

----------------
IN: main
0x00400098:  nop

----------------
IN: main
0x0040009c:  nop

----------------
IN: main
0x004000a0:  move s8,sp

----------------
IN: main
0x004000a4:  beqzalc zero,v0,0x4000ac

----------------
IN: main
0x004000a8:  addu.qb zero,s8,v0

----------------
IN: main
0x004000ac:  nop

----------------
IN: main
0x004000b0:  nop

----------------
IN: main
0x004000b4:  beqzalc zero,v0,0x4000c0

----------------
IN: main
0x004000b8:  insv v0,s8

----------------
IN: main
0x004000bc:  nop

----------------
IN: main
0x004000c0:  nop

----------------
IN: main
0x004000c4:  bltc s8,v1,0x400108

----------------
IN: main
0x004000c8:  nop

----------------
IN: main
0x004000cc:  bltc s8,v0,0x400100

----------------
IN: main
0x004000d0:  nop

----------------
IN: main
0x004000d4:  nop

----------------
IN: main
0x004000d8:  add v0,v1,v0

----------------
IN: main
0x004000dc:  fork zero,s8,v0

----------------
IN: main
0x004000e0:  nop

----------------
IN: main
0x004000e4:  nop

----------------
IN: main
0x004000e8:  move v0,zero

----------------
IN: main
0x004000ec:  move sp,s8

----------------
IN: main
0x004000f0:  bltc sp,s8,0x400164

----------------
IN: main
0x004000f4:  bovc sp,sp,0x400178

----------------
IN: main
0x004000f8:  jr ra

----------------
IN: main
0x004000fc:  nop
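
A log in the shape shown above can be summarized mechanically, which makes it 
easier to compare against objdump output address by address. The following is 
a rough Python sketch that assumes only the layout visible in this log (a 
dashed separator, an "IN: symbol" line, then "address:  instruction" lines). 
The names Entry, parse_in_asm and symbols_seen are hypothetical, and other 
QEMU versions may format the log differently.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    symbol: str   # text after "IN:", e.g. "main"
    addr: int     # guest address of the instruction
    text: str     # disassembly text as printed in the log


_IN_LINE = re.compile(r"^IN:\s*(\S*)")
_INSN_LINE = re.compile(r"^(0x[0-9a-fA-F]+):\s+(.*\S)")


def parse_in_asm(log: str) -> List[Entry]:
    """Collect (symbol, address, text) for every instruction line in the log."""
    entries: List[Entry] = []
    symbol = ""
    for line in log.splitlines():
        m = _IN_LINE.match(line)
        if m:
            symbol = m.group(1)
            continue
        m = _INSN_LINE.match(line)
        if m:
            entries.append(Entry(symbol, int(m.group(1), 16), m.group(2)))
    return entries


def symbols_seen(entries: List[Entry]) -> List[str]:
    """Sorted list of distinct symbols that appear in the log."""
    return sorted({e.symbol for e in entries})


# First two blocks of the log above, used as sample input.
SAMPLE = """\
----------------
IN: main
0x00400090:  bovc sp,sp,0x400014

----------------
IN: main
0x00400094:  dmult.g zero,sp,s8
"""
```

Running symbols_seen over the full log is a quick way to confirm the 
observation that only main appears in it.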

------------------ Original ------------------
From:  "Peter Maydell";<peter.mayd...@linaro.org>;
Send time: Tuesday, Oct 1, 2019 0:23 AM
To: "Libo Zhou"<zhl...@foxmail.com>; 
Cc: "qemu-devel"<qemu-devel@nongnu.org>; 
Subject:  Re: gdbstub and gdb segfaults on different instructions in user space emulation

On Mon, 30 Sep 2019 at 16:57, Libo Zhou <zhl...@foxmail.com> wrote:
> I am encountering segmentation fault while porting my custom ISA to QEMU. My 
> custom ISA is VERY VERY simple, it only changes the [31:26] opcode field of 
> LW and SW instructions. The link has my very simple implementation: 
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg06976.html

> I have tried 2 ways of debugging it.
> Firstly, I connected gdb-multiarch to gdbstub, and I single-stepped the 
> instructions in my ELF. Immediately after the LW instruction, the segfault 
> was thrown. I observed the memory location using 'x' command and found that 
> at least my SW instruction was implemented correctly.
> Secondly, I used gdb to debug QEMU directly. I set a breakpoint at the function 
> decode_opc in translate.c. Pressing 'c' should have the same effect as 
> single-stepping instruction in gdbstub. However, the segmentation fault 
> wasn't thrown after LW. It was instead thrown after the 'nop' after 'jr r31' 
> in the objdump.

(1) If you're debugging the QEMU JIT itself, then you're probably
better off using QEMU's logging facilities (under the -d option)
rather than the gdbstub. The gdbstub is good if you're sure that
QEMU is basically functional and want to debug your guest, but
if you suspect bugs in QEMU itself then it can confuse you.
The -d debug logging is at a much lower level, which makes it
a better guide to what QEMU is really doing, though it is also
trickier to interpret.

(2) No, breakpointing on decode_opc is not the same as singlestepping
an instruction in gdb. This is a really important concept in QEMU
(and JITs in general) and if you don't understand it you're going
to be very confused. A JIT has two phases:
 (a) "translate time", when we take a block of guest instructions
     and generate host machine code for them
 (b) "execution time", when we execute one or more of the blocks
     of host machine code that we wrote at translate time
QEMU calls the blocks it works with "translation blocks", and
usually it will put multiple guest instructions into each TB;
a TB usually stops after a guest branch instruction. (You can
ask QEMU to put just one guest instruction into a TB using
the -singlestep command line option -- this is sometimes useful
when debugging.)

So if you put a breakpoint on decode_opc you'll see it is hit
for every instruction in the TB, which for the TB starting at
"00400090 <main>" will be every instruction up to and including
the 'nop' in the delay slot of the 'jr'. Once the whole TB is
translated, *then* we will execute it. It's only at execute time
that we perform the actual operations on the guest CPU that
the instructions require. If the segfault is because we think
the guest has made a bad memory access, we'll generate it here.
If the segfault is an actual crash in QEMU itself, it will
happen here if the bug is one that happens at execution time.
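
The ordering described above can be reproduced with a toy model. The Python 
sketch below is not QEMU code and does not generate machine code: the tiny 
guest ISA ("addi", "nop") and the names decode, translate_block and run_block 
are all made up for illustration, with closures standing in for generated 
host code. It only shows that the decoder runs for every instruction in a 
block before any of them takes effect on the guest CPU state.

```python
from typing import Callable, Dict, List, Tuple

Insn = Tuple[str, int]                    # (mnemonic, operand) in a toy ISA
HostOp = Callable[[Dict[str, int]], None]  # stand-in for generated host code

trace: List[str] = []                      # records what happened, in order


def decode(insn: Insn) -> HostOp:
    """Translate time: called once per guest instruction in the block."""
    name, arg = insn
    trace.append(f"translate {name}")
    if name == "addi":
        def op(cpu: Dict[str, int]) -> None:
            trace.append("execute addi")
            cpu["acc"] += arg              # guest state changes only here
        return op
    if name == "nop":
        def op(cpu: Dict[str, int]) -> None:
            trace.append("execute nop")
        return op
    raise ValueError(f"unknown toy instruction: {name}")


def translate_block(insns: List[Insn]) -> List[HostOp]:
    """Translate the whole block before anything is executed."""
    return [decode(insn) for insn in insns]


def run_block(tb: List[HostOp], cpu: Dict[str, int]) -> None:
    """Execution time: run the previously translated block."""
    for op in tb:
        op(cpu)


def demo() -> Tuple[int, List[str]]:
    trace.clear()
    cpu = {"acc": 0}
    tb = translate_block([("addi", 2), ("nop", 0), ("addi", 3)])
    run_block(tb, cpu)
    return cpu["acc"], list(trace)
```

In this model a breakpoint inside decode corresponds to the "translate" trace 
entries, all of which come before the first "execute" entry, so stopping there 
says nothing yet about which instruction misbehaves when the block runs.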

Note that the -d logging will distinguish between things that
happen at translate time (which is when the in_asm, op, out_asm etc
logging is printed) and things that happen at execution time
(which is when cpu, exec, int, etc logs are printed).
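
As a small aid for choosing log items, the split above can be written down as 
a lookup. This Python sketch only covers the six items named in the previous 
paragraph; the function name classify_log_items is hypothetical, and anything 
else passed to -d (for example nochain) is reported as unclassified rather 
than guessed at.

```python
from typing import Dict, List

# Items named above as being printed at translate time and at execution time.
TRANSLATE_TIME = frozenset({"in_asm", "op", "out_asm"})
EXECUTION_TIME = frozenset({"cpu", "exec", "int"})


def classify_log_items(d_option: str) -> Dict[str, List[str]]:
    """Split a comma-separated -d argument by the phase that prints each item."""
    phases: Dict[str, List[str]] = {
        "translate": [],
        "execute": [],
        "unclassified": [],
    }
    for item in (part.strip() for part in d_option.split(",")):
        if not item:
            continue
        if item in TRANSLATE_TIME:
            phases["translate"].append(item)
        elif item in EXECUTION_TIME:
            phases["execute"].append(item)
        else:
            phases["unclassified"].append(item)
    return phases
```

For example, classify_log_items("in_asm,nochain") puts nothing in the 
"execute" list, which is a hint that such a log shows only what was 
translated, not what was actually run.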

thanks
-- PMM
