Is there any follow-up, guys? Help would be appreciated.
------------------ Original ------------------
From: "Libo Zhou";<zhl...@foxmail.com>;
Date: Oct 6, 2019
To: "Peter Maydell"<peter.mayd...@linaro.org>;
Cc: "qemu-devel"<qemu-devel@nongnu.org>;
Subject: Re: gdbstub and gbd segfaults on different instructions in user space emulation

Hi Peter,

I have finally got the chance to reply. Thanks for your explanation; I
have learned the important concept of JIT. I have been playing with the
logging options in -d, but I found something weird that makes it tricky
for me to figure out the cause of the segfault. As you mentioned, I need
to know whether the segfault is caused by the guest program making a bad
memory access or by QEMU itself crashing.

To recap, I just changed the [31:26] opcode field of the LW and SW
instructions in translate.c. I used the following command to diagnose it:

$ ./qemu-mipsel -cpu maotu -d in_asm,nochain -D debug.log -singlestep test

My in_asm log is at the end of this message. It looks very weird: the
instructions are not the ones I saw in my objdump. The dmult.g
instruction, as you pointed out before, is a Loongson instruction. I have
also noticed that in_asm should have given me a longer log, with other
parts besides main, but this log only has main in it.

Do you have any idea what else I can try? This segfault has bugged me for
2 weeks, but I still believe there is a solution, even if the logs are
tricky to interpret. I just don't know how changing 6 bits of the opcode
field could lead to so many issues.
----------------
IN: main
0x00400090:  bovc sp,sp,0x400014
----------------
IN: main
0x00400094:  dmult.g zero,sp,s8
----------------
IN: main
0x00400098:  nop
----------------
IN: main
0x0040009c:  nop
----------------
IN: main
0x004000a0:  move s8,sp
----------------
IN: main
0x004000a4:  beqzalc zero,v0,0x4000ac
----------------
IN: main
0x004000a8:  addu.qb zero,s8,v0
----------------
IN: main
0x004000ac:  nop
----------------
IN: main
0x004000b0:  nop
----------------
IN: main
0x004000b4:  beqzalc zero,v0,0x4000c0
----------------
IN: main
0x004000b8:  insv v0,s8
----------------
IN: main
0x004000bc:  nop
----------------
IN: main
0x004000c0:  nop
----------------
IN: main
0x004000c4:  bltc s8,v1,0x400108
----------------
IN: main
0x004000c8:  nop
----------------
IN: main
0x004000cc:  bltc s8,v0,0x400100
----------------
IN: main
0x004000d0:  nop
----------------
IN: main
0x004000d4:  nop
----------------
IN: main
0x004000d8:  add v0,v1,v0
----------------
IN: main
0x004000dc:  fork zero,s8,v0
----------------
IN: main
0x004000e0:  nop
----------------
IN: main
0x004000e4:  nop
----------------
IN: main
0x004000e8:  move v0,zero
----------------
IN: main
0x004000ec:  move sp,s8
----------------
IN: main
0x004000f0:  bltc sp,s8,0x400164
----------------
IN: main
0x004000f4:  bovc sp,sp,0x400178
----------------
IN: main
0x004000f8:  jr ra
----------------
IN: main
0x004000fc:  nop

------------------ Original ------------------
From: "Peter Maydell";<peter.mayd...@linaro.org>;
Send time: Tuesday, Oct 1, 2019 0:23 AM
To: "Libo Zhou"<zhl...@foxmail.com>;
Cc: "qemu-devel"<qemu-devel@nongnu.org>;
Subject: Re: gdbstub and gbd segfaults on different instructions in user space emulation

On Mon, 30 Sep 2019 at 16:57, Libo Zhou <zhl...@foxmail.com> wrote:
> I am encountering segmentation fault while porting my custom ISA to QEMU. My
> custom ISA is VERY VERY simple, it only changes the [31:26] opcode field of
> LW and SW instructions.
> The link has my very simple implementation:
> https://lists.gnu.org/archive/html/qemu-devel/2019-09/msg06976.html
> I have tried 2 ways of debugging it.
> Firstly, I connected gdb-multiarch to gdbstub, and I single-stepped the
> instructions in my ELF. Immediately after the LW instruction, the segfault
> was thrown. I observed the memory location using 'x' command and found that
> at least my SW instruction was implemented correctly.
> Secondly, I used gdb to directly debug QEMU. I set the breakpoint at function
> in translate.c:decode_opc. Pressing 'c' should have the same effect as
> single-stepping instruction in gdbstub. However, the segmentation fault
> wasn't thrown after LW. It was instead thrown after the 'nop' after 'jr r31'
> in the objdump.

(1) If you're debugging the QEMU JIT itself, then you're probably better
off using QEMU's logging facilities (under the -d option) rather than the
gdbstub. The gdbstub is good if you're sure that QEMU is basically
functional and want to debug your guest, but if you suspect bugs in QEMU
itself then it can confuse you. The -d debug logging is at a much lower
level, which makes it a better guide to what QEMU is really doing, though
it is also trickier to interpret.

(2) No, breakpointing on decode_opc is not the same as singlestepping an
instruction in gdb. This is a really important concept in QEMU (and JITs
in general) and if you don't understand it you're going to be very
confused. A JIT has two phases:

(a) "translate time", when we take a block of guest instructions and
    generate host machine code for them
(b) "execution time", when we execute one or more of the blocks of host
    machine code that we wrote at translate time

QEMU calls the blocks it works with "translation blocks", and usually it
will put multiple guest instructions into each TB; a TB usually stops
after a guest branch instruction.
(You can ask QEMU to put just one guest instruction into a TB using the
-singlestep command line option -- this is sometimes useful when
debugging.)

So if you put a breakpoint on decode_opc you'll see it is hit for every
instruction in the TB, which for the TB starting at "00400090 <main>"
will be every instruction up to and including the 'nop' in the delay slot
of the 'jr'.

Once the whole TB is translated, *then* we will execute it. It's only at
execute time that we perform the actual operations on the guest CPU that
the instructions require. If the segfault is because we think the guest
has made a bad memory access, we'll generate it here. If the segfault is
an actual crash in QEMU itself, it will happen here if the bug is one
that happens at execution time.

Note that the -d logging will distinguish between things that happen at
translate time (which is when the in_asm, op, out_asm etc logging is
printed) and things that happen at execution time (which is when cpu,
exec, int, etc logs are printed).

thanks
-- PMM
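
The split between translate time and execution time described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not QEMU code: the two-instruction guest "ISA" (add, jump), the names translate_block, run_program and PROGRAM, and the log strings are all made up for illustration, and MIPS delay slots are ignored. Each block is decoded in full, up to and including its jump, before any of its instructions takes effect on the guest state.

```python
# Toy model of a JIT's two phases. This is NOT QEMU code; every name
# here (translate_block, run_program, PROGRAM) is hypothetical.
from typing import Callable, Dict, List, Tuple

Insn = Tuple[str, int]  # (mnemonic, operand)

# A tiny guest program. "jump" ends a block; a target of -1 means halt.
PROGRAM: List[Insn] = [
    ("add", 1),    # pc 0
    ("add", 2),    # pc 1
    ("jump", 3),   # pc 2: ends the first block
    ("add", 10),   # pc 3
    ("jump", -1),  # pc 4: ends the second block and halts
]


def translate_block(
    program: List[Insn], pc: int, log: List[str]
) -> Callable[[Dict[str, int]], int]:
    """Translate time: decode guest instructions up to and including a jump."""
    ops: List[Tuple[int, str, int]] = []
    while True:
        name, arg = program[pc]
        log.append(f"translate {pc}: {name}")
        ops.append((pc, name, arg))
        pc += 1
        if name == "jump":
            break

    def run(state: Dict[str, int]) -> int:
        """Execution time: only now is the guest state modified."""
        next_pc = -1
        for insn_pc, op_name, op_arg in ops:
            log.append(f"execute {insn_pc}: {op_name}")
            if op_name == "add":
                state["acc"] += op_arg
            else:  # "jump"
                next_pc = op_arg
        return next_pc

    return run


def run_program(program: List[Insn]) -> Tuple[Dict[str, int], List[str]]:
    """Main loop: translate a block on first use, then execute it."""
    state = {"acc": 0}
    log: List[str] = []
    cache: Dict[int, Callable[[Dict[str, int]], int]] = {}
    pc = 0
    while pc != -1:
        if pc not in cache:
            cache[pc] = translate_block(program, pc, log)
        pc = cache[pc](state)
    return state, log


if __name__ == "__main__":
    final_state, trace = run_program(PROGRAM)
    print("\n".join(trace))
    print(final_state)
```

In this model, a breakpoint in the decode loop would correspond to the "translate" entries: it is hit for every instruction of a block before the first "execute" entry for that block appears, which mirrors why a breakpoint on decode_opc does not behave like single-stepping the guest.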