Responses inline ... Thanks, Taylor
-----Original Message----- From: Richard Henderson <richard.hender...@linaro.org> Sent: Wednesday, November 20, 2019 2:07 AM To: Taylor Simpson <tsimp...@quicinc.com>; laur...@vivier.eu; riku.voi...@iki.fi; qemu-devel@nongnu.org Subject: Re: [PATCH] Add minimal Hexagon target - First in a series of patches - linux-user changes + linux-user/hexagon + skeleton of target/hexagon - Files in target/hexagon/imported are from another project and therefore do not conform to qemu coding standards ------------------------------------------------------------------------- CAUTION: This email originated from outside of the organization. ------------------------------------------------------------------------- On 11/20/19 6:15 AM, Taylor Simpson wrote: >> +const char * const hexagon_prednames[] = { >> + "p0 ", "p1 ", "p2 ", "p3 " >> +}; > > Inter-string spacing probably belongs to the format not the name. > [Taylor] Could you elaborate? Should I put spacing after the commas? "p0" not "p0 ". Typo on my part for "inter-"; sorry for the confusion. >> +static inline target_ulong hack_stack_ptrs(CPUHexagonState *env, >> + target_ulong addr) { >> + target_ulong stack_start = env->stack_start; >> + target_ulong stack_size = 0x10000; >> + target_ulong stack_adjust = env->stack_adjust; >> + >> + if (stack_start + 0x1000 >= addr && addr >= (stack_start - stack_size)) >> { >> + return addr - stack_adjust; >> + } >> + return addr; >> +} > > An explanation would be welcome here. > [Taylor] I will add a comment. One of the main debugging techniques is to > use "-d cpu" and compare against LLDB output when single stepping. However, > the target and qemu put the stacks in different locations. This is used to > compensate so the diff is cleaner. Ug. I understand why you want this, but... ug. [Taylor] Not sure what to say - I guess there's a fine line between elegance and mayhem. >> +static void hexagon_dump(CPUHexagonState *env, FILE *f) { >> + static target_ulong last_pc; >> + int i; >> + >> + /* >> + * When comparing with LLDB, it doesn't step through single-cycle >> + * hardware loops the same way. So, we just skip them here >> + */ >> + if (env->gpr[HEX_REG_PC] == last_pc) { >> + return; >> + } > > This seems mis-placed. > [Taylor] Hexagon has hardware controlled loops, so we can have a single > packet that executes multiple times. We don't want the dump to print every > time. Certainly I do. If I'm single-stepping, I want every packet present. Just as if this were written as a traditional loop, with a branch back to the single packet in the loop body. Also, you cannot expect a static variable to work in multi-threaded mode, and you cannot expect a __thread variable to work in single-threaded round-robin mode. [Taylor] This is another place where it's important to match the output of LLDB for a clean diff. In fact, LLDB uses hardware single step, and that will step to the end of a single-cycle hardware loop. Also, the technique of comparing with LLDB only works for single threaded examples, so I'm not worried about the thread-safety of this code. [Taylor] To clarify, this is for a rare case when a hardware loop body is a single packet. Here's an example. It's the portion of memset that gets called when the number of bytes is small 400404: 10 40 02 60 60024010 { loop0(0x40040c,r2) 400408: 08 c0 02 10 1002c008 p0 = cmp.eq(r2,#0); if (p0.new) jump:nt 0x400414 <memset+0x34> } 40040c: 01 81 03 a1 a1038101 { memb(r3+#1) = r1 400410: 10 c1 03 ab ab03c110 memb(r3++#2) = r1 } :endloop0 The loop0 instruction sets the start address and the iteration count. The :endloop0 marks the end of the loop and tells the hardware to decrement the counter and go back to the start if it's not zero. So, the loop at 0x40040c-0x400410 is a single packet. In this case the hardware single step will execute the entire loop. Finally, if the loop has more than one packet, the single stepping will go through each iteration as you describe. >> +static void decode_packet(CPUHexagonState *env, DisasContext *ctx) { >> + size4u_t words[4]; >> + int i; >> + /* Brute force way to make sure current PC is set */ >> + tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next); > > What's this for? > [Taylor] Honestly, I'm not sure. If I remove it, nothing works - not even > "hello world". Weird. I would have expected that the update you do within hexagon_tr_tb_stop would be sufficient. I guess I'll have to peek at it. I presume your minimal test case is a bit of hand-crafted assembly that issues a write syscall and exit? [Taylor] The minimal test case is #define SYS_write 64 #define SYS_exit_group 94 #define SYS_exit 93 #define FD_STDOUT 1 .typestr,@object .section.rodata str: .string"Hello!\n" .sizestr, 8 .text .global _start _start: r6 = #SYS_write r0 = #FD_STDOUT r1 = ##str r2 = #7 trap0(#1) r6 = #SYS_exit_group trap0(#1) r6 = #SYS_exit r0 = #0 trap0(#1) Without that code, it prints "Hello!" twice. With the full set of patches, I get segfaults ☹ >> + >> + for (i = 0; i < 4; i++) { >> + words[i] = cpu_ldl_code(env, ctx->base.pc_next + i * >> sizeof(size4u_t)); >> + } > > And this? > [Taylor] It's reading from the instruction memory. The switch statement > below uses it. I thought packets are variable length and ended by a bit set. Therefore why the fixed iteration to 4? Also... [Taylor] The maximum size of a packet is 4 words, so I go ahead and read that much. Once the packet is decoded, I set ctx->base.pc_next appropriately. Note that most of the instructions in the switch add 4, but one of them adds 8. > >> + /* >> + * Brute force just enough to get the first program to execute. >> + */ >> + switch (words[0]) { ... you only use 1 word, but you read 4. >> +static void hexagon_tr_init_disas_context(DisasContextBase *dcbase, >> + CPUState *cs) { >> + DisasContext *ctx = container_of(dcbase, DisasContext, base); >> + >> + ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK; > > Since you're not initializing this in cpu_get_tb_cpu_state, you might > as well just use > > ctx->mem_idx = MMU_USER_IDX; > [Taylor] Should I be initialize this in cpu_get_bt_cpu_state? Not until there's something more interesting to put there, when you implement system state. The constant initialization should be fine. r~