Responses inline ...

Thanks,
Taylor


-----Original Message-----
From: Richard Henderson <richard.hender...@linaro.org>
Sent: Wednesday, November 20, 2019 2:07 AM
To: Taylor Simpson <tsimp...@quicinc.com>; laur...@vivier.eu; 
riku.voi...@iki.fi; qemu-devel@nongnu.org
Subject: Re: [PATCH] Add minimal Hexagon target - First in a series of patches 
- linux-user changes + linux-user/hexagon + skeleton of target/hexagon - Files 
in target/hexagon/imported are from another project and therefore do not 
conform to qemu coding standards

-------------------------------------------------------------------------
CAUTION: This email originated from outside of the organization.
-------------------------------------------------------------------------

On 11/20/19 6:15 AM, Taylor Simpson wrote:
>> +const char * const hexagon_prednames[] = {
>> +  "p0 ", "p1 ", "p2 ", "p3 "
>> +};
>
> Inter-string spacing probably belongs to the format not the name.
> [Taylor] Could you elaborate?  Should I put spacing  after the commas?

"p0" not "p0 ".  Typo on my part for "inter-"; sorry for the confusion.

>> +static inline target_ulong hack_stack_ptrs(CPUHexagonState *env,
>> +                                           target_ulong addr) {
>> +    target_ulong stack_start = env->stack_start;
>> +    target_ulong stack_size = 0x10000;
>> +    target_ulong stack_adjust = env->stack_adjust;
>> +
>> +    if (stack_start + 0x1000 >= addr && addr >= (stack_start - stack_size)) 
>> {
>> +        return addr - stack_adjust;
>> +    }
>> +    return addr;
>> +}
>
> An explanation would be welcome here.
> [Taylor]  I will add a comment.  One of the main debugging techniques is to 
> use "-d cpu" and compare against LLDB output when single stepping.  However, 
> the target and qemu put the stacks in different locations.  This is used to 
> compensate so the diff is cleaner.

Ug.  I understand why you want this, but... ug.

[Taylor] Not sure what to say - I guess there's a fine line between elegance 
and mayhem.


>> +static void hexagon_dump(CPUHexagonState *env, FILE *f) {
>> +    static target_ulong last_pc;
>> +    int i;
>> +
>> +    /*
>> +     * When comparing with LLDB, it doesn't step through single-cycle
>> +     * hardware loops the same way.  So, we just skip them here
>> +     */
>> +    if (env->gpr[HEX_REG_PC] == last_pc) {
>> +        return;
>> +    }
>
> This seems mis-placed.
> [Taylor] Hexagon has hardware controlled loops, so we can have a single 
> packet that executes multiple times.  We don't want the dump to print every 
> time.

Certainly I do.  If I'm single-stepping, I want every packet present.  Just as 
if this were written as a traditional loop, with a branch back to the single 
packet in the loop body.

Also, you cannot expect a static variable to work in multi-threaded mode, and 
you cannot expect a __thread variable to work in single-threaded round-robin 
mode.

[Taylor] This is another place where it's important to match the output of LLDB 
for a clean diff.  In fact, LLDB uses hardware single step, and that will step 
to the end of a single-cycle hardware loop.  Also, the technique of comparing 
with LLDB only works for single threaded examples, so I'm not worried about the 
thread-safety of this code.

[Taylor] To clarify, this is for a rare case when a hardware loop body is a 
single packet.  Here's an example.  It's the portion of memset that gets called 
when the number of bytes is small
  400404:       10 40 02 60     60024010 {      loop0(0x40040c,r2)
  400408:       08 c0 02 10     1002c008        p0 = cmp.eq(r2,#0); if (p0.new) 
jump:nt 0x400414 <memset+0x34> }
  40040c:       01 81 03 a1     a1038101 {      memb(r3+#1) = r1
  400410:       10 c1 03 ab     ab03c110        memb(r3++#2) = r1 }  :endloop0
The loop0 instruction sets the start address and the iteration count.  The 
:endloop0 marks the end of the loop and tells the hardware to decrement the 
counter and go back to the start if it's not zero.  So, the loop at 
0x40040c-0x400410 is a single packet.  In this case the hardware single step 
will execute the entire loop.  Finally, if the loop has more than one packet, 
the single stepping will go through each iteration as you describe.

>> +static void decode_packet(CPUHexagonState *env, DisasContext *ctx) {
>> +    size4u_t words[4];
>> +    int i;
>> +    /* Brute force way to make sure current PC is set */
>> +    tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], ctx->base.pc_next);
>
> What's this for?
> [Taylor] Honestly, I'm not sure.  If I remove it, nothing works - not even 
> "hello world".

Weird.  I would have expected that the update you do within hexagon_tr_tb_stop 
would be sufficient.  I guess I'll have to peek at it.

I presume your minimal test case is a bit of hand-crafted assembly that issues 
a write syscall and exit?

[Taylor] The minimal test case is
#define SYS_write 64
#define SYS_exit_group           94
#define SYS_exit                 93

#define FD_STDOUT                1

.typestr,@object
.section.rodata
str:
.string"Hello!\n"
.sizestr, 8

.text
.global _start
_start:
r6 = #SYS_write
r0 = #FD_STDOUT
r1 = ##str
r2 = #7
trap0(#1)

r6 = #SYS_exit_group
trap0(#1)

r6 = #SYS_exit
r0 = #0
trap0(#1)
Without that code, it prints "Hello!" twice.  With the full set of patches, I 
get segfaults ☹

>> +
>> +    for (i = 0; i < 4; i++) {
>> +        words[i] = cpu_ldl_code(env, ctx->base.pc_next + i * 
>> sizeof(size4u_t));
>> +    }
>
> And this?
> [Taylor] It's reading from the instruction memory.  The switch statement 
> below uses it.

I thought packets are variable length and ended by a bit set.  Therefore why 
the fixed iteration to 4?  Also...


[Taylor] The maximum size of a packet is 4 words, so I go ahead and read that 
much.  Once the packet is decoded, I set ctx->base.pc_next appropriately.  Note 
that most of the instructions in the switch add 4, but one of them adds 8.

>
>> +    /*
>> +     * Brute force just enough to get the first program to execute.
>> +     */
>> +    switch (words[0]) {

... you only use 1 word, but you read 4.

>> +static void hexagon_tr_init_disas_context(DisasContextBase *dcbase,
>> +                                          CPUState *cs) {
>> +    DisasContext *ctx = container_of(dcbase, DisasContext, base);
>> +
>> +    ctx->mem_idx = ctx->base.tb->flags & TB_FLAGS_MMU_MASK;
>
> Since you're not initializing this in cpu_get_tb_cpu_state, you might
> as well just use
>
>     ctx->mem_idx = MMU_USER_IDX;
> [Taylor] Should I be initialize this in cpu_get_bt_cpu_state?

Not until there's something more interesting to put there, when you implement 
system state.  The constant initialization should be fine.


r~

Reply via email to