Here’s a more extreme example:

- https://cx.rv8.io/g/2HWQje

The bitfield base type is unsigned int and all ten 3-bit fields fit in the 
low 30 bits of a single word, so one or two 32-bit loads should suffice 
(depending on register pressure). GCC is issuing an lw at some point in the 
asm. A C sketch of the single-load lowering follows the code below.

struct foo {
  unsigned int a : 3;
  unsigned int b : 3;
  unsigned int c : 3;
  unsigned int d : 3;
  unsigned int e : 3;
  unsigned int f : 3;
  unsigned int g : 3;
  unsigned int h : 3;
  unsigned int i : 3;
  unsigned int j : 3;
};

unsigned int proc_foo(struct foo *p)
{
    return p->a + p->b + p->c + p->d + p->e + p->f + p->g + p->h +
           p->i + p->j;
}
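
For comparison, here is a minimal C sketch of that single-load lowering, 
assuming the usual little-endian ABI that allocates bitfields starting at 
the least significant bit (proc_foo_single_load is a made-up name for 
illustration, not anything GCC emits):

#include <string.h>

unsigned int proc_foo_single_load(const struct foo *p)
{
    unsigned int w;
    memcpy(&w, p, sizeof w);            /* the one 32-bit load (lw) */
    unsigned int sum = 0;
    for (unsigned int shift = 0; shift < 30; shift += 3)
        sum += (w >> shift) & 7;        /* extract one 3-bit field */
    return sum;
}

Everything after the memcpy is shift/mask/add arithmetic on a single 
register, which is the same shape as the hand-coded asm further down for 
the 5-bit case.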

> On 17 Aug 2017, at 10:29 AM, Michael Clark <michaeljcl...@mac.com> wrote:
> 
> Hi,
> 
> Is there any reason for 3 loads being issued for these bitfield accesses, 
> given that two of the loads are bytes and one is a halfword? The compiler 
> appears to know the structure is aligned on a halfword boundary. Secondly, 
> the riscv code uses a mixture of 32-bit and 64-bit adds and shifts. 
> Thirdly, with -Os the riscv code size is the same, but the schedule is 
> less than optimal, i.e. the 3rd load is issued much later.
> 
> - https://cx.rv8.io/g/2YDLTA
> 
> code:
> 
>       struct foo {
>         unsigned int a : 5;
>         unsigned int b : 5;
>         unsigned int c : 5;
>       };
> 
>       unsigned int proc_foo(struct foo *p)
>       {
>           return p->a + p->b + p->c;
>       }
> 
> riscv asm:
> 
>       proc_foo(foo*):
>         lhu a3,0(a0)
>         lbu a4,0(a0)
>         lbu a5,1(a0)
>         srliw a3,a3,5
>         andi a0,a4,31
>         srli a5,a5,2
>         andi a4,a3,31
>         addw a0,a0,a4
>         andi a5,a5,31
>         add a0,a0,a5
>         ret
> 
> x86_64 asm:
> 
>       proc_foo(foo*):
>         movzx edx, BYTE PTR [rdi]
>         movzx eax, WORD PTR [rdi]
>         mov ecx, edx
>         shr ax, 5
>         and eax, 31
>         and ecx, 31
>         lea edx, [rcx+rax]
>         movzx eax, BYTE PTR [rdi+1]
>         shr al, 2
>         and eax, 31
>         add eax, edx
>         ret
> 
> hand coded riscv asm:
> 
>       proc_foo(foo*):
>         lhu a1,0(a0)
>         srli a2,a1,5
>         srli a3,a1,10
>         andi a0,a1,31
>         andi a2,a2,31
>         andi a3,a3,31
>         add a0,a0,a2
>         add a0,a0,a3
>         ret
> 
> Michael
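
For reference, the hand-coded sequence above corresponds to a C lowering 
along these lines, using the three-field struct from the quoted code and 
again assuming LSB-first bitfield allocation (the function name is made up 
for illustration):

#include <string.h>

unsigned int proc_foo_one_load(const struct foo *p)
{
    unsigned short h;
    memcpy(&h, p, sizeof h);              /* the one lhu */
    return ((unsigned)h         & 31) +   /* field a */
           (((unsigned)h >> 5)  & 31) +   /* field b */
           (((unsigned)h >> 10) & 31);    /* field c */
}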
