Re: Redundant loads for bitfield accesses

Andrew Waterman Wed, 16 Aug 2017 16:33:10 -0700

When implementing the RISC-V port, I took the name of this macro at
face value.  It does seem that we should follow Andrew's advice and
set it to 1.  (For your examples, doing so does improve code
generation.)


We'll submit a patch if the change doesn't regress.

On Wed, Aug 16, 2017 at 4:00 PM, Michael Clark <michaeljcl...@mac.com> wrote:
> CC'ing Andrew Waterman
>
> I see this comment in SPARC:
>
> /* Nonzero if access to memory by bytes is slow and undesirable.
>    For RISC chips, it means that access to memory by bytes is no
>    better than access by words when possible, so grab a whole word
>    and maybe make use of that.  */
> #define SLOW_BYTE_ACCESS 1
>
>
> The description says that byte access is no better than words, so as you
> mention, the macro seems to be misnamed. I think this should be set to 1 on
> RISC-V. I’m going to try it on the RISC-V backend.
>
> Andrew W, here is the example code-gen:
>
> - https://cx.rv8.io/g/2YDLTA
> - https://cx.rv8.io/g/2HWQje
>
> On 17 Aug 2017, at 10:52 AM, Michael Clark <michaeljcl...@mac.com> wrote:
>
>
> On 17 Aug 2017, at 10:41 AM, Andrew Pinski <pins...@gmail.com> wrote:
>
> On Wed, Aug 16, 2017 at 3:29 PM, Michael Clark <michaeljcl...@mac.com>
> wrote:
>
> Hi,
>
> Is there any reason why 3 loads are issued for these bitfield accesses,
> given that two of the loads are bytes and one is a half? The compiler
> appears to know the structure is aligned on a half-word boundary. Secondly,
> the riscv code is using a mixture of 32-bit and 64-bit adds and shifts.
> Thirdly, with -Os the riscv code size is the same, but the schedule is less
> than optimal, i.e. the 3rd load is issued much later.
>
>
>
> Well, one thing is that SLOW_BYTE_ACCESS is most likely set to 0.  This
> forces byte access for bit-field accesses.  The macro is misnamed now,
> as it only controls bit-field accesses (and one thing in dojump dealing
> with comparisons with an AND and a constant, but that might be dead
> code).  Setting it to 1 should allow you to get the code in its
> hand-written form.
> I suspect SLOW_BYTE_ACCESS support should be removed and assumed to
> be 1, but I do not have time to look into each backend to see if it is
> correct to do so or not.  Maybe it is wrong for AVR.
>
>
> Thanks, that’s interesting.
>
> So I should try compiling the riscv backend with SLOW_BYTE_ACCESS = 1? That
> is less risky than making a change to x86.
>
> This is clearly distinct from slow unaligned access. It seems odd that -O3
> doesn't coalesce the loads regardless of this setting: one would expect the
> cost of the additional loads to outweigh the fact that byte accesses are
> not slow, unless something weird is happening with the costs of loads of
> different widths.
>
> x86 could be helped here too. I guess subsequent loads will be served
> from L1, but that's not really an excuse for this codegen when the element
> is 32-bit aligned (unsigned int).
>
>
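For readers who cannot open the linked examples, here is a minimal sketch of the kind of code under discussion. It is not the code behind the links: the struct layout, field widths, and function names are hypothetical, and the single-load version assumes bit-fields are allocated from the least significant bit of a 32-bit unit, as GCC does on little-endian targets such as RISC-V and x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-field layout, chosen only for illustration. */
struct flags {
    unsigned int a : 5;
    unsigned int b : 7;
    unsigned int c : 4;
};

/* The single-load version below relies on the struct fitting one word. */
static_assert(sizeof(struct flags) == sizeof(uint32_t),
              "struct flags is assumed to occupy one 32-bit word");

/* Field-by-field access. With SLOW_BYTE_ACCESS defined to 0, the
   compiler may emit a separate narrow load for each field. */
unsigned int sum_fields(const struct flags *f)
{
    return f->a + f->b + f->c;
}

/* Hand-written form: load the containing word once, then extract each
   field with shifts and masks. Assumes LSB-first bit-field allocation
   on a little-endian target; other ABIs need different shifts. */
unsigned int sum_fields_one_load(const struct flags *f)
{
    uint32_t w;
    memcpy(&w, f, sizeof w);            /* one 32-bit load */
    return (w & 0x1fu)                  /* a: bits 0..4   */
         + ((w >> 5) & 0x7fu)           /* b: bits 5..11  */
         + ((w >> 12) & 0xfu);          /* c: bits 12..15 */
}
```

Comparing the assembly for the two functions (for example with -O2 -S) is one way to check whether a given SLOW_BYTE_ACCESS setting lets the compiler produce the single-load form by itself.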
