On Tue, Jul 11, 2017 at 3:09 AM, Richard Earnshaw (lists)
<richard.earns...@arm.com> wrote:
> On 11/07/17 05:16, Andrew Pinski wrote:
>> I was looking into some bitfield code for aarch64 and was wondering
>> why SLOW_BYTE_ACCESS is set to 0.  I can't seem to figure out why
>> though.
>> The header says:
>>    Although there's no difference in instruction count or cycles,
>>    in AArch64 we don't want to expand to a sub-word to a 64-bit access
>>    if we don't have to, for power-saving reasons.  */
>>
>> But that does not make sense, because with SLOW_BYTE_ACCESS set to 0,
>> GCC expands a sub-word access to a 64-bit access.
>>
>> When I set SLOW_BYTE_ACCESS to 1, I get a speedup of between 38% and
>> 208% for accesses to bitfields inside a loop on ThunderX CN88xx.
>
> What's the test case?
>
>>
>> Should we change SLOW_BYTE_ACCESS (or maybe better yet get rid of it)?
>>
>
> The documentation for SLOW_BYTE_ACCESS is just plain confusing, IMO.
> And your comment above seems to be contrary to the documentation as well.

Here is the testcase which shows the issue:

typedef unsigned long long u64;

typedef struct
{
  u64 a:10;
  u64 b:10;
  u64 c:9;
  u64 d:7;
  u64 e:14;
  u64 f:14;
} s_t;

void setting(s_t *a)
{
  a->a = 0x2AA;
  a->b = 0x2AA;
  a->c = 0x155;
  a->d = 0x2A;
  a->e = 0x2AAA;
  a->f = 0x2AAA;
}

void set(s_t *a, int b, int c, int d, int e, int f, int g)
{
  a->a = b;
  a->b = c;
  a->c = d;
  a->d = e;
  a->e = f;
  a->f = g;
}

--- CUT ---

If SLOW_BYTE_ACCESS is set to 0, we get many more instructions.
See the logic in bit_field_mode_iterator::next_mode (which calls
bit_field_mode_iterator::prefer_smaller_modes, which checks
SLOW_BYTE_ACCESS).

Note that the only other place which checks SLOW_BYTE_ACCESS is dojump.c,
and I think that code might be dead now that we expand directly from SSA.

Thanks,
Andrew Pinski

>
> R.
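
When comparing the code generated with SLOW_BYTE_ACCESS set to 0 and to 1, it may help to confirm that both builds still store the same field values. The following is only a sketch: it repeats the testcase above and adds two helpers, fields_match and check_testcase, which are hypothetical names and not part of the original testcase. It assumes a compiler that accepts unsigned long long bit-fields, as GCC does.

```c
#include <string.h>

typedef unsigned long long u64;

typedef struct
{
  u64 a:10;
  u64 b:10;
  u64 c:9;
  u64 d:7;
  u64 e:14;
  u64 f:14;
} s_t;

/* Stores constants into every field (same as the testcase).  */
void setting(s_t *a)
{
  a->a = 0x2AA;
  a->b = 0x2AA;
  a->c = 0x155;
  a->d = 0x2A;
  a->e = 0x2AAA;
  a->f = 0x2AAA;
}

/* Stores run-time values into every field (same as the testcase).  */
void set(s_t *a, int b, int c, int d, int e, int f, int g)
{
  a->a = b;
  a->b = c;
  a->c = d;
  a->d = e;
  a->e = f;
  a->f = g;
}

/* Hypothetical helper: returns 1 if all six fields of X and Y agree.  */
int fields_match(const s_t *x, const s_t *y)
{
  return x->a == y->a && x->b == y->b && x->c == y->c
         && x->d == y->d && x->e == y->e && x->f == y->f;
}

/* Hypothetical helper: fills one struct through setting() and another
   through set() with the same values, then compares them field by field.
   Returns 1 when the two agree.  */
int check_testcase(void)
{
  s_t x, y;

  memset(&x, 0, sizeof x);
  memset(&y, 0, sizeof y);
  setting(&x);
  set(&y, 0x2AA, 0x2AA, 0x155, 0x2A, 0x2AAA, 0x2AAA);
  return fields_match(&x, &y);
}
```

Calling check_testcase() from a small main() in each build gives a quick correctness check. It says nothing about instruction counts, which still need to be read from the generated assembly.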