On 02/19/2015 12:25 PM, Ramana Radhakrishnan wrote:
On Thu, Feb 19, 2015 at 9:17 AM, Marat Zakirov <m.zaki...@samsung.com> wrote:
Hi all!
During my investigation I found that GCC does not perform load/store
widening (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65088). Could you
please confirm whether this is the case? Are there any plans to implement it?
I would also like to know whether there is any need to do load/store widening
exclusively in the ASan phase, just to reduce the number of ASAN_CHECKs.
Example from the bug:
$ cat t2.c
int a[2];
int b[2];
int main ()
{
  b[0] = a[0];
  b[1] = a[1];
  return 0;
}
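For reference, here is a hand-written sketch of what the widened copy amounts to at the source level. It is only an illustration, not what GCC generates: it assumes int is 4 bytes, and copy_pair_widened is a hypothetical helper name that does not appear in the bug report.

```c
#include <stdint.h>
#include <string.h>

int a[2];
int b[2];

/* Assumption: int is 4 bytes, so a[] fills exactly one 64-bit word. */
_Static_assert(sizeof a == sizeof(uint64_t), "a[] must be exactly 8 bytes");

/* Hypothetical helper (not from the bug report): copies a[] to b[]
   through a single 8-byte temporary instead of two 4-byte moves.
   memcpy keeps the code free of aliasing and alignment problems. */
static void copy_pair_widened(void)
{
    uint64_t tmp;
    memcpy(&tmp, a, sizeof tmp);   /* one 8-byte read of a[0..1]  */
    memcpy(b, &tmp, sizeof tmp);   /* one 8-byte write to b[0..1] */
}
```

Whether a given compiler lowers each fixed-size memcpy to a single 8-byte load or store depends on the target and optimization level.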
The answer is: it depends. GCC can have the SLP vectorizer spot this in a
generic form across ports, as in the example below.
AArch64:
main:
        adrp x0, a // 5 *movdi_aarch64/11 [length = 4]
        add x0, x0, :lo12:a // 6 add_losym_di [length = 4]
        adrp x1, b // 8 *movdi_aarch64/11 [length = 4]
        add x1, x1, :lo12:b // 9 add_losym_di [length = 4]
        ldr d0, [x0] // 7 *aarch64_simd_movv2si/1 [length = 4]
        mov w0, 0 // 15 *movsi_aarch64/4 [length = 4]
        str d0, [x1] // 10 *aarch64_simd_movv2si/2 [length = 4]
        ret // 40 simple_return [length = 4]
Or, on AArch32 without NEON, the standard ldm / ldrd peepholes spot this:
main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        movw r2, #:lower16:a
        movw r3, #:lower16:b
        movt r2, #:upper16:a
        movt r3, #:upper16:b
        ldmia r2, {r1, r2}
        mov r0, #0
        stmia r3, {r1, r2}
        bx lr
It will be interesting to see if the number of checks can be reduced,
but I suspect you'll hit quite a few phase-ordering issues, and you'll
have to handle quite a few variances between ports to make this work sensibly.
regards
Ramana
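To make the question about the number of checks concrete, here is a toy model of why two adjacent 4-byte checks could in principle be folded into one 8-byte check. It is a sketch under stated assumptions, not GCC's ASan implementation: it uses the usual shadow encoding of one shadow byte per 8-byte granule, all toy_* names are hypothetical, and the folded check is only valid when the pair starts on an 8-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_MEM_SIZE 64   /* bytes of simulated application memory */

/* One shadow byte per 8-byte granule: 0 = all 8 bytes addressable,
   1..7 = only the first k bytes addressable, negative = poisoned. */
static int8_t toy_shadow[TOY_MEM_SIZE / 8];
static unsigned toy_check_count;   /* number of checks executed so far */

/* Returns 1 if the access [addr, addr + size) is invalid.
   Precondition: the access stays inside one granule and inside
   the simulated memory. */
static int toy_check(size_t addr, size_t size)
{
    int8_t k = toy_shadow[addr >> 3];
    toy_check_count++;
    if (k == 0)
        return 0;                  /* whole granule is addressable */
    if (k < 0)
        return 1;                  /* whole granule is poisoned */
    return (addr & 7) + size - 1 >= (size_t)k;
}

/* Two 4-byte accesses at addr and addr + 4, checked separately. */
static int toy_check_pair_narrow(size_t addr)
{
    int bad_lo = toy_check(addr, 4);
    int bad_hi = toy_check(addr + 4, 4);
    return bad_lo | bad_hi;
}

/* The same 8 bytes checked once; assumes addr is 8-byte aligned. */
static int toy_check_pair_wide(size_t addr)
{
    return toy_check(addr, 8);
}
```

In this model both variants agree on whether the pair of accesses is valid, but the wide variant runs one check instead of two. A real implementation would also have to preserve the reported address and access size, which this sketch ignores.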
$ gcc t2.c -O3 -S
$ cat t2.s
...
main:
.LFB0:
        .cfi_startproc
        movl a(%rip), %eax
        movl %eax, b(%rip)
        movl a+4(%rip), %eax
        movl %eax, b+4(%rip)
        xorl %eax, %eax
        ret
        .cfi_endproc
I would very much appreciate your answers and thoughts.
--Marat
Thank you very much Ramana.
I would also like the x86 maintainers to explain why GCC on x86 did not
handle the given example.
--Marat