Hi all,
I've recently noticed that GCC generates suboptimal code for Asan on ARM
targets. E.g. for a 4-byte memory access check
(shadow_val != 0) & (last_byte >= shadow_val)
we get the following sequence:
mov r2, r0, lsr #3
and r3, r0, #7
add r3, r3, #3
add r2, r2, #536870912
ldrb r2, [r2] @ zero_extendqisi2
sxtb r2, r2
cmp r3, r2
movlt r3, #0
movge r3, #1
cmp r2, #0
moveq r3, #0
cmp r3, #0
bne .L5
ldr r0, [r0]
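For reference, here is a rough C-level sketch of what this sequence
computes (the 0x20000000 shadow offset and the signed shadow load are
read off the assembly above; the helper name is made up):

/* Sketch of the Asan slow-path check for a 4-byte access.  */
static int
needs_slow_path_4 (unsigned long addr)
{
  signed char shadow_val = *(signed char *) ((addr >> 3) + 0x20000000);
  long last_byte = (addr & 7) + 3;
  return (shadow_val != 0) & (last_byte >= shadow_val);
}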
Obviously shorter code is possible:
mov r3, r0, lsr #3
and r1, r0, #7
add r1, r1, #4
add r3, r3, #536870912
ldrb r3, [r3] @ zero_extendqisi2
sxtb r3, r3
cmp r3, #0
cmpne r1, r3
bgt .L5
ldr r0, [r0]
A 30% improvement looked quite important, given that Asan usually
increases code size by 1.5-2x, so I decided to investigate. It turned
out that the ARM backend already has full support for dominated
comparisons (the cmp/cmpne/bgt sequence above) and can generate the
efficient code if we provide it with a slightly more explicit gimple
sequence:
(shadow_val != 0) & (last_byte + 1 > shadow_val)
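At the C level the patched check is essentially the same sketch as
above with the +1 folded into the constant (again, the name and the
shadow offset are only illustrative):

/* Same check, written so the backend can use cmp/cmpne/bgt.  */
static int
needs_slow_path_4_arm (unsigned long addr)
{
  signed char shadow_val = *(signed char *) ((addr >> 3) + 0x20000000);
  long last_byte_plus_1 = (addr & 7) + 4;  /* size_in_bytes == 4 */
  return (shadow_val != 0) & (last_byte_plus_1 > shadow_val);
}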
Ideally the backend should be able to perform this transform itself,
but I'm not sure this is possible: it needs to know that last_byte + 1
cannot overflow, and this information is not available in RTL (because
we don't have a VRP pass there).
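To illustrate why the overflow knowledge matters: x >= y and x + 1 > y
are only equivalent when x + 1 does not wrap. A tiny standalone example
(plain C, nothing Asan-specific):

#include <limits.h>
#include <stdio.h>

int
main (void)
{
  unsigned x = UINT_MAX, y = 0;
  /* Prints "1 0": x >= y holds, but x + 1 wraps to 0, so the
     rewritten check fails.  In the Asan check the sum is
     (base_addr & 7) + size_in_bytes <= 7 + 4 = 11, so the rewrite
     is safe, but only gimple-level range information proves that.  */
  printf ("%d %d\n", x >= y, x + 1 > y);
  return 0;
}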
I have attached a simple patch which changes the Asan pass to generate
the ARM-friendly code. I've only bootstrapped/regtested on x64, but I
can perform additional tests on ARM if the patch makes sense. As far as
I can tell, it does not worsen sanitized code on other platforms
(x86/x64) while significantly improving ARM (15% less code for bzip).
The patch is certainly not ideal:
* it makes target-specific changes in machine-independent code
* it does not help with 1-byte accesses (the forwprop pass thinks it's
always beneficial to convert x + 1 > y to x >= y, so it reverts my change)
* it only improves Asan code, whereas it would be great if the ARM
backend could improve generic RTL code
but it achieves a significant improvement on ARM without hurting other
platforms.
So my questions are:
* is this kind of target-specific tweaking acceptable in middle-end?
* if not - what would be a better option?
-Y
2014-04-29 Yury Gribov <y.gri...@samsung.com>
* asan.c (build_check_stmt): Change the emitted check to improve
code generated for ARM.
diff --git a/gcc/asan.c b/gcc/asan.c
index d7c282e..f00705a 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1543,18 +1543,17 @@ build_check_stmt (location_t location, tree base, gimple_stmt_iterator *iter,
     {
       /* Slow path for 1, 2 and 4 byte accesses.
          Test (shadow != 0)
-              & ((base_addr & 7) + (size_in_bytes - 1)) >= shadow). */
+              & ((base_addr & 7) + size_in_bytes) > shadow). */
       gimple_seq seq = NULL;
       gimple shadow_test = build_assign (NE_EXPR, shadow, 0);
       gimple_seq_add_stmt (&seq, shadow_test);
       gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR, base_addr, 7));
       gimple_seq_add_stmt (&seq, build_type_cast (shadow_type,
                                                   gimple_seq_last (seq)));
-      if (size_in_bytes > 1)
-        gimple_seq_add_stmt (&seq,
-                             build_assign (PLUS_EXPR, gimple_seq_last (seq),
-                                           size_in_bytes - 1));
-      gimple_seq_add_stmt (&seq, build_assign (GE_EXPR, gimple_seq_last (seq),
+      gimple_seq_add_stmt (&seq,
+                           build_assign (PLUS_EXPR, gimple_seq_last (seq),
+                                         size_in_bytes));
+      gimple_seq_add_stmt (&seq, build_assign (GT_EXPR, gimple_seq_last (seq),
                                                shadow));
       gimple_seq_add_stmt (&seq, build_assign (BIT_AND_EXPR, shadow_test,
                                                gimple_seq_last (seq)));