+ eugeni.stepanov
On Tue, Apr 29, 2014 at 10:04 AM, Andrew Pinski <pins...@gmail.com> wrote:
> On Mon, Apr 28, 2014 at 10:50 PM, Yury Gribov <y.gri...@samsung.com> wrote:
>> Hi all,
>>
>> I've recently noticed that GCC generates suboptimal code for Asan on ARM
>> targets. E.g. for a 4-byte memory access check
>>
>>   (shadow_val != 0) & (last_byte >= shadow_val)
>>
>> we get the following sequence:
>>
>>   mov     r2, r0, lsr #3
>>   and     r3, r0, #7
>>   add     r3, r3, #3
>>   add     r2, r2, #536870912
>>   ldrb    r2, [r2]        @ zero_extendqisi2
>>   sxtb    r2, r2
>>   cmp     r3, r2
>>   movlt   r3, #0
>>   movge   r3, #1
>>   cmp     r2, #0
>>   moveq   r3, #0
>>   cmp     r3, #0
>>   bne     .L5
>>   ldr     r0, [r0]
>>
>> Obviously shorter code is possible:
>>
>>   mov     r3, r0, lsr #3
>>   and     r1, r0, #7
>>   add     r1, r1, #4
>>   add     r3, r3, #536870912
>>   ldrb    r3, [r3]        @ zero_extendqisi2
>>   sxtb    r3, r3
>>   cmp     r3, #0
>>   cmpne   r1, r3
>>   bgt     .L5
>>   ldr     r0, [r0]
>
> Does the patch series located at:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01405.html
> fix this code generation issue? I suspect it does and improves more
> than just the above code.
>
> Thanks,
> Andrew Pinski
>
>>
>> A 30% improvement looked quite important given that Asan usually increases
>> code size by 1.5-2x, so I decided to investigate this. It turned out that the
>> ARM backend already has full support for dominated comparisons (the
>> cmp-cmpne-bgt sequence above) and can generate efficient code if we provide
>> it with a slightly more explicit gimple sequence:
>>
>>   (shadow_val != 0) & (last_byte + 1 > shadow_val)
>>
>> Ideally the backend should be able to perform this transform itself. But I'm
>> not sure this is possible: it needs to know that last_byte + 1 cannot
>> overflow, and this info is not available in RTL (because we don't have a VRP
>> pass there).
>>
>> I have attached a simple patch which changes the Asan pass to generate the
>> ARM-friendly code. I've only bootstrapped/regtested on x64 but I can perform
>> additional tests on ARM if the patch makes sense. As far as I can tell it
>> does not worsen sanitized code on other platforms (x86/x64) while
>> significantly improving ARM (15% less code for bzip).
>>
>> The patch is certainly not ideal:
>> * it makes target-specific changes in machine-independent code
>> * it does not help with 1-byte accesses (the forwprop pass thinks that it's
>>   always beneficial to convert x + 1 > y to x >= y, so it reverts my change)
>> * it only improves Asan code, whereas it would be great if the ARM backend
>>   could improve generic RTL code
>> but it achieves significant improvement on ARM without hurting other
>> platforms.
>>
>> So my questions are:
>> * is this kind of target-specific tweaking acceptable in the middle-end?
>> * if not, what would be a better option?
>>
>> -Y
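
For reference, the claim that the two check forms quoted above are
interchangeable can be sanity-checked exhaustively. What follows is only a
minimal Python sketch, not the code from the patch. All function names are
hypothetical. It assumes that shadow values are signed 8-bit (suggested by the
sxtb instruction), that last_byte = (addr & 7) + size - 1, and that the shadow
address is (addr >> 3) + 536870912, as the quoted assembly suggests.

```python
# Hypothetical model of the two Asan check forms from the quoted message.
# Assumptions (inferred from the quoted ARM assembly, not taken from the patch):
#   - shadow address = (addr >> 3) + 536870912 (0x20000000)
#   - shadow values are signed 8-bit (the sxtb instruction)
#   - last_byte = (addr & 7) + access_size - 1

SHADOW_OFFSET = 536870912  # constant seen in the quoted assembly


def shadow_address(addr):
    """Map an application address to its shadow byte address."""
    return (addr >> 3) + SHADOW_OFFSET


def check_original(shadow_val, last_byte):
    """Form currently generated: (shadow_val != 0) & (last_byte >= shadow_val)."""
    return (shadow_val != 0) & (last_byte >= shadow_val)


def check_arm_friendly(shadow_val, last_byte):
    """Proposed form: (shadow_val != 0) & (last_byte + 1 > shadow_val)."""
    return (shadow_val != 0) & (last_byte + 1 > shadow_val)


def count_mismatches():
    """Compare both forms over every modelled input and count disagreements."""
    mismatches = 0
    for size in (1, 2, 4):                 # access sizes modelled here
        for offset in range(8):            # addr & 7
            last_byte = offset + size - 1  # at most 10, so +1 cannot overflow
            for shadow_val in range(-128, 128):
                if check_original(shadow_val, last_byte) != \
                        check_arm_friendly(shadow_val, last_byte):
                    mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

The two forms agree here only because last_byte + 1 stays far below any
overflow limit in this model. That is the range information the quoted
message says is unavailable at the RTL level.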