+ eugeni.stepanov

On Tue, Apr 29, 2014 at 10:04 AM, Andrew Pinski <pins...@gmail.com> wrote:
> On Mon, Apr 28, 2014 at 10:50 PM, Yury Gribov <y.gri...@samsung.com> wrote:
>> Hi all,
>>
>> I've recently noticed that GCC generates suboptimal code for Asan on ARM
>> targets. E.g. for a 4-byte memory access check
>>
>>     (shadow_val != 0) & (last_byte >= shadow_val)
>>
>> we get the following sequence:
>>
>>     mov    r2, r0, lsr #3
>>     and    r3, r0, #7
>>     add    r3, r3, #3
>>     add    r2, r2, #536870912
>>     ldrb    r2, [r2]    @ zero_extendqisi2
>>     sxtb    r2, r2
>>     cmp    r3, r2
>>     movlt    r3, #0
>>     movge    r3, #1
>>     cmp    r2, #0
>>     moveq    r3, #0
>>     cmp    r3, #0
>>     bne    .L5
>>     ldr    r0, [r0]
>>
>> Obviously a shorter code is possible:
>>
>>     mov    r3, r0, lsr #3
>>     and    r1, r0, #7
>>     add    r1, r1, #4
>>     add    r3, r3, #536870912
>>     ldrb    r3, [r3]    @ zero_extendqisi2
>>     sxtb    r3, r3
>>     cmp    r3, #0
>>     cmpne    r1, r3
>>     bgt    .L5
>>     ldr    r0, [r0]
>
> Does the patch series at located at:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01407.html
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01405.html
> Fix this code generation issue?  I suspect it does and improves more
> than just the above code.
>
> Thanks,
> Andrew Pinski
>
>>
>> A 30% improvement looked quite important given that Asan usually increases
>> code-size by 1.5-2x so I decided to investigate this. It turned out that ARM
>> backend already has full support for dominated comparisons (cmp-cmpne-bgt
>> sequence above) and can generate efficient code if we provide it with a
>> slightly more explicit gimple sequence:
>>
>>     (shadow_val != 0) & (last_byte + 1 > shadow_val)
>>
>> Ideally backend should be able perform this transform itself. But I'm not
>> sure this is possible: it needs to know that last_range + 1 can not overflow
>> and this info is not available in RTL (because we don't have VRP pass
>> there).
>>
>> I have attached a simple patch which changes Asan pass to generate the
>> ARM-friendly code. I've only bootstrapped/regtested on x64 but I can perform
>> additional tests on ARM if the patch make sense. As far as I can tell it
>> does not worsen sanitized code on other platforms (x86/x64) while
>> significantly improving ARM (15% less code for bzip).
>>
>> The patch is certainly not ideal:
>> * it makes target-specific changes in machine-independent code
>> * it does not help with 1-byte accesses (forwprop pass thinks that it's
>> always beneficial to convert x + 1 > y to x >= y so it reverts my change)
>> * it only improves Asan code whereas it would be great if ARM backend could
>> improve generic RTL code
>> but it achieves significant improvement on ARM without hurting other
>> platforms.
>>
>> So my questions are:
>> * is this kind of target-specific tweaking acceptable in middle-end?
>> * if not - what would be a better option?
>>
>> -Y

Reply via email to