
ramana at gcc dot gnu.org Fri, 30 Mar 2012 01:18:30 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980


--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 08:17:21 UTC ---
Experimenting with the patch from PR48941 plus the patch for lower-subreg
posted here:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01886.html

I now see the output below. We still have too many moves for my liking, but the
gratuitous spilling is now gone.

        .cpu cortex-a9
        .eabi_attribute 27, 3
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "t2.c"
        .text
        .align  2
        .global sqrlen4D_16u8
        .type   sqrlen4D_16u8, %function
sqrlen4D_16u8:
        @ args = 16, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov    d16, r0, r1  @ v16qi
        vmov    d17, r2, r3
        vldmia  sp, {d18-d19}
        vabd.u8 q10, q8, q9
        vmull.u8        q11, d20, d20
        vmull.u8        q10, d21, d21
        vmov    q8, q11  @ v4si  -- unnecessary ? 
        vmov    q9, q10  @ v4si  -- unnecessary ? 
        vuzp.32 q8, q9
        vpaddl.u16      q10, q8
        vmov    q11, q10  @ v4si  -- unnecessary
        vpadal.u16      q11, q9
        vmov    r0, r1, d22  @ v4si
        vmov    r2, r3, d23
        bx      lr
        .size   sqrlen4D_16u8, .-sqrlen4D_16u8
        .ident  "GCC: (GNU) 4.8.0 20120330 (experimental)"
        .section        .note.GNU-stack,"",%progbits
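
For readers following the listing, here is a scalar C sketch of what the instruction sequence appears to compute, going only by the assembly above: treat the two 16-byte arguments as four 4-component u8 vectors each, and return the squared length of each difference as four 32-bit values. The original t2.c is not shown in this comment, so the function name `sqrlen4D_16u8_model` and its array-based signature are assumptions made for illustration, not the testcase itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of the NEON sequence in the listing:
 *   vabd.u8              -> |a[i] - b[i]| per byte
 *   vmull.u8 (x2)        -> square each difference, widened to 16 bits
 *   vuzp.32 + vpaddl.u16
 *           + vpadal.u16 -> sum each group of four squares into 32 bits
 * The signature is an assumption; the real function takes NEON vectors. */
void sqrlen4D_16u8_model(const uint8_t a[16], const uint8_t b[16],
                         uint32_t out[4])
{
    for (int v = 0; v < 4; v++) {          /* one 4D vector per output lane */
        uint32_t sum = 0;
        for (int c = 0; c < 4; c++) {      /* four u8 components per vector */
            int d = abs((int)a[4 * v + c] - (int)b[4 * v + c]);
            sum += (uint32_t)(d * d);      /* at most 255 * 255 = 65025 */
        }
        out[v] = sum;
    }
}
```

Read this way, the only operations the result needs are the absolute difference, the two widening multiplies, the unzip, and the two pairwise accumulations, which is consistent with the three vmov q-register copies being flagged as unnecessary.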

This probably makes it a dup of PR48941, but it's starting to look more
promising now.

Eric, could you try the two patches and see what you get? This isn't something
to be gratuitously backported, as we still have to see the effects elsewhere, but
it would be worth seeing whether it helps on your intrinsics testcases.

Ramana

[Bug target/51980] ARM - Neon code polluted by useless stores to the stack with vuzpq / vzipq / vtrnq
