http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51980

--- Comment #5 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> 2012-03-30 08:17:21 UTC ---
Experimenting with the patch from PR48941 together with the lower-subreg patch
posted here:

http://gcc.gnu.org/ml/gcc-patches/2012-03/msg01886.html

With both applied, we still have too many moves for my liking, but the
gratuitous spilling is now gone:

        .cpu cortex-a9
        .eabi_attribute 27, 3
        .fpu neon
        .eabi_attribute 20, 1
        .eabi_attribute 21, 1
        .eabi_attribute 23, 3
        .eabi_attribute 24, 1
        .eabi_attribute 25, 1
        .eabi_attribute 26, 2
        .eabi_attribute 30, 2
        .eabi_attribute 34, 1
        .eabi_attribute 18, 4
        .file   "t2.c"
        .text
        .align  2
        .global sqrlen4D_16u8
        .type   sqrlen4D_16u8, %function
sqrlen4D_16u8:
        @ args = 16, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vmov    d16, r0, r1  @ v16qi
        vmov    d17, r2, r3
        vldmia  sp, {d18-d19}
        vabd.u8 q10, q8, q9
        vmull.u8        q11, d20, d20
        vmull.u8        q10, d21, d21
        vmov    q8, q11  @ v4si  -- unnecessary ? 
        vmov    q9, q10  @ v4si  -- unnecessary ? 
        vuzp.32 q8, q9
        vpaddl.u16      q10, q8
        vmov    q11, q10  @ v4si  -- unnecessary
        vpadal.u16      q11, q9
        vmov    r0, r1, d22  @ v4si
        vmov    r2, r3, d23
        bx      lr
        .size   sqrlen4D_16u8, .-sqrlen4D_16u8
        .ident  "GCC: (GNU) 4.8.0 20120330 (experimental)"
        .section        .note.GNU-stack,"",%progbits
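
For anyone reading the listing without t2.c to hand, here is a scalar C sketch
of what the instruction sequence appears to compute. It is my reading of the
assembly above, not the original testcase: the function names, the lane
numbering and the assumption that the two v16qi arguments are laid out as four
consecutive 4-byte vectors are all hypothetical. The second function follows
the listing step by step (vabd.u8, vmull.u8, vuzp.32, vpaddl.u16, vpadal.u16);
the first is the direct definition it should agree with.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Direct definition: squared length of each of the four 4-D difference
   vectors formed from 16 unsigned bytes. */
static void sqrlen4d_direct(const uint8_t a[16], const uint8_t b[16],
                            uint32_t out[4])
{
    for (int v = 0; v < 4; v++) {
        out[v] = 0;
        for (int i = 0; i < 4; i++) {
            uint32_t d = (uint32_t)abs((int)a[4 * v + i] - (int)b[4 * v + i]);
            out[v] += d * d;
        }
    }
}

/* Step-by-step scalar model of the listing (hypothetical lane numbering). */
static void sqrlen4d_listing_model(const uint8_t a[16], const uint8_t b[16],
                                   uint32_t out[4])
{
    uint16_t sq[16], even[8], odd[8];

    /* vabd.u8 followed by the two vmull.u8: squared byte differences,
       widened to 16 bits (255 * 255 = 65025 still fits). */
    for (int i = 0; i < 16; i++) {
        int d = abs((int)a[i] - (int)b[i]);
        sq[i] = (uint16_t)(d * d);
    }

    /* vuzp.32: treat sq[] as eight 32-bit words, each holding two u16 lanes.
       Even-numbered words end up in one register, odd-numbered in the other. */
    for (int w = 0; w < 4; w++) {
        even[2 * w]     = sq[4 * w];
        even[2 * w + 1] = sq[4 * w + 1];
        odd[2 * w]      = sq[4 * w + 2];
        odd[2 * w + 1]  = sq[4 * w + 3];
    }

    /* vpaddl.u16: pairwise add adjacent u16 lanes into u32 lanes.
       vpadal.u16: the same pairwise add, accumulated into the result. */
    for (int w = 0; w < 4; w++) {
        out[w]  = (uint32_t)even[2 * w] + even[2 * w + 1];
        out[w] += (uint32_t)odd[2 * w] + odd[2 * w + 1];
    }
}

/* Convenience check that the two models agree for a given input. */
static void check_models_agree(const uint8_t a[16], const uint8_t b[16])
{
    uint32_t x[4], y[4];
    sqrlen4d_direct(a, b, x);
    sqrlen4d_listing_model(a, b, y);
    for (int v = 0; v < 4; v++)
        assert(x[v] == y[v]);
}
```

If this reading is right, the vuzp.32 is only there to line up the two halves
of each 4-D vector so that one vpaddl/vpadal pair finishes the reduction, which
is why the three vmovs around it look like pure register-allocation overhead.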

This probably makes it a dup of PR48941 but it's starting to look more
promising now. 

Eric, could you try the two patches and see what you get? This isn't something
to be backported gratuitously, as we still have to see the effects elsewhere,
but it would be worth seeing whether it helps on your intrinsics testcases.

Ramana
