On 25/10/13 19:04, Kyrill Tkachov wrote:
> On 24/10/13 20:03, Kugan wrote:
>>
>> Hi Kyrill,
>>
>> It happens for armv5te arm-none-linux-gnueabi. --with-mode=arm
>> --with-arch=armv5te --with-float=soft
> 
> Ah ok, I can reproduce it now. So, while I agree that we add a scan for
> vbit and vbif to these testcases, there seems to be something dodgy
> going on with the register allocation.
> 
> With -march=armv5te I'm getting the following snippet of code in the
> ltgt case:
> 
> .L12:
>         ldr     r4, [ip]
>         ldr     r5, [ip, #4]
>         ldr     r6, [ip, #8]
>         ldr     r7, [ip, #12]
>         vmov    d20, r4, r5  @ v4sf
>         vmov    d21, r6, r7
>         vcgt.f32        q8, q10, q9
>         vcgt.f32        q10, q9, q10
>         vorr    q8, q8, q10
>         vmov    d22, r4, r5  @ v4sf
>         vmov    d23, r6, r7
>         vbit    q11, q9, q8
>         vmov    r4, r5, d22  @ v4sf
>         vmov    r6, r7, d23
> 
> The second vcgt.f32 trashes q10, then recreates it in q11 with:
> vmov    d22, r4, r5  @ v4sf
> vmov    d23, r6, r7
> 
> so it can do the vbit. Surely there's something better that can be done?
> 
> In contrast, with -march=armv7-a we get:
> 
> .L12:
>         vld1.32 {q9}, [r4]!
>         vcgt.f32        q8, q9, q10
>         vcgt.f32        q11, q10, q9
>         vorr    q8, q8, q11
>         vbsl    q8, q10, q9
>         vst1.32 {q8}, [lr]!
> 

This is because of the unaligned access that is done for armv7-a but not
for armv5te, where the vector load is split into word loads. arm.c has
the following comment:

  /* Enable -munaligned-access by default for
     - all ARMv6 architecture-based processors
     - ARMv7-A, ARMv7-R, and ARMv7-M architecture-based processors.
     - ARMv8 architecture-base processors.

     Disable -munaligned-access by default for
     - all pre-ARMv6 architecture-based processors
     - ARMv6-M architecture-based processors.  */
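
To make the defaults in that comment concrete, here is a minimal sketch in C.
It only models the comment as a lookup; the enum and the function name are
made up for illustration and are not GCC's internal names or the code in
arm.c.

```c
#include <stdbool.h>

/* Hypothetical architecture tags for illustration only; these are not
   GCC's internal enumerators.  */
enum arch_model {
  ARCH_ARMV5TE,
  ARCH_ARMV6,
  ARCH_ARMV6_M,
  ARCH_ARMV7_A,
  ARCH_ARMV7_R,
  ARCH_ARMV7_M,
  ARCH_ARMV8_A
};

/* Return true when, per the comment quoted above, -munaligned-access
   would be enabled by default for ARCH.  */
static bool
unaligned_access_default (enum arch_model arch)
{
  switch (arch)
    {
    case ARCH_ARMV6:
    case ARCH_ARMV7_A:
    case ARCH_ARMV7_R:
    case ARCH_ARMV7_M:
    case ARCH_ARMV8_A:
      return true;
    case ARCH_ARMV5TE:   /* pre-ARMv6 */
    case ARCH_ARMV6_M:
      return false;
    }
  return false;
}
```

With this model, armv7-a gets unaligned access by default and armv5te does
not, which matches the two code sequences quoted above.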

Please look at the RTL difference below.
Lines marked '-' are from armv7-a.
Lines marked '+' are from armv5te.

;; vect_var_.18_61 = MEM[(float *)vect_pw2.14_59];

-(insn 71 70 72 (set (reg:V4SF 192)
-        (unspec:V4SF [
-                (mem:V4SF (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S16 A32])
-            ] UNSPEC_MISALIGNED_ACCESS)) neon-vcond-ltgt.c:12 -1
+(insn 71 70 72 (clobber (reg:V4SF 168 [ vect_var_.18 ])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 72 71 73 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 0)
+        (mem:SI (reg:SI 163 [ ivtmp.47 ]) [0 MEM[(float *)vect_pw2.14_59]+0 S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 73 72 74 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 4)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 4 [0x4])) [0 MEM[(float *)vect_pw2.14_59]+4 S4 A32])) neon-vcond-ltgt.c:12 -1
+     (nil))
+
+(insn 74 73 75 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 8)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 8 [0x8])) [0 MEM[(float *)vect_pw2.14_59]+8 S4 A32])) neon-vcond-ltgt.c:12 -1
      (nil))

-(insn 72 71 0 (set (reg:V4SF 168 [ vect_var_.18 ])
-        (reg:V4SF 192)) neon-vcond-ltgt.c:12 -1
+(insn 75 74 0 (set (subreg:SI (reg:V4SF 168 [ vect_var_.18 ]) 12)
+        (mem:SI (plus:SI (reg:SI 163 [ ivtmp.47 ])
+                (const_int 12 [0xc])) [0 MEM[(float *)vect_pw2.14_59]+12 S4 A32])) neon-vcond-ltgt.c:12 -1
      (nil))
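
As a rough illustration of what the two expansions do, here is a C sketch
(not GCC code; the type and function names are hypothetical). The first
helper reads the four floats with one 16-byte access, as in the armv7-a
UNSPEC_MISALIGNED_ACCESS form. The second fills the same value with four
4-byte loads at byte offsets 0, 4, 8 and 12, as in the armv5te subreg:SI
sets. It assumes float is 4 bytes wide.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a V4SF register: four 32-bit lanes.  */
typedef struct { uint32_t lane[4]; } v4sf_bits;

/* armv7-a style: the whole vector is read in a single 16-byte access.  */
static v4sf_bits
load_whole (const float *p)
{
  v4sf_bits v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* armv5te style: four word-sized loads, each one writing a 4-byte piece
   of the vector at byte offset 0, 4, 8 and 12.  */
static v4sf_bits
load_by_words (const float *p)
{
  v4sf_bits v;
  for (size_t i = 0; i < 4; ++i)
    memcpy (&v.lane[i], (const char *) p + 4 * i, 4);
  return v;
}
```

Both helpers produce the same bits. The difference is only in how the value
is assembled, and the piecewise form is what later shows up as the
ldr/vmov sequence in the armv5te output.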

