http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362

Greta Yorsh <Greta.Yorsh at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Greta.Yorsh at arm dot com

--- Comment #3 from Greta Yorsh <Greta.Yorsh at arm dot com> 2011-06-14 12:59:11 UTC ---
It looks like the problem you described has already been fixed.

When the example is compiled with gcc from trunk (gcc version 4.7.0) at -O2,
the vld1q variant has 15 instructions and the vld2q variant has 13, not
counting the final bx lr (see below).
You are using gcc 4.4.1. The fix is not in the latest gcc release, 4.6; it
should be included in the next release, and it probably won't be backported
to the 4.5 and 4.6 releases.


Disassembly of section .text:

00000000 <hamming_distance_vld2q>:
   0:    f460438f     vld2.32    {d20-d23}, [r0]
   4:    f461038f     vld2.32    {d16-d19}, [r1]
   8:    f34481f0     veor    q12, q10, q8
   c:    f34601f2     veor    q8, q11, q9
  10:    f3f02568     vcnt.8    q9, q12
  14:    f3f00560     vcnt.8    q8, q8
  18:    f24208e0     vadd.i8    q8, q9, q8
  1c:    f3f002e0     vpaddl.u8    q8, q8
  20:    f3f402e0     vpaddl.u16    q8, q8
  24:    f26121b1     vorr    d18, d17, d17
  28:    f2620bb0     vpadd.i32    d16, d18, d16
  2c:    f2600bb0     vpadd.i32    d16, d16, d16
  30:    ee100b90     vmov.32    r0, d16[0]
  34:    e12fff1e     bx    lr

00000038 <hamming_distance_vld1q>:
  38:    f4602a8d     vld1.32    {d18-d19}, [r0]!
  3c:    f4610a8d     vld1.32    {d16-d17}, [r1]!
  40:    f34221f0     veor    q9, q9, q8
  44:    f4604a8f     vld1.32    {d20-d21}, [r0]
  48:    f3f02562     vcnt.8    q9, q9
  4c:    f4610a8f     vld1.32    {d16-d17}, [r1]
  50:    f34401f0     veor    q8, q10, q8
  54:    f3f00560     vcnt.8    q8, q8
  58:    f24208e0     vadd.i8    q8, q9, q8
  5c:    f3f002e0     vpaddl.u8    q8, q8
  60:    f3f402e0     vpaddl.u16    q8, q8
  64:    f26121b1     vorr    d18, d17, d17
  68:    f2620bb0     vpadd.i32    d16, d18, d16
  6c:    f2600bb0     vpadd.i32    d16, d16, d16
  70:    ee100b90     vmov.32    r0, d16[0]
  74:    e12fff1e     bx    lr
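
For reference, here is a minimal scalar sketch of what both listings appear to
compute: each function loads 32 bytes from r0 and from r1, XORs them, counts
the set bits (vcnt.8) and reduces the per-byte counts to a single sum. The
original C source is not quoted in this comment, so the function name
hamming_distance_scalar, the helper popcount32 and the assumption that each
input is exactly eight 32-bit words are illustrative, not taken from the bug
report.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one 32-bit word (portable, no compiler builtins). */
static unsigned popcount32(uint32_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical scalar reference (name and signature are assumptions):
 * Hamming distance between two blocks of eight 32-bit words (256 bits),
 * i.e. the same amount of data each NEON listing loads per input. */
unsigned hamming_distance_scalar(const uint32_t *a, const uint32_t *b)
{
    unsigned total = 0;
    for (size_t i = 0; i < 8; i++)
        total += popcount32(a[i] ^ b[i]);   /* XOR, then count differing bits */
    return total;
}
```

Because the final result is a sum over all 256 bits, the de-interleaving done
by vld2.32 does not change it, which is why the two variants should return
the same value.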
