http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49362
Greta Yorsh <Greta.Yorsh at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |Greta.Yorsh at arm dot com

--- Comment #3 from Greta Yorsh <Greta.Yorsh at arm dot com> 2011-06-14 12:59:11 UTC ---
It looks like the problem you described has already been fixed.

When the example is compiled with gcc from trunk (gcc version 4.7.0 with -O2),
vld1q variant has 15 instructions and vld2q variant has 13 instructions (see
below).

The version of gcc you use is 4.4.1. The issue hasn't been fixed in the latest
gcc release 4.6, but the fix should be included in the next release and
probably won't be backported to 4.5 and 4.6 releases.

Disassembly of section .text:

00000000 <hamming_distance_vld2q>:
   0:   f460438f    vld2.32     {d20-d23}, [r0]
   4:   f461038f    vld2.32     {d16-d19}, [r1]
   8:   f34481f0    veor        q12, q10, q8
   c:   f34601f2    veor        q8, q11, q9
  10:   f3f02568    vcnt.8      q9, q12
  14:   f3f00560    vcnt.8      q8, q8
  18:   f24208e0    vadd.i8     q8, q9, q8
  1c:   f3f002e0    vpaddl.u8   q8, q8
  20:   f3f402e0    vpaddl.u16  q8, q8
  24:   f26121b1    vorr        d18, d17, d17
  28:   f2620bb0    vpadd.i32   d16, d18, d16
  2c:   f2600bb0    vpadd.i32   d16, d16, d16
  30:   ee100b90    vmov.32     r0, d16[0]
  34:   e12fff1e    bx          lr

00000038 <hamming_distance_vld1q>:
  38:   f4602a8d    vld1.32     {d18-d19}, [r0]!
  3c:   f4610a8d    vld1.32     {d16-d17}, [r1]!
  40:   f34221f0    veor        q9, q9, q8
  44:   f4604a8f    vld1.32     {d20-d21}, [r0]
  48:   f3f02562    vcnt.8      q9, q9
  4c:   f4610a8f    vld1.32     {d16-d17}, [r1]
  50:   f34401f0    veor        q8, q10, q8
  54:   f3f00560    vcnt.8      q8, q8
  58:   f24208e0    vadd.i8     q8, q9, q8
  5c:   f3f002e0    vpaddl.u8   q8, q8
  60:   f3f402e0    vpaddl.u16  q8, q8
  64:   f26121b1    vorr        d18, d17, d17
  68:   f2620bb0    vpadd.i32   d16, d18, d16
  6c:   f2600bb0    vpadd.i32   d16, d16, d16
  70:   ee100b90    vmov.32     r0, d16[0]
  74:   e12fff1e    bx          lr
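
For readers who want to check what the two routines compute: judging only from
the listing above, each one appears to load 32 bytes from each pointer, XOR the
two blocks, count the set bits per byte (vcnt.8), and sum the counts down to a
single 32-bit result. A minimal portable C sketch of that computation follows.
It is not the code from the original report; the names hamming_distance_ref
and popcount8 are hypothetical, and the 32-byte block size is an assumption
read off the vld1/vld2 register lists.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits in one byte (scalar analogue of one vcnt.8 lane). */
static unsigned popcount8(uint8_t b)
{
    unsigned n = 0;
    while (b) {
        b &= (uint8_t)(b - 1);  /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical reference: Hamming distance between two 32-byte blocks.
 * The block size is an assumption taken from the disassembly, which
 * loads four D registers (32 bytes) per operand. */
unsigned hamming_distance_ref(const uint8_t a[32], const uint8_t b[32])
{
    unsigned total = 0;
    for (size_t i = 0; i < 32; i++)
        total += popcount8((uint8_t)(a[i] ^ b[i]));  /* veor, then vcnt */
    return total;
}
```

Because the result is a plain sum over all bytes, the order in which the bytes
are loaded should not affect it, which would explain why the de-interleaving
vld2 load and the two consecutive vld1 loads can serve the same purpose here.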