------- Comment #5 from siarhei dot siamashka at gmail dot com 2010-03-14 12:23 -------
Do you want to force data into specific neon registers because of the restriction on the neon registers which can be used as scalar operand for multiplication?
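For reference, the restriction in question can be sketched as a small check. As far as the ARMv7 Advanced SIMD encoding goes, a 16-bit scalar operand has to live in d0-d7 (lanes 0-3), and a 32-bit scalar operand in d0-d15 (lanes 0-1). The helper name `neon_scalar_operand_ok` is hypothetical and only for illustration; it is not part of GCC or arm_neon.h.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (illustration only, not a GCC or arm_neon.h API).
 * Returns true if d<dreg>[lane] may be used as the scalar operand of a
 * NEON multiply-by-scalar instruction with the given element size.
 *
 * Assumed limits, taken from the ARMv7 Advanced SIMD encoding:
 *   16-bit scalars: registers d0-d7,  lanes 0-3
 *   32-bit scalars: registers d0-d15, lanes 0-1
 */
static bool neon_scalar_operand_ok(unsigned elem_bits, unsigned dreg,
                                   unsigned lane)
{
    switch (elem_bits) {
    case 16:
        return dreg <= 7 && lane <= 3;
    case 32:
        return dreg <= 15 && lane <= 1;
    default:
        /* Other element sizes have no by-scalar multiply form. */
        return false;
    }
}
```

With this check, `neon_scalar_operand_ok(16, 0, 0)` (d0[0]) is accepted, while `neon_scalar_operand_ok(16, 16, 0)` (d16[0]) is rejected, which is the case where the assembler reports "scalar out of range".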
It works for me.

/**************************/
#include <stdint.h>
#include <arm_neon.h>

void f(int16_t *ptr)
{
    register int16x4_t mul_consts asm ("d0");
    int16x4_t data;
    int32x4_t tmp;

    mul_consts = vset_lane_s16(0x1234, mul_consts, 0);
    asm volatile (
        "vld1.16 {%P1}, [%2]\n"
        "vmull.s16 %q0, %P1, %P3[0]\n"
        "vshrn.s32 %P1, %q0, #15\n"
        "vst1.16 {%P1}, [%2]\n"
        : "=&w" (tmp), "=&w" (data)
        : "r" (ptr), "w" (mul_consts)
        : "memory"
    );
}
/**************************/

Without forcing the 'mul_consts' variable into the 'd0' register, assembling fails as expected:

/tmp/ccvzAXVb.s: Assembler messages:
/tmp/ccvzAXVb.s:27: Error: scalar out of range for multiply instruction -- `vmull.s16 q9,d17,d16[0]'

So I don't see any problem here. Tested with gcc 4.3.4 and 4.4.3.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41538