This is an automated email from the ASF dual-hosted git repository.

xiaoxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nuttx-apps.git

commit 958d8e03eb4837d47a3e35f6a33de65883749931
Author: xinhaiteng <xinhait...@xiaomi.com>
AuthorDate: Thu Aug 15 20:50:10 2024 +0800

    Modify the usage error of neon instruction set

    The second argument of vgetq_lane_s32(__a, __b) must be a
    compile-time constant, so unroll the for loop. Also correct the
    lane index passed for res[1], which was always 1.

    Signed-off-by: xinhaiteng <xinhait...@xiaomi.com>
---
 .../operators/neon/arm_nn_mat_mult_kernel_s8_s16.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c b/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
index 6cbea4de0..43cd33b30 100644
--- a/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
+++ b/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
@@ -322,11 +322,15 @@ int8_t *arm_nn_mat_mult_kernel_s8_s16(const int8_t *input_a,
           col_count --;
         }
 
-      for (int i = 0; i < 4; i++)
-        {
-          ch_out[0] += vgetq_lane_s32(res[0], i);
-          ch_out[1] += vgetq_lane_s32(res[1], 1);
-        }
+      ch_out[0] += vgetq_lane_s32(res[0], 0);
+      ch_out[0] += vgetq_lane_s32(res[0], 1);
+      ch_out[0] += vgetq_lane_s32(res[0], 2);
+      ch_out[0] += vgetq_lane_s32(res[0], 3);
+
+      ch_out[1] += vgetq_lane_s32(res[1], 0);
+      ch_out[1] += vgetq_lane_s32(res[1], 1);
+      ch_out[1] += vgetq_lane_s32(res[1], 2);
+      ch_out[1] += vgetq_lane_s32(res[1], 3);
 
       col_count = num_col_a % 8;
       while (col_count)
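
For reference, the lane-sum pattern used in this change can be sketched in isolation as below. This is only an illustrative sketch, not code from the repository: the helper name hsum_s32x4 and the scalar fallback for hosts without NEON are assumptions added so the snippet builds anywhere. The point it shows is that every lane index passed to vgetq_lane_s32 is a literal constant rather than a loop variable.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: sum the four 32-bit lanes of a vector.
 * The lane argument of vgetq_lane_s32 must be a constant integer
 * expression in the range 0..3, so each lane is read explicitly
 * instead of indexing with a loop counter.
 */

int32_t hsum_s32x4(const int32_t v[4])
{
  int32x4_t r = vld1q_s32(v);
  int32_t sum = 0;

  sum += vgetq_lane_s32(r, 0);
  sum += vgetq_lane_s32(r, 1);
  sum += vgetq_lane_s32(r, 2);
  sum += vgetq_lane_s32(r, 3);

  return sum;
}

#else

/* Scalar fallback (an assumption for this sketch only) so the
 * example also builds on hosts without NEON support.
 */

int32_t hsum_s32x4(const int32_t v[4])
{
  return v[0] + v[1] + v[2] + v[3];
}

#endif
```

Calling hsum_s32x4 on {1, 2, 3, 4} yields 10. On AArch64 targets the vaddvq_s32 intrinsic performs the same reduction in a single call, but it is not available on 32-bit ARM, which may be why the explicit per-lane form is used here.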