This is an automated email from the ASF dual-hosted git repository.

xiaoxiang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nuttx-apps.git

commit 958d8e03eb4837d47a3e35f6a33de65883749931
Author: xinhaiteng <xinhait...@xiaomi.com>
AuthorDate: Thu Aug 15 20:50:10 2024 +0800

    Fix incorrect use of NEON intrinsics
    
    The second argument of vgetq_lane_s32(__a, __b), the lane index, must be
    a compile-time constant, so unroll the for loop. Also correct the lane
    index passed for res[1], which was hard-coded to 1.
    
    Signed-off-by: xinhaiteng <xinhait...@xiaomi.com>
---
 .../operators/neon/arm_nn_mat_mult_kernel_s8_s16.c         | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c b/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
index 6cbea4de0..43cd33b30 100644
--- a/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
+++ b/mlearning/tflite-micro/operators/neon/arm_nn_mat_mult_kernel_s8_s16.c
@@ -322,11 +322,15 @@ int8_t *arm_nn_mat_mult_kernel_s8_s16(const int8_t *input_a,
             col_count --;
         }
 
-        for (int i = 0; i < 4; i++)
-        {
-            ch_out[0] += vgetq_lane_s32(res[0], i);
-            ch_out[1] += vgetq_lane_s32(res[1], 1);
-        }
+        ch_out[0] += vgetq_lane_s32(res[0], 0);
+        ch_out[0] += vgetq_lane_s32(res[0], 1);
+        ch_out[0] += vgetq_lane_s32(res[0], 2);
+        ch_out[0] += vgetq_lane_s32(res[0], 3);
+
+        ch_out[1] += vgetq_lane_s32(res[1], 0);
+        ch_out[1] += vgetq_lane_s32(res[1], 1);
+        ch_out[1] += vgetq_lane_s32(res[1], 2);
+        ch_out[1] += vgetq_lane_s32(res[1], 3);
 
         col_count = num_col_a % 8;
         while (col_count)
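
For readers of this patch, the unrolled reduction above can be sketched as a
small helper. This is only an illustration and not part of the commit: the
name hsum4_s32 and the scalar fallback are assumptions added so the sketch
also builds on hosts without NEON. It shows the lane index of
vgetq_lane_s32 written as a literal constant for each of the four lanes,
which is what the unrolled code in the diff does.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HSUM_HAVE_NEON 1
#else
#define HSUM_HAVE_NEON 0
#endif

/* Hypothetical helper (not in the patch): sum the four int32 lanes of v. */
static int32_t hsum4_s32(const int32_t v[4])
{
#if HSUM_HAVE_NEON
    int32x4_t q = vld1q_s32(v);

    /* Each lane index is a literal constant, as the intrinsic requires;
     * a loop variable used here may be rejected by the compiler.
     */
    return vgetq_lane_s32(q, 0) + vgetq_lane_s32(q, 1) +
           vgetq_lane_s32(q, 2) + vgetq_lane_s32(q, 3);
#else
    /* Scalar fallback (assumption) so the sketch builds without NEON. */
    return v[0] + v[1] + v[2] + v[3];
#endif
}
```

With such a helper, the eight added lines would reduce to two accumulations,
one per output channel. The removed loop instead read lane 1 of res[1] on
every iteration, so ch_out[1] received four times that lane rather than the
sum of all four lanes.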
