On 7/11/2023 11:24 AM, Dongdong Liu wrote:
> From: Huisong Li <lihuis...@huawei.com>
>
> Currently, hns3 SVE Tx checks the valid bits of all descriptors
> in a batch and then determines whether to release the corresponding
> mbufs. Actually, once the valid bit of any descriptor in a batch
> isn't cleared, the driver does not need to scan the rest of the descriptors.
>
> If we optimize the SVE code algorithm for this function, the performance
> of a single queue for 64B packets is improved by ~2% in txonly forwarding
> mode. If C code is used to scan all descriptors instead, the performance is
> improved by ~8%.
>
> So this patch uses C code for this function to improve the SVE
> Tx performance.
>
> Signed-off-by: Huisong Li <lihuis...@huawei.com>
> Signed-off-by: Dongdong Liu <liudongdo...@huawei.com>
>
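For reference, the early-exit check the commit message describes amounts to something like the plain C loop below. This is only a sketch: `struct tx_desc`, the `TXD_VLD_BIT` position and `tx_batch_done()` are hypothetical names, not the actual hns3 driver code, and the little-endian conversion the real descriptors need is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor layout: only the field holding the valid
 * (VLD) bit is modeled; the real hns3 descriptor is larger and
 * little-endian. */
#define TXD_VLD_BIT (1u << 0) /* hypothetical bit position */

struct tx_desc {
	uint16_t flags; /* hardware clears TXD_VLD_BIT once the descriptor is done */
};

/* Return true only if the valid bit of every descriptor in the batch
 * has been cleared, i.e. all mbufs of the batch may be released.
 * Returns on the first descriptor still owned by hardware, so the
 * rest of the batch is not scanned. */
static bool
tx_batch_done(const struct tx_desc *desc, unsigned int batch)
{
	unsigned int i;

	for (i = 0; i < batch; i++) {
		if (desc[i].flags & TXD_VLD_BIT)
			return false; /* early exit: batch not complete yet */
	}
	return true;
}
```

Checking all valid bits of a batch before deciding always pays for the whole batch, while this loop stops at the first descriptor that hardware has not finished with.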
SVE Tx optimized by removing the SVE implementation :) Do you have any insight into why the generic vector implementation is faster?