On Wednesday, 27 September 2023 at 4:47:26 EEST, flow gg wrote:
> >>> please pad mnemonics to at least 8 columns for consistency
>
> okay, changed
>
> >>> It seems that you could just as well use vlseg2 without register
> >>> stride, no?
>
> yes, vlseg will be better, changed
>
> >>> Note that you could do the double versions with very little extra
> >>> efforts.
>
> okay
>
> >>> But really, DO NOT use a fixed vector length here. At best, you're
> >>> wasting half the vector width. Your input has a variable size, use it.
>
> okay, changed
>
> >>> I'm a bit surprised that the performance improves this much,
> >>> considering that the C910 is notoriously bad at both segmented and
> >>> strided loads. It might be that the C version is just very bad due
> >>> to lack of aliasing optimisations.
>
> Thanks, you reminded me.
> Sorry, I had forgotten that there was a problem.
> A few days ago, I wanted to try running some existing benchmarks,
>
> ```
> tests/checkasm/checkasm --bench --test=aacpsdsp
> tests/checkasm/checkasm --bench --test=alacdsp
> tests/checkasm/checkasm --bench --test=audiodsp
> tests/checkasm/checkasm --bench --test=g722dsp
> tests/checkasm/checkasm --bench --test=vorbisdsp
> tests/checkasm/checkasm --bench --test=float_dsp
> tests/checkasm/checkasm --bench --test=fixed_dsp
> tests/checkasm/checkasm --bench --test=af_afir
> ```
>
> but they all returned 0.0.
>
> For example,
>
> ```
> butterflies_float_c: 0.0
> butterflies_float_rvv_f32: 0.0
> scalarproduct_float_c: 0.0
> scalarproduct_float_rvv_f32: 0.0
> vector_dmac_scalar_c: 0.0
> vector_dmac_scalar_rvv_f64: 0.0
> ...

OK, this reproduces on both SiFive and T-Head hardware here. You need to revert
09731fbfc3a914ec4f6ffad60aa9062db6a8f6aa.

-- 
レミ・デニ-クールモン
http://www.remlab.net/