Thanks for the review. I will fix the failing checkasm tests first and then take care of the minor issues. I will try to resend fixed versions this week.
Regards,
Hubert

On Mon, Oct 24, 2022 at 3:19 PM Martin Storsjö <mar...@martin.st> wrote:

> On Mon, 17 Oct 2022, Hubert Mazur wrote:
>
> > Provide arm64 neon optimized implementations for hscale16To19 with
> > filter sizes 4, 8 and X4.
> >
> > The tests and benchmarks run on AWS Graviton 2 instances.
> > The results from a checkasm tool are shown below.
> >
> > hscale_16_to_19__fs_4_dstW_512_c: 6216.0
> > hscale_16_to_19__fs_4_dstW_512_neon: 2257.0
> > hscale_16_to_19__fs_8_dstW_512_c: 10417.7
> > hscale_16_to_19__fs_8_dstW_512_neon: 3112.5
> > hscale_16_to_19__fs_12_dstW_512_c: 14890.5
> > hscale_16_to_19__fs_12_dstW_512_neon: 3899.0
> > hscale_16_to_19__fs_16_dstW_512_c: 19006.5
> > hscale_16_to_19__fs_16_dstW_512_neon: 5341.2
> > hscale_16_to_19__fs_32_dstW_512_c: 36629.5
> > hscale_16_to_19__fs_32_dstW_512_neon: 9502.7
> > hscale_16_to_19__fs_40_dstW_512_c: 45477.5
> > hscale_16_to_19__fs_40_dstW_512_neon: 11552.0
> >
> > Signed-off-by: Hubert Mazur <h...@semihalf.com>
> > ---
> > libswscale/aarch64/hscale.S | 402 +++++++++++++++++++++++++++++++++++
> > libswscale/aarch64/swscale.c | 70 +++++-
> > 2 files changed, 471 insertions(+), 1 deletion(-)
>
> > +void ff_hscale16to19_4_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                                const uint8_t *_src, const int16_t *filter,
> > +                                const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X8_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                                 const uint8_t *_src, const int16_t *filter,
> > +                                 const int32_t *filterPos, int filterSize);
> > +void ff_hscale16to19_X4_neon_asm(int shift, int16_t *_dst, int dstW,
> > +                                 const uint8_t *_src, const int16_t *filter,
> > +                                 const int32_t *filterPos, int filterSize);
> > +
> > #define SCALE_FUNC(filter_n, from_bpc, to_bpc, opt) \
> > void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> >                                 SwsContext *c, int16_t *data, \
> > @@ -43,7 +53,8 @@ void ff_hscale ## from_bpc ## to ## to_bpc ## _ ## filter_n ## _ ## opt( \
> > #define SCALE_FUNCS(filter_n, opt) \
> >     SCALE_FUNC(filter_n, 8, 15, opt); \
> >     SCALE_FUNC(filter_n, 8, 19, opt); \
> > -   SCALE_FUNC(filter_n, 16, 15, opt);
> > +   SCALE_FUNC(filter_n, 16, 15, opt); \
> > +   SCALE_FUNC(filter_n, 16, 19, opt);
>
> So this declares the functions we're implementing as C wrappers below, and
> the manual declarations further up declare the actual asm functions?
>
> I guess that works, although it makes unnecessary extern functions. In
> such cases, we usually have the C functions be static functions, placed
> above the code that uses them. But it's not a big deal.
>
> Other than that, this patchset mostly seems fine.
>
> However, I tested the patches on x86, and the new checkasm tests do fail
> on x86 (both i386 and x86_64) - so that needs to be fixed anyway. So since
> we'll need to do a new round anyway, please do try to fix up the minor
> cosmetics I mentioned.
>
> // Martin
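For reference, here is a minimal sketch of the layout Martin suggests, written in plain C so it compiles anywhere. Every name in it (DemoContext, demo_hscale_kernel, demo_hscale16to19_4, demo_pick_hscale) is hypothetical. demo_hscale_kernel is a C stand-in for an asm routine such as ff_hscale16to19_4_neon_asm, and the shift derivation is an illustrative assumption, not what swscale does for every pixel format. The point is only that the wrapper is static and defined above the code that takes its address, so it needs no separate prototype and exports no extern symbol.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for SwsContext. */
typedef struct DemoContext {
    int src_depth; /* bit depth of the 16-bit input samples (assumption) */
} DemoContext;

/* Hypothetical C stand-in for an asm kernel like ff_hscale16to19_4_neon_asm:
 * a plain horizontal FIR filter whose output is clamped to 19 bits.
 * dst is int32_t* here for simplicity (the real code passes int16_t* and casts). */
static void demo_hscale_kernel(int shift, int32_t *dst, int dstW,
                               const uint16_t *src, const int16_t *filter,
                               const int32_t *filterPos, int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int64_t val = 0;
        for (int j = 0; j < filterSize; j++)
            val += (int64_t)src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= shift;
        dst[i] = val > (1 << 19) - 1 ? (1 << 19) - 1 : (int32_t)val;
    }
}

/* The C wrapper: static, and placed above its only user, so the compiler
 * sees its definition before the address is taken. */
static void demo_hscale16to19_4(DemoContext *c, int32_t *dst, int dstW,
                                const uint16_t *src, const int16_t *filter,
                                const int32_t *filterPos, int filterSize)
{
    /* Assumes 14-bit filter coefficients (taps summing to 1 << 14), so
     * depth + 14 - shift = 19 output bits. Illustrative only. */
    int shift = c->src_depth - 5;
    demo_hscale_kernel(shift, dst, dstW, src, filter, filterPos, filterSize);
}

typedef void (*demo_hscale_fn)(DemoContext *, int32_t *, int,
                               const uint16_t *, const int16_t *,
                               const int32_t *, int);

/* Hypothetical stand-in for the init code that stores the wrapper in a
 * function pointer. */
static demo_hscale_fn demo_pick_hscale(void)
{
    return demo_hscale16to19_4;
}
```

With a 16-bit source of all 65535 and four taps of 4096 each, the single output sample is 65535 * 16384 >> 11 = 524280, just under the 19-bit ceiling of 524287. Dropping the assumed depth to 15 halves the shift headroom, so the same input is clamped to 524287.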