Hi,

2015-02-04 4:55 GMT+01:00 James Almer <jamr...@gmail.com>:
> Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere.
> Refactoring and optimizations by James Almer.

Add your own copyright to this file then.

> Width 32
> 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips
> 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips
> 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips
>
> Width 64
> 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips
> 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips
> 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips

Is the first number in each case from before you split out the restore
part? Otherwise, that's gruesome.

> -    void (*sao_edge_filter)(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> -                            ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> -                            int width, int height);
> +    void (*sao_edge_filter[5])(uint8_t *_dst, uint8_t *_src, ptrdiff_t stride_dst,
> +                               ptrdiff_t stride_src, int16_t *sao_offset_val, int sao_eo_class,
> +                               int width, int height);

Maybe add a comment on top of that to indicate that _dst is
16-byte-aligned? Also, src and stride_src are such that the buffer is
32-byte-aligned, because of:

    stride_dst = 2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE;
    dst = lc->edge_emu_buffer + stride_dst + FF_INPUT_BUFFER_PADDING_SIZE;

in hevc_filter.c, but I'm not sure how much of a benefit that is here, or
how often it helps. Don't hesitate to modify them if need be.

> +%else ; ARCH_X86_32
> +cglobal hevc_sao_edge_filter_%1_8, 1, 7, 8, dst, src, dststride, srcstride, a_stride, b_stride, height

As seen above, srcstride is constant and equal to
2*MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE. That may save you one whole
gpr. Not really useful here, but I think you are more limited for the
>8 bits case. If you want to exploit this, also note it in the comment
above void (*sao_edge_filter[5]).

No comment on the actual assembly, it looks fine.
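
To make the alignment remark above concrete, here is a rough sketch of the arithmetic, not code from the patch. It assumes MAX_PB_SIZE is 64 and FF_INPUT_BUFFER_PADDING_SIZE is 32 (neither value is stated in this mail, so check hevc.h and avcodec.h), and the names SAO_SRC_STRIDE, sao_row_offset and sao_rows_aligned are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values, not stated in this mail; check hevc.h and avcodec.h. */
#define MAX_PB_SIZE                  64
#define FF_INPUT_BUFFER_PADDING_SIZE 32

/* Constant source stride for 8-bit samples, per the hevc_filter.c lines
 * quoted above. SAO_SRC_STRIDE is a hypothetical name. */
#define SAO_SRC_STRIDE (2 * MAX_PB_SIZE + FF_INPUT_BUFFER_PADDING_SIZE)

/* Fails at compile time if the assumed constants ever break the claim. */
static_assert(SAO_SRC_STRIDE % 32 == 0, "rows would not stay 32-byte aligned");

/* Hypothetical helper: byte offset of row y of the block, measured from
 * the start of edge_emu_buffer. */
static size_t sao_row_offset(size_t y)
{
    return SAO_SRC_STRIDE + FF_INPUT_BUFFER_PADDING_SIZE + y * SAO_SRC_STRIDE;
}

/* Hypothetical helper: returns 1 if each of the first `height` rows starts
 * on an `align`-byte boundary, assuming the buffer base itself is aligned
 * to `align`; returns 0 otherwise. */
static int sao_rows_aligned(size_t height, size_t align)
{
    for (size_t y = 0; y < height; y++)
        if (sao_row_offset(y) % align)
            return 0;
    return 1;
}
```

With those assumed values the stride is 160 bytes and the first row sits at offset 192, both multiples of 32, so every row start stays 32-byte aligned provided edge_emu_buffer itself is. The same numbers show that 64-byte alignment would not hold, since 160 is not a multiple of 64.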

--
Christophe