On 22/02/15 9:13 AM, Paul B Mahol wrote:
> On 2/21/15, James Almer <jamr...@gmail.com> wrote:
>> On 21/02/15 3:47 PM, Paul B Mahol wrote:
>>> On 2/21/15, James Almer <jamr...@gmail.com> wrote:
>>>> On 21/02/15 8:49 AM, Paul B Mahol wrote:
>>>>> Have you measured performance drop before and after?
>>>>
>>>> filter_order 8 in decorrelate()
>>>>
>>>> Before
>>>> 903 decicycles in scalarproduct, 8388364 runs, 244 skips
>>>> After
>>>> 858 decicycles in scalarproduct, 8388215 runs, 393 skips
>>>>
>>>>
>>>> filter_order 24 in decode_subframe()
>>>>
>>>> Before
>>>> 993 decicycles in scalarproduct, 16776849 runs, 367 skips
>>>> After
>>>> 887 decicycles in scalarproduct, 16776783 runs, 433 skips
>>>>
>>>
>>> But what about other filter orders?
>>
>> filter order 12 in decode_subframe()
>>
>> Before
>> 963 decicycles in scalarproduct, 8388426 runs, 182 skips
>> After
>> 873 decicycles in scalarproduct, 8388410 runs, 198 skips
>>
>>
>> filter_order 8 in decode_subframe()
>>
>> Before
>> 900 decicycles in scalarproduct, 4194020 runs, 284 skips
>> After
>> 858 decicycles in scalarproduct, 4194198 runs, 106 skips
>>
>>
>> filter order 4 in decode_subframe()
>>
>> Before
>> 827 decicycles in scalarproduct, 1048561 runs, 15 skips
>> After
>> 876 decicycles in scalarproduct, 1048556 runs, 20 skips
>>
>>
>> Seems like only filter_order 4 is slower. I could leave the C code for
>> that one case if you prefer.
>
> I'm more afraid of overhead that memset() does.
>
> Feel free to apply patch.
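
For reference, the timings quoted above can be compared as relative changes. This is only a rough sketch: the decicycle figures are copied from the thread, while the struct and the function names (`rel_change_pct`, `print_report`) are made up for illustration and are not part of FFmpeg. It also assumes the reported averages are directly comparable, which a single run per configuration may not guarantee.

```c
#include <assert.h>
#include <stdio.h>

/* One before/after pair of average timings, in decicycles,
 * copied from the measurements quoted in this thread. */
struct timing {
    const char *label;
    int before;
    int after;
};

static const struct timing timings[] = {
    { "filter_order 8,  decorrelate()",     903, 858 },
    { "filter_order 24, decode_subframe()", 993, 887 },
    { "filter_order 12, decode_subframe()", 963, 873 },
    { "filter_order 8,  decode_subframe()", 900, 858 },
    { "filter_order 4,  decode_subframe()", 827, 876 },
};

/* Relative change in percent (hypothetical helper).
 * Negative means the patched code is faster. */
double rel_change_pct(int before, int after)
{
    assert(before > 0); /* avoid dividing by zero */
    return 100.0 * (after - before) / before;
}

/* Print one line per measurement (hypothetical helper). */
void print_report(void)
{
    size_t i;

    for (i = 0; i < sizeof(timings) / sizeof(timings[0]); i++)
        printf("%-38s %4d -> %4d  %+.2f%%\n",
               timings[i].label, timings[i].before, timings[i].after,
               rel_change_pct(timings[i].before, timings[i].after));
}
```

Calling `print_report()` from a `main()` should show the patched code roughly 5% to 11% faster for filter orders 8, 12 and 24, and about 6% slower for filter order 4, which matches the observation that only that case regresses.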

Pushed a version Christophe wrote some time ago that uses AV_ZERO instead of
memset(), so that should not be a problem now.

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel