On 7/12/15, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> Hi,
>
> On Sun, Jul 12, 2015 at 10:29 AM, Paul B Mahol <one...@gmail.com> wrote:
>
>> On 12 Jul 2015 at 14:18, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > On Sun, Jul 12, 2015 at 6:48 AM, Paul B Mahol <one...@gmail.com> wrote:
>> >
>> > > On 12 Jul 2015 at 01:56, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
>> > > >
>> > > > ---
>> > > >  libavfilter/vf_ssim.c | 5 ++---
>> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
>> > > >
>> > > > diff --git a/libavfilter/vf_ssim.c b/libavfilter/vf_ssim.c
>> > > > index 0721ddd..3ef122f 100644
>> > > > --- a/libavfilter/vf_ssim.c
>> > > > +++ b/libavfilter/vf_ssim.c
>> > > > @@ -134,7 +134,7 @@ static float ssim_end1(int s1, int s2, int ss, int s12)
>> > > >           / ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
>> > > >  }
>> > > >
>> > > > -static float ssim_end4(int sum0[5][4], int sum1[5][4], int width)
>> > > > +static float ssim_endn(int (*sum0)[4], int (*sum1)[4], int width)
>> > > >  {
>> > > >      float ssim = 0.0;
>> > > >      int i;
>> > > > @@ -169,8 +169,7 @@ static float ssim_plane(uint8_t *main, int main_stride,
>> > > >                              &sum0[x]);
>> > > >          }
>> > > >
>> > > > -        for (x = 0; x < width - 1; x += 4)
>> > > > -            ssim += ssim_end4(sum0 + x, sum1 + x, FFMIN(4, width - x - 1));
>> > > > +        ssim += ssim_endn(sum0, sum1, width - 1);
>> > > >      }
>> > > >
>> > > >      return ssim / ((height - 1) * (width - 1));
>> > > > --
>> > > > 2.1.2
>> > > >
>> > >
>> > > Why? There was a reason behind this code, I guess.
>> > >
>> >
>> > I think it's for SIMD code simplification. See, I'm guessing the code you
>> > took from libvpx had an extra condition to do only 4-sized chunks through a
>> > function pointer, and then the odd tail in C code.
>> > If you do this, the SIMD code has a fixed size (always 4), which makes the
>> > implementation much more trivial: four 16-byte loads, add, transpose4x4d,
>> > and then ssim_end1 to get 4 results, which you horizontal-add and return.
>> >
>>
>> I took this from tiny_ssim.c, as pengvado said it's OK to relicense to LGPL.
>
>
> I think the same reasoning still applies - this will get better
> performance, particularly if we consider AVX2.

OK, patch lgtm.

>
> Ronald
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
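
For readers following the thread, here is a minimal C sketch of the two loop
shapes the patch swaps between. It is not the code from vf_ssim.c: toy_end1,
toy_endn, toy_chunked and TOY_MIN are hypothetical names, toy_end1 is a
stand-in for ssim_end1, and the way each position combines rows i and i + 1
of both sum arrays is an assumption suggested by the old [5][4] parameter
type. The point is only that one call over width - 1 positions visits the
same positions as the removed loop over chunks of at most 4.

```c
#include <assert.h>

#define TOY_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for ssim_end1(). It returns a small integer-valued
 * float so the additions below are exact whatever their order. */
static float toy_end1(int s1, int s2, int ss, int s12)
{
    return (float)(s1 + s2 + ss + s12);
}

/* Whole-row variant, shaped like ssim_endn() in the patch: one call handles
 * all `width` positions. Position i reads rows i and i + 1 (assumption). */
static float toy_endn(int (*sum0)[4], int (*sum1)[4], int width)
{
    float acc = 0.0f;

    assert(width >= 0);
    for (int i = 0; i < width; i++)
        acc += toy_end1(sum0[i][0] + sum0[i + 1][0] + sum1[i][0] + sum1[i + 1][0],
                        sum0[i][1] + sum0[i + 1][1] + sum1[i][1] + sum1[i + 1][1],
                        sum0[i][2] + sum0[i + 1][2] + sum1[i][2] + sum1[i + 1][2],
                        sum0[i][3] + sum0[i + 1][3] + sum1[i][3] + sum1[i + 1][3]);
    return acc;
}

/* Chunked variant, shaped like the removed loop: at most 4 positions per
 * call, with a shorter final chunk for the tail. Here `width` is the row
 * width, so width - 1 positions are visited in total. */
static float toy_chunked(int (*sum0)[4], int (*sum1)[4], int width)
{
    float acc = 0.0f;

    for (int x = 0; x < width - 1; x += 4)
        acc += toy_endn(sum0 + x, sum1 + x, TOY_MIN(4, width - x - 1));
    return acc;
}
```

With two arrays of 10 rows each, toy_chunked(a, b, 10) makes three calls
covering 4, 4 and 1 positions, and toy_endn(a, b, 9) covers the same 9 in one
call. With a real float-valued ssim_end1 the two totals could differ slightly
in rounding, because the additions are grouped differently.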