On 12 Jul 2015 at 14:18, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
>
> Hi,
>
> On Sun, Jul 12, 2015 at 6:48 AM, Paul B Mahol <one...@gmail.com> wrote:
>
> > On 12 Jul 2015 at 01:56, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
> > >
> > > ---
> > >  libavfilter/vf_ssim.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/libavfilter/vf_ssim.c b/libavfilter/vf_ssim.c
> > > index 0721ddd..3ef122f 100644
> > > --- a/libavfilter/vf_ssim.c
> > > +++ b/libavfilter/vf_ssim.c
> > > @@ -134,7 +134,7 @@ static float ssim_end1(int s1, int s2, int ss, int s12)
> > >           / ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
> > >  }
> > >
> > > -static float ssim_end4(int sum0[5][4], int sum1[5][4], int width)
> > > +static float ssim_endn(int (*sum0)[4], int (*sum1)[4], int width)
> > >  {
> > >      float ssim = 0.0;
> > >      int i;
> > > @@ -169,8 +169,7 @@ static float ssim_plane(uint8_t *main, int main_stride,
> > >                         &sum0[x]);
> > >          }
> > >
> > > -        for (x = 0; x < width - 1; x += 4)
> > > -            ssim += ssim_end4(sum0 + x, sum1 + x, FFMIN(4, width - x - 1));
> > > +        ssim += ssim_endn(sum0, sum1, width - 1);
> > >      }
> > >
> > >      return ssim / ((height - 1) * (width - 1));
> > > --
> > > 2.1.2
> > >
> >
> > Why? There was reason behind this code I guess.
>
> I think it's for simd code simplification. See, I'm guessing the code you
> took from libvpx had an extra condition to do only 4-sized chunks through a
> function pointer, and then the odd tail in c code. If you do this, the simd
> code has a fixed size (always 4), which makes the implementation much more
> trivial: 4 16-byte loads, add, transpose4x4d, and then ssim_end1 to get 4
> results, which you horizontal-add and return.
>
I took this from tiny_ssim.c, as pengvado said it's OK to relicense it to LGPL.

> The disadvantage is overhead. First, call overhead since each 4-element
> chunk requires a function call, second overhead for function initialization
> (anything outside the main loop, either before or after). This includes the
> horizontal-add, which is relatively expensive. Third, it limits us to
> 16-byte: no avx(2). Doing a variable-size function makes the simd slightly
> more complex, but is more future-proof (avx/2) and theoretically faster.
>
> > Does this change results?
>
> No.
>
> Ronald
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
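
For reference, the two call patterns discussed in this thread can be compared with a minimal C sketch. This is not the FFmpeg code: score1() is a hypothetical stand-in for ssim_end1(), and the assumption that each position combines columns i and i + 1 is inferred only from the [5][4] parameter type in the old ssim_end4() signature. total_chunked() and total_single() are illustrative names for the removed loop and its one-call replacement.

```c
#include <assert.h>

#define CHUNK 4
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for ssim_end1(): small integers scaled by 0.25, so
 * every float sum in this sketch is exact and the two strategies can be
 * compared with ==. The real per-position formula is not reproduced here. */
static float score1(int s1, int s2, int ss, int s12)
{
    return 0.25f * (float)(s1 + s2 + ss + s12);
}

/* Variable-width helper in the shape of ssim_endn(). Assumption: position i
 * combines columns i and i + 1 of both rows, so it reads width + 1 columns. */
static float end_n(int (*sum0)[4], int (*sum1)[4], int width)
{
    float acc = 0.0f;
    assert(width >= 0);
    for (int i = 0; i < width; i++)
        acc += score1(sum0[i][0] + sum0[i + 1][0] + sum1[i][0] + sum1[i + 1][0],
                      sum0[i][1] + sum0[i + 1][1] + sum1[i][1] + sum1[i + 1][1],
                      sum0[i][2] + sum0[i + 1][2] + sum1[i][2] + sum1[i + 1][2],
                      sum0[i][3] + sum0[i + 1][3] + sum1[i][3] + sum1[i + 1][3]);
    return acc;
}

/* Old call pattern: one call per chunk of up to 4 positions. */
static float total_chunked(int (*sum0)[4], int (*sum1)[4], int width)
{
    float total = 0.0f;
    for (int x = 0; x < width - 1; x += CHUNK)
        total += end_n(sum0 + x, sum1 + x, MIN(CHUNK, width - x - 1));
    return total;
}

/* New call pattern: a single call covering the whole row. */
static float total_single(int (*sum0)[4], int (*sum1)[4], int width)
{
    return end_n(sum0, sum1, width - 1);
}
```

Both functions visit the same width - 1 positions; the chunked form simply pays for one call, plus whatever setup and final reduction a SIMD version would need, per group of four.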