On Sun, Jul 12, 2015 at 10:07:16PM +0000, Paul B Mahol wrote:
> On 7/12/15, Ronald S. Bultje <rsbul...@gmail.com> wrote:
> > Hi,
> >
> > On Sun, Jul 12, 2015 at 10:29 AM, Paul B Mahol <one...@gmail.com> wrote:
> >
> >> On 12 Jul 2015 at 14:18, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
> >> >
> >> > Hi,
> >> >
> >> > On Sun, Jul 12, 2015 at 6:48 AM, Paul B Mahol <one...@gmail.com> wrote:
> >> >
> >> > > On 12 Jul 2015 at 01:56, "Ronald S. Bultje" <rsbul...@gmail.com> wrote:
> >> > > >
> >> > > > ---
> >> > > >  libavfilter/vf_ssim.c | 5 ++---
> >> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> >> > > >
> >> > > > diff --git a/libavfilter/vf_ssim.c b/libavfilter/vf_ssim.c
> >> > > > index 0721ddd..3ef122f 100644
> >> > > > --- a/libavfilter/vf_ssim.c
> >> > > > +++ b/libavfilter/vf_ssim.c
> >> > > > @@ -134,7 +134,7 @@ static float ssim_end1(int s1, int s2, int ss, int s12)
> >> > > >           / ((float)(fs1 * fs1 + fs2 * fs2 + ssim_c1) * (float)(vars + ssim_c2));
> >> > > >  }
> >> > > >
> >> > > > -static float ssim_end4(int sum0[5][4], int sum1[5][4], int width)
> >> > > > +static float ssim_endn(int (*sum0)[4], int (*sum1)[4], int width)
> >> > > >  {
> >> > > >      float ssim = 0.0;
> >> > > >      int i;
> >> > > > @@ -169,8 +169,7 @@ static float ssim_plane(uint8_t *main, int main_stride,
> >> > > >                            &sum0[x]);
> >> > > >          }
> >> > > >
> >> > > > -        for (x = 0; x < width - 1; x += 4)
> >> > > > -            ssim += ssim_end4(sum0 + x, sum1 + x, FFMIN(4, width - x - 1));
> >> > > > +        ssim += ssim_endn(sum0, sum1, width - 1);
> >> > > >      }
> >> > > >
> >> > > >      return ssim / ((height - 1) * (width - 1));
> >> > > > --
> >> > > > 2.1.2
> >> > > >
> >> > >
> >> > > Why? There was a reason behind this code, I guess.
> >> >
> >> > I think it's for simd code simplification. See, I'm guessing the
> >> > code you took from libvpx had an extra condition to do only 4-sized
> >> > chunks through a function pointer, and then the odd tail in C code.
> >> > If you do this, the simd code has a fixed size (always 4), which
> >> > makes the implementation much more trivial: 4 16-byte loads, add,
> >> > transpose4x4d, and then ssim_end1 to get 4 results, which you
> >> > horizontal-add and return.
> >>
> >> I took this from tiny_ssim.c as pengvado said it's ok to relicense
> >> to lgpl.
> >
> > I think the same reasoning still applies - this will get better
> > performance, particularly if we consider avx2.
>
> OK, patch lgtm.

applied

thanks

[...]
-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

Everything should be made as simple as possible, but not simpler.
-- Albert Einstein
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
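
For readers following the patch quoted in the thread, here is a minimal, self-contained sketch of the two call patterns it swaps: the old loop that hands fixed 4-wide chunks to ssim_end4, and the new single call to ssim_endn over the whole row. Only the two signatures and the call sites come from the diff. The bodies of ssim_end1 and the summing loop are written in the style of tiny_ssim.c and are assumptions, as are the helper names SKETCH_MIN, ssim_row_chunked and ssim_row_single, which are hypothetical and not part of FFmpeg.

```c
/* Local stand-in for FFmpeg's FFMIN macro so the sketch builds on its own. */
#define SKETCH_MIN(a, b) ((a) < (b) ? (a) : (b))

/* Assumed body, modelled on tiny_ssim.c: SSIM of one 8x8 block from its
 * integer sums (s1 = sum a, s2 = sum b, ss = sum a*a + b*b, s12 = sum a*b). */
static float ssim_end1(int s1, int s2, int ss, int s12)
{
    const int ssim_c1 = (int)(.01 * .01 * 255 * 255 * 64 + .5);
    const int ssim_c2 = (int)(.03 * .03 * 255 * 255 * 64 * 63 + .5);
    int vars  = ss  * 64 - s1 * s1 - s2 * s2;
    int covar = s12 * 64 - s1 * s2;

    return (float)(2 * s1 * s2 + ssim_c1) * (float)(2 * covar + ssim_c2)
         / ((float)(s1 * s1 + s2 * s2 + ssim_c1) * (float)(vars + ssim_c2));
}

/* New signature from the patch: handles any number of results in one call.
 * Each result i combines columns i and i + 1 of both rows of 4x4 sums. */
static float ssim_endn(int (*sum0)[4], int (*sum1)[4], int width)
{
    float ssim = 0.0f;
    int i;

    for (i = 0; i < width; i++)
        ssim += ssim_end1(sum0[i][0] + sum0[i + 1][0] + sum1[i][0] + sum1[i + 1][0],
                          sum0[i][1] + sum0[i + 1][1] + sum1[i][1] + sum1[i + 1][1],
                          sum0[i][2] + sum0[i + 1][2] + sum1[i][2] + sum1[i + 1][2],
                          sum0[i][3] + sum0[i + 1][3] + sum1[i][3] + sum1[i + 1][3]);
    return ssim;
}

/* Old signature from the patch: same arithmetic, but callers pass width <= 4. */
static float ssim_end4(int sum0[5][4], int sum1[5][4], int width)
{
    return ssim_endn(sum0, sum1, width);
}

/* Call pattern removed by the patch: walk the row in chunks of up to 4. */
static float ssim_row_chunked(int (*sum0)[4], int (*sum1)[4], int width)
{
    float ssim = 0.0f;
    int x;

    for (x = 0; x < width - 1; x += 4)
        ssim += ssim_end4(sum0 + x, sum1 + x, SKETCH_MIN(4, width - x - 1));
    return ssim;
}

/* Call pattern added by the patch: one call covers all width - 1 results. */
static float ssim_row_single(int (*sum0)[4], int (*sum1)[4], int width)
{
    return ssim_endn(sum0, sum1, width - 1);
}
```

Both helpers visit the same width - 1 overlapping blocks, so they should agree up to float rounding; the order of the additions differs, so bit-exact equality is not guaranteed. The sketch does not model the SIMD side of the discussion, where the reply in the thread argues that the unbounded variant is the better fit, particularly with AVX2.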