On Sun, Aug 03, 2014 at 12:36:19AM +0200, Michael Niedermayer wrote:
> On Sat, Aug 02, 2014 at 11:34:07PM +0200, Clément Bœsch wrote:
[...]
> > +#ifdef TEST
> > +#define W1 320
> > +#define H1 240
> > +#define W2 640
> > +#define H2 480
> > +int main(void)
> > +{
> > +    int i, a, ret = 0;
> > +    DECLARE_ALIGNED(32, uint32_t, buf1)[W1*H1];
> > +    DECLARE_ALIGNED(32, uint32_t, buf2)[W2*H2];
> > +    uint32_t state = 0;
> > +
> > +    for (i = 0; i < W1*H1; i++) {
> > +        buf1[i] = state;
> > +        state = state * 1664525 + 1013904223;
> > +    }
> > +
> > +    for (i = 0; i < W2*H2; i++) {
> > +        buf2[i] = state;
> > +        state = state * 1664525 + 1013904223;
> > +    }
>
> the code should in addition be tested with maximal and minimal
> difference cases
>
Tests added.

> [...]
> > +;-------------------------------------------------------------------------------
> > +; int ff_pixelutils_sad_[au]_16x16_sse(const uint8_t *src1, ptrdiff_t stride1,
> > +;                                      const uint8_t *src2, ptrdiff_t stride2);
> > +;-------------------------------------------------------------------------------
> > +%macro SAD_XMM_16x16 1
> > +INIT_XMM sse2
> > +cglobal pixelutils_sad_%1_16x16, 4,4,3, src1, stride1, src2, stride2
> > +    pxor        m2, m2
> > +%rep 8
> > +    mov%1       m0, [src2q]
> > +    mov%1       m1, [src2q + stride2q]
> > +    psadbw      m0, [src1q]
> > +    psadbw      m1, [src1q + stride1q]
> > +    paddw       m2, m0
> > +    paddw       m2, m1
> > +    lea         src1q, [src1q + 2*stride1q]
> > +    lea         src2q, [src2q + 2*stride2q]
> > +%endrep
> > +    movhlps     m0, m2
> > +    paddw       m2, m0
> > +    movd        eax, m2
> > +    RET
> > +%endmacro
>
> there are various improvements possible, though these should be in
> a separate patch and not in gcc->yasm but
> the pxor can be avoided by lifting the first iteration out and
> using m2 as destination
>
> it might be faster to use 2 accumulator registers as that way both
> could execute with no dependencies on the other
>
> as you unroll the loop, addressing can be done with fewer instructions
>

I left the ASM as is since it was kind of simple and parallel to the API
itself; we can iterate from here with benchmarks

> LGTM otherwise
>

Patchset applied, thanks

[...]

-- 
Clément B.
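
For readers of the archive, here is a minimal sketch of what "maximal and minimal difference cases" can look like for a 16x16 SAD. It is not the test that went into the patchset: sad_16x16_ref(), check_extremes() and the SAD_SKETCH_MAIN guard are hypothetical names made up for illustration, and the reference is plain C rather than the SSE2 routine quoted above. The maximal case is the interesting one for that routine: if I read the asm right, each psadbw leaves at most 8*255 = 2040 in each 64-bit half, so 16 rows add up to at most 32640 per half and 65280 in total, which still fits the 16-bit paddw accumulation. An all-0x00 block against an all-0xFF block is the input that would expose an overflow there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLK 16  /* block width and height, matching the 16x16 routine */

/* Plain C reference (hypothetical name): sum of absolute differences
 * over a 16x16 block, with the same argument order as the quoted asm. */
static int sad_16x16_ref(const uint8_t *src1, ptrdiff_t stride1,
                         const uint8_t *src2, ptrdiff_t stride2)
{
    int x, y, sum = 0;

    for (y = 0; y < BLK; y++) {
        for (x = 0; x < BLK; x++) {
            int d = src1[x] - src2[x];
            sum += d < 0 ? -d : d;
        }
        src1 += stride1;
        src2 += stride2;
    }
    return sum;
}

/* Hypothetical helper: returns the number of failed extreme-case checks. */
static int check_extremes(void)
{
    static uint8_t lo[BLK * BLK], hi[BLK * BLK];
    int failures = 0;

    memset(lo, 0x00, sizeof(lo));
    memset(hi, 0xFF, sizeof(hi));

    /* minimal difference: identical blocks must give 0 */
    failures += sad_16x16_ref(lo, BLK, lo, BLK) != 0;
    failures += sad_16x16_ref(hi, BLK, hi, BLK) != 0;
    /* maximal difference: every byte differs by 255, in both directions */
    failures += sad_16x16_ref(lo, BLK, hi, BLK) != BLK * BLK * 255;
    failures += sad_16x16_ref(hi, BLK, lo, BLK) != BLK * BLK * 255;

    return failures;
}

#ifdef SAD_SKETCH_MAIN
int main(void)
{
    int failures = check_extremes();

    printf("%d extreme-case check(s) failed\n", failures);
    assert(failures == 0);
    return 0;
}
#endif
```

Building with -DSAD_SKETCH_MAIN gives a standalone program. To check an optimized routine, the same four calls could be repeated against it and compared with the reference, for both aligned and unaligned source pointers.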
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel