On 02/08/14 6:13 PM, Clément Bœsch wrote:
> On Sat, Aug 02, 2014 at 04:29:39PM -0300, James Almer wrote:
>> On 02/08/14 3:20 PM, Clément Bœsch wrote:
>>> +    psrlq       m0, m6, 32
>>> +    paddw       m6, m0
>>> +    psrlq       m0, m6, 16
>>> +    paddw       m6, m0
>>> +    movd        eax, m6
>>> +    movzx       eax, ax
>>
>> You could use the HADDW macro here.
>>
>
> error: undefined symbol `pw_1' (first use)
>
> sounds somehow constraining. I'll keep my version until you benchmark to
> prove me HADDW is faster on an old MMX cpu ;)

I have no idea if it's faster, nor do I have a way to test it, for that
matter. It's four instructions instead of six, but pmaddwd with a memory
operand is probably not fast enough on old CPUs.

>
>>> +;-------------------------------------------------------------------------------
>>> +; int ff_pixelutils_sad_8x8_mmxext(const uint8_t *src1, ptrdiff_t stride1,
>>> +;                                  const uint8_t *src2, ptrdiff_t stride2);
>>> +;-------------------------------------------------------------------------------
>>> +INIT_MMX mmxext
>>> +cglobal pixelutils_sad_8x8, 4,4,0, src1, stride1, src2, stride2
>>> +    pxor        m2, m2
>>> +%rep 4
>>> +    mova        m0, [src1q]
>>> +    mova        m1, [src1q + stride1q]
>>> +    psadbw      m0, [src2q]
>>> +    psadbw      m1, [src2q + stride2q]
>>> +    paddw       m2, m0
>>> +    paddw       m2, m1
>>> +    lea         src1q, [src1q + 2*stride1q]
>>> +    lea         src2q, [src2q + 2*stride2q]
>>> +%endrep
>>> +    movd        eax, m2
>>> +    RET
>>
>> Adding sad16x16 mmxext should be a matter of using add instead of lea,
>> changing the %rep amount, and using 8 instead of stride[12]q for the
>> mova and psadbw.
>>
>
> Yeah right, added. Thanks.
>
>>> --- /dev/null
>>> +++ b/libavutil/x86/pixelutils.h
>>> @@ -0,0 +1,26 @@
>>> +/*
>>> + * This file is part of FFmpeg.
>>> + *
>>> + * FFmpeg is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU Lesser General Public
>>> + * License as published by the Free Software Foundation; either
>>> + * version 2.1 of the License, or (at your option) any later version.
>>> + *
>>> + * FFmpeg is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>>> + * Lesser General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU Lesser General Public
>>> + * License along with FFmpeg; if not, write to the Free Software
>>> + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
>>> + */
>>> +
>>> +#ifndef AVUTIL_X86_PIXELUTILS_H
>>> +#define AVUTIL_X86_PIXELUTILS_H
>>> +
>>> +#include "libavutil/pixelutils.h"
>>> +
>>> +void ff_pixelutils_init_x86(AVPixelUtils *s);
>>
>> This prototype should be in libavutil/pixelutils.h
>> No need to make a whole new header just for it.
>>
>
> No, libavutil/pixelutils.h is public, I don't want to have private
> prototypes in it.

Right, forgot it was public. I had lavc dsp stuff in mind when I said that.

>
>> Maybe you could add a quick test for these functions? Look at
>> lavc/motion-test.c and lavu/float-dsp.c
>
> Added.
>
> I'll resubmit a patchset in a moment.
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
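
For reference, here is a plain C sketch of what the two quoted asm snippets
compute. It could serve as a scalar baseline when testing the SIMD versions.
This is not code from the patch: sad_8x8_ref, paddw_emul and hadd_words are
hypothetical names made up for illustration, and the sketch assumes the
shift-and-add sequence operates on a 64-bit MMX register holding four 16-bit
words.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference for an 8x8 sum of absolute differences.
 * Hypothetical helper, same argument layout as the quoted prototype. */
int sad_8x8_ref(const uint8_t *src1, ptrdiff_t stride1,
                const uint8_t *src2, ptrdiff_t stride2)
{
    int sum = 0;

    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    /* At most 64 * 255 = 16320, so the result fits in 16 bits. */
    return sum;
}

/* Emulates paddw on a 64-bit value: four independent 16-bit lane adds,
 * each wrapping modulo 2^16. */
uint64_t paddw_emul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 4; i++) {
        uint16_t lane = (uint16_t)((a >> (16 * i)) + (b >> (16 * i)));
        r |= (uint64_t)lane << (16 * i);
    }
    return r;
}

/* Mirrors the quoted psrlq/paddw/psrlq/paddw/movd/movzx sequence:
 * folds the four 16-bit words into the low word and zero-extends it. */
unsigned hadd_words(uint64_t v)
{
    v = paddw_emul(v, v >> 32); /* lane0 = w0 + w2, lane1 = w1 + w3 */
    v = paddw_emul(v, v >> 16); /* lane0 = w0 + w1 + w2 + w3 */
    return (unsigned)(v & 0xFFFF);
}
```

A test along the lines of lavc/motion-test.c could fill two buffers with
pseudo-random bytes and compare the output of sad_8x8_ref against the
function under test for a few stride and alignment combinations.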