On Thu, Apr 23, 2015 at 12:20:38AM -0400, Tucker DiNapoli wrote:
> I added a new file with the sse2/avx2 code for do_a_deblock.
> I also moved the code for running vertical deblock filters into its own
> function, both to clean up the postprocess function and to make it
> easier to integrate the new sse2/avx2 versions of these filters.
> ---
>  libpostproc/postprocess_template.c | 123 +++++++---
>  libpostproc/x86/Makefile           |   1 +
>  libpostproc/x86/deblock.asm        | 454 +++++++++++++++++++++++++++++++++++++
>  3 files changed, 545 insertions(+), 33 deletions(-)
>  create mode 100644 libpostproc/x86/deblock.asm

putting an av_log() before the old inline asm for do_a_deblock*() and
a jump to NULL in the yasm code shows that only the old code is
executed when testing as in:
./ffplay matrixbench_mpeg2.mpg -vf pp=ha/va

postproc clearly does not use the new code, so i have no idea how to
test it
tested both on AVX and AVX2 machines

also there is:
In file included from libpostproc/postprocess.c:538:0:
libpostproc/postprocess_template.c: In function ‘deblock_MMX’:
libpostproc/postprocess_template.c:3414:20: note: The ABI for passing parameters with 32-byte alignment has changed in GCC 4.6
 static inline void RENAME(deblock)(uint8_t *dstBlock, int stride,
                    ^

diff --git a/libpostproc/postprocess_template.c b/libpostproc/postprocess_template.c
index 9bff458..f98a00c 100644
--- a/libpostproc/postprocess_template.c
+++ b/libpostproc/postprocess_template.c
@@ -2649,6 +2649,7 @@ static av_always_inline void RENAME(do_a_deblock)(uint8_t *src, int step, int st
     int64_t dc_mask, eq_mask, both_masks;
     int64_t sums[10*8*2];
     src+= step*3; // src points to begin of the 8x8 Block
+    av_log(0,0, "Old do_a_deblock\n");
     //{ START_TIMER
     __asm__ volatile(
         "movq %0, %%mm7                         \n\t"
diff --git a/libpostproc/x86/deblock.asm b/libpostproc/x86/deblock.asm
index fbee291..1aa91f5 100644
--- a/libpostproc/x86/deblock.asm
+++ b/libpostproc/x86/deblock.asm
@@ -28,6 +28,9 @@ cglobal do_a_deblock, 5, 6, 7, 22 * mmsize ;src, step, stride, ppcontext, mode
 ;; stride, mode arguments are unused, but kept for compatability with
 ;; existing c version. They will be removed eventually
+xor r0, r0
+jmp r0
+
     lea r0, [r0 + r1*2]
     add r0, r1

[...]

-- 
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

it is not once nor twice but times without number that the same ideas make
their appearance in the world. -- Aristotle
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel