On Tue, 16 Jun 2015 14:16:11 +0200, Gwenole Beauchesne wrote:
> Hi,
>
> 2015-06-16 14:03 GMT+02:00 Michael Niedermayer <michae...@gmx.at>:
[...]
> >> +#if HAVE_SSE2
> >> +/* Copy 16/64 bytes from srcp to dstp loading data with the SSE>=2 instruction
> >> + * load and storing data with the SSE>=2 instruction store.
> >> + */
> >> +#define COPY16(dstp, srcp, load, store) \
> >> +    __asm__ volatile ( \
> >> +        load " 0(%[src]), %%xmm1\n" \
> >> +        store " %%xmm1, 0(%[dst])\n" \
> >> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1")
> >> +
> >> +#define COPY64(dstp, srcp, load, store) \
> >> +    __asm__ volatile ( \
> >> +        load " 0(%[src]), %%xmm1\n" \
> >> +        load " 16(%[src]), %%xmm2\n" \
> >> +        load " 32(%[src]), %%xmm3\n" \
> >> +        load " 48(%[src]), %%xmm4\n" \
> >> +        store " %%xmm1, 0(%[dst])\n" \
> >> +        store " %%xmm2, 16(%[dst])\n" \
> >> +        store " %%xmm3, 32(%[dst])\n" \
> >> +        store " %%xmm4, 48(%[dst])\n" \
> >> +        : : [dst]"r"(dstp), [src]"r"(srcp) : "memory", "xmm1", "xmm2", "xmm3", "xmm4")
> >> +#endif
> >> +
> >> +#define COPY_LINE(dstp, srcp, size, load) \
> >> +    const unsigned unaligned = (-(uintptr_t)srcp) & 0x0f; \
> >> +    unsigned x = unaligned; \
> >> + \
> >> +    av_assert0(((intptr_t)dstp & 0x0f) == 0); \
> >> + \
> >> +    __asm__ volatile ("mfence"); \
> >> +    if (!unaligned) { \
> >> +        for (; x+63 < size; x += 64) \
> >> +            COPY64(&dstp[x], &srcp[x], load, "movdqa"); \
> >> +    } else { \
> >> +        COPY16(dst, src, "movdqu", "movdqa"); \
> >> +        for (; x+63 < size; x += 64) \
> >> +            COPY64(&dstp[x], &srcp[x], load, "movdqu"); \
> >
> > to use SSE registers in inline asm operands or the clobber list you
> > need to build with -msse (which is probably on by default on x86-64)
> >
> > files built with -msse will result in undefined behavior if anything
> > in them is executed on a pre-SSE cpu, as -msse allows gcc to put
> > SSE instructions directly in the code wherever it likes
> >
> > The way out of this "design" is to not tell gcc that it passes
> > a string with SSE code to the assembler; that is, not to use SSE
> > registers in operands and not to put them on the clobber list unless
> > gcc actually is in SSE mode and can use them there.
> > see XMM_CLOBBERS*

> Well, from past experience, lying to gcc is generally not a good thing
> either. There are multiple interesting ways it could fail from time to
> time. :)
>
> Other approaches:
> - With GCC >= 4.4, you can use __attribute__((target(T))) where T =
>   "ssse3", "sse4.1", etc. This is the easiest way;
> - Split into several separate files, one per target. Though one could
>   then argue that, while we are at it, why not just start moving to
>   yasm?
> The former approach looks more appealing to me, considering there may
> be an effort to migrate to yasm afterwards.

I plan to port this patch to yasm. I'll ask for help on IRC, since it
would probably take too much time without any guidance.

-- 
FFmpeg = Friendly and Fancy Mind-dumbing Pacific Easy Generator
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel