[FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.
I initially discovered a signed integer overflow on this line. Since this value is updated in multiple threads, I use an atomic update and as it happens atomic addition is defined to wrap around. However, there's still a potential bug in that the error_count may wrap around and equal zero again causing problems down the line. --- libavcodec/mpeg12dec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c index d5bc5f21b2..b7c3b5106e 100644 --- a/libavcodec/mpeg12dec.c +++ b/libavcodec/mpeg12dec.c @@ -28,6 +28,7 @@ #define UNCHECKED_BITSTREAM_READER 1 #include +#include "libavutil/atomic.h" #include "libavutil/attributes.h" #include "libavutil/imgutils.h" #include "libavutil/internal.h" @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx, AVFrame *picture, &s2->thread_context[0], NULL, s->slice_count, sizeof(void *)); for (i = 0; i < s->slice_count; i++) -s2->er.error_count += s2->thread_context[i]->er.error_count; + avpriv_atomic_int_add_and_fetch(&s2->er.error_count, s2->thread_context[i]->er.error_count); } ret = slice_end(avctx, picture); -- 2.15.0.448.gf294e3d99a-goog smime.p7s Description: S/MIME Cryptographic Signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.
Sorry! Let's try an attachment then. On 16 November 2017 at 14:36, Michael Niedermayer wrote: > On Thu, Nov 16, 2017 at 12:41:32PM -0800, Nick Lewycky wrote: >> I initially discovered a signed integer overflow on this line. Since >> this value is updated in multiple threads, I use an atomic update and >> as it happens atomic addition is defined to wrap around. However, >> there's still a potential bug in that the error_count may wrap around >> and equal zero again causing problems down the line. >> >> --- >> libavcodec/mpeg12dec.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c >> index d5bc5f21b2..b7c3b5106e 100644 >> --- a/libavcodec/mpeg12dec.c >> +++ b/libavcodec/mpeg12dec.c >> @@ -28,6 +28,7 @@ >> #define UNCHECKED_BITSTREAM_READER 1 >> #include >> >> +#include "libavutil/atomic.h" >> #include "libavutil/attributes.h" >> #include "libavutil/imgutils.h" >> #include "libavutil/internal.h" >> @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx, >> AVFrame *picture, >> &s2->thread_context[0], NULL, >> s->slice_count, sizeof(void *)); >> for (i = 0; i < s->slice_count; i++) >> -s2->er.error_count += >> s2->thread_context[i]->er.error_count; >> + >> avpriv_atomic_int_add_and_fetch(&s2->er.error_count, >> s2->thread_context[i]->er.error_count); >> } > > This patch is corrupted by newlines > > [...] > > -- > Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB > > Dictatorship naturally arises out of democracy, and the most aggravated > form of tyranny and slavery out of the most extreme liberty. -- Plato > > ___ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > From cd8ed0ee35853ae089df3d904846879ce4f00c4a Mon Sep 17 00:00:00 2001 From: Nick Lewycky Date: Thu, 16 Nov 2017 11:50:38 -0800 Subject: [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads. --- libavcodec/mpeg12dec.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c index d5bc5f21b2..b7c3b5106e 100644 --- a/libavcodec/mpeg12dec.c +++ b/libavcodec/mpeg12dec.c @@ -28,6 +28,7 @@ #define UNCHECKED_BITSTREAM_READER 1 #include +#include "libavutil/atomic.h" #include "libavutil/attributes.h" #include "libavutil/imgutils.h" #include "libavutil/internal.h" @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx, AVFrame *picture, &s2->thread_context[0], NULL, s->slice_count, sizeof(void *)); for (i = 0; i < s->slice_count; i++) -s2->er.error_count += s2->thread_context[i]->er.error_count; +avpriv_atomic_int_add_and_fetch(&s2->er.error_count, s2->thread_context[i]->er.error_count); } ret = slice_end(avctx, picture); -- 2.15.0.448.gf294e3d99a-goog smime.p7s Description: S/MIME Cryptographic Signature ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.
Sorry-- what should I do now? Wait for another patch to go in first then rebase on top of it? Attempt to migrate error_count to C11 atomics myself? If I'm migrating, is there a primer on how ffmpeg uses C11 atomics? For instance, given an atomic_int, should I assign to it with equality, atomic_store, or atomic_store_explicit and if picking atomic_store_explicit how clever should I be when picking memory orderings? Also, ERContext also has a non-volatile non-atomic int error_occurred. Sometimes we update the error_count without adjusting error_occurred. Is the idea that error_count is shared across threads and error_count is checked at the end? Given that error_count could wrap around and equal zero, should I make it atomic too, and set it to 1 every time we set error_count to non-zero? I've attached a patch with my initial attempt to use C11 atomics. Nick On 17 November 2017 at 12:49, Michael Niedermayer wrote: > On Fri, Nov 17, 2017 at 05:10:17PM -0300, James Almer wrote: > > On 11/17/2017 4:20 PM, James Almer wrote: > > > On 11/17/2017 4:16 PM, Michael Niedermayer wrote: > > >> On Thu, Nov 16, 2017 at 03:07:54PM -0800, Nick Lewycky wrote: > > >>> Sorry! Let's try an attachment then. > > >>> > > >>> On 16 November 2017 at 14:36, Michael Niedermayer > > >>> wrote: > > >>>> On Thu, Nov 16, 2017 at 12:41:32PM -0800, Nick Lewycky wrote: > > >>>>> I initially discovered a signed integer overflow on this line. > Since > > >>>>> this value is updated in multiple threads, I use an atomic update > and > > >>>>> as it happens atomic addition is defined to wrap around. However, > > >>>>> there's still a potential bug in that the error_count may wrap > around > > >>>>> and equal zero again causing problems down the line. > > >>>>> > > >>>>> --- > > >>>>> libavcodec/mpeg12dec.c | 3 ++- > > >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) > > >>>>> > > >>>>> diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c > > >>>>> index d5bc5f21b2..b7c3b5106e 100644 > > >>>>> --- a/libavcodec/mpeg12dec.c > > >>>>> +++ b/libavcodec/mpeg12dec.c > > >>>>> @@ -28,6 +28,7 @@ > > >>>>> #define UNCHECKED_BITSTREAM_READER 1 > > >>>>> #include > > >>>>> > > >>>>> +#include "libavutil/atomic.h" > > >>>>> #include "libavutil/attributes.h" > > >>>>> #include "libavutil/imgutils.h" > > >>>>> #include "libavutil/internal.h" > > >>>>> @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext > *avctx, > > >>>>> AVFrame *picture, > > >>>>> &s2->thread_context[0], NULL, > > >>>>> s->slice_count, sizeof(void > *)); > > >>>>> for (i = 0; i < s->slice_count; i++) > > >>>>> -s2->er.error_count += > > >>>>> s2->thread_context[i]->er.error_count; > > >>>>> + > > >>>>> avpriv_atomic_int_add_and_fetch(&s2->er.error_count, > > >>>>> s2->thread_context[i]->er.error_count); > > >>>>> } > > >>>> > > >>>> This patch is corrupted by newlines > > >>>> > > >>>> [...] > > >>>> > > >>>> -- > > >>>> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC7 > 87040B0FAB > > >>>> > > >>>> Dictatorship naturally arises out of democracy, and the most > aggravated > > >>>> form of tyranny and slavery out of the most extreme liberty. -- > Plato > > >>>> > > >>>> ___ > > >>>> ffmpeg-devel mailing list > > >>>> ffmpeg-devel@ffmpeg.org > > >>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > >>>> > > >> > > >>> mpeg12dec.c |3 ++- > > >>> 1 file changed, 2 insertions(+), 1 deletion(-) > > >>> eed6baa2f9ae5b6fcd227e4b7fa0e87e9ca55b64 > 0001-libavcodec-mpeg12dec-Use-atomic-addition-for-value-u.patch > &g
[FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
Newer versions of clang will allocate %rbx as one of the inline asm inputs, even in PIC. This occurs when building ffmpeg with clang -fsanitize=address -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off, just include %bx in the clobber list regardless of whether PIC is on or off. Patch attached, please review and apply on my behalf. Nick PS. While looking at this code, I have a couple more comments: a) What is supposed to be at -8(%rsp) that the asm saves and restores? Doesn't this depend on whether the compiler is using %sp or %bp? A few of us compiler engineers couldn't figure out what this is trying to do (save a return value to %rax? but it returns void? or to save the return address of the function? but it's never clobbered?) b) There's an inline comment suggesting that one variable in the inline asm is "buf2". There is no "buf2" (did you mean "dst2" or "src2"?) diff --git a/libswscale/x86/hscale_fast_bilinear_simd.c b/libswscale/x86/hscale_fast_bilinear_simd.c index 103793d..bfb23e2 100644 --- a/libswscale/x86/hscale_fast_bilinear_simd.c +++ b/libswscale/x86/hscale_fast_bilinear_simd.c @@ -198,26 +198,15 @@ void ff_hyscale_fast_mmxext(SwsContext *c, int16_t *dst, int16_t *filter= c->hLumFilter; void*mmxextFilterCode = c->lumMmxextFilterCode; int i; -#if defined(PIC) -uint64_t ebxsave; -#endif #if ARCH_X86_64 uint64_t retsave; #endif __asm__ volatile( -#if defined(PIC) -"mov %%"REG_b", %5\n\t" -#if ARCH_X86_64 -"mov -8(%%rsp), %%"REG_a" \n\t" -"mov %%"REG_a", %6\n\t" -#endif -#else #if ARCH_X86_64 "mov -8(%%rsp), %%"REG_a" \n\t" "mov %%"REG_a", %5\n\t" #endif -#endif "pxor %%mm7, %%mm7 \n\t" "mov %0, %%"REG_c" \n\t" "mov %1, %%"REG_D" \n\t" @@ -256,30 +245,16 @@ void ff_hyscale_fast_mmxext(SwsContext *c, int16_t *dst, CALL_MMXEXT_FILTER_CODE CALL_MMXEXT_FILTER_CODE -#if defined(PIC) -"mov %5, %%"REG_b" \n\t" -#if ARCH_X86_64 -"mov %6, %%"REG_a" \n\t" -"mov %%"REG_a", -8(%%rsp) \n\t" -#endif -#else #if ARCH_X86_64 "mov %5, %%"REG_a" \n\t" "mov %%"REG_a", -8(%%rsp) \n\t" #endif -#endif :: "m" (src), "m" (dst), "m" (filter), "m" (filterPos), "m" (mmxextFilterCode) -#if defined(PIC) - ,"m" (ebxsave) -#endif #if ARCH_X86_64 ,"m"(retsave) #endif -: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D -#if !defined(PIC) - ,"%"REG_b -#endif +: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D ,"%"REG_b ); for (i=dstWidth-1; (i*xInc)>>16 >=srcW-1; i--) @@ -294,26 +269,15 @@ void ff_hcscale_fast_mmxext(SwsContext *c, int16_t *dst1, int16_t *dst2, int16_t *filter= c->hChrFilter; void*mmxextFilterCode = c->chrMmxextFilterCode; int i; -#if defined(PIC) -DECLARE_ALIGNED(8, uint64_t, ebxsave); -#endif #if ARCH_X86_64 DECLARE_ALIGNED(8, uint64_t, retsave); #endif __asm__ volatile( -#if defined(PIC) -"mov %%"REG_b", %7 \n\t" -#if ARCH_X86_64 -"mov -8(%%rsp), %%"REG_a" \n\t" -"mov %%"REG_a", %8 \n\t" -#endif -#else #if ARCH_X86_64 "mov -8(%%rsp), %%"REG_a" \n\t" "mov %%"REG_a", %7 \n\t" #endif -#endif "pxor %%mm7, %%mm7 \n\t" "mov %0, %%"REG_c" \n\t" "mov %1, %%"REG_D" \n\t" @@ -340,30 +304,16 @@ void ff_hcscale_fast_mmxext(SwsContext *c, int16_t *dst1, int16_t *dst2, CALL_MMXEXT_FILTER_CODE CALL_MMXEXT_FILTER_CODE -#if defined(PIC) -"mov %7, %%"REG_b"\n\t" -#if ARCH_X86_64 -"mov %8, %%"REG_a" \n\t" -"mov %%"REG_a", -8(%%rsp) \n\t" -#endif -#else #if ARCH_X86_64 "mov %7, %%"REG_a" \n\t" "mov %%"REG_a", -8(%%rsp) \n\t" #endif -#endif :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos), "m" (mmxextFilterCode), "m" (src2), "m"(dst2) -#if defined(PIC) - ,"m" (ebxsave) -#endif #if ARCH_X86_64 ,"m"(retsave) #endif -: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D -#if !defined(PIC) - ,"%"REG_b -#endif +: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D ,"%"REG_b ); for (i=dstWidth-1; (i*xInc)>>16 >=srcW-1; i--) { ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 6 May 2015 at 11:58, Carl Eugen Hoyos wrote: > Nick Lewycky google.com> writes: > > > This occurs when building ffmpeg with clang > > -fsanitize=address -O1 -fPIE. > > What is the usecase for -O1? > It's very important for asan. ASan instruments every memory access (outside inline asm, for now) and if we don't run the optimizer then every local variable is a stack allocation with loads and stores, and all those loads and stores get instrumented leading to both crazy slowdown and binary size increase. Did you check performance with and without your patch? > No. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 6 May 2015 at 13:08, Carl Eugen Hoyos wrote: > Nick Lewycky google.com> writes: > > > On 6 May 2015 at 11:58, Carl Eugen Hoyos: > > > > > Nick Lewycky google.com> writes: > > > > > > > This occurs when building ffmpeg with clang > > > > -fsanitize=address -O1 -fPIE. > > > > > > What is the usecase for -O1? > > > > It's very important for asan. ASan instruments > > every memory access (outside inline asm, for now) > > and if we don't run the optimizer then every local > > variable is a stack allocation with loads and > > stores, and all those loads and stores get > > instrumented leading to both crazy slowdown and > > binary size increase. > > What I meant was (and I believe I wasn't clear): > What is the usecase for -O1 as opposed to default > optimization? > Or in other words: Is the issue with clang also > reproducible with -O2 and -O3? > Oh I see. Yes, it does. I happened to use clang -O1 because that's our normal "unoptimized" ASan configuration where I first found the bug. I assume it doesn't happen at -O0 but I haven't checked, it does happen at higher optimization levels too. > > Did you check performance with and without > > > your patch? > > > > No. > > I believe such a performance test is a precondition > for an asm patch (but I am not the maintainer). > I'm new here, I don't even know to test ffmpeg performance. I'm happy to run something if you have a suggestion? My two arguments are that correctness trumps performance (otherwise ffmpeg could be "return 0" -- don't tempt me, I'm already a compiler optimization engineer ;-) ), and that this is exactly what ffmpeg already did with PIC disabled. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 6 May 2015 at 15:06, Michael Niedermayer wrote: > Hi > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote: > > Newer versions of clang will allocate %rbx as one of the inline asm > inputs, > > Thats great > > > > even in PIC. This occurs when building ffmpeg with clang > -fsanitize=address > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off, > just > > include %bx in the clobber list regardless of whether PIC is on or off. > > you cant include *bx in the clobber list with PIC > it breaks compilers that are less great, like gcc > Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry! gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > ‘ff_hyscale_fast_mmxext’: > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC register > clobbered by ‘%ebx’ in ‘asm’ > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > ‘ff_hcscale_fast_mmxext’: > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC register > clobbered by ‘%ebx’ in ‘asm’ > > > also what exactly are you trying to fix ? > or rather what exactly goes how exactly wrong with the code as it is > if rbx is used ? > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives evaluated in PIC x86-64, the inline constraints work out to: :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos), "m" (mmxextFilterCode), "m" (src2), "m"(dst2) ,"m" (ebxsave) ,"m"(retsave) : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D so clang looks at that and decides that it can pick src1 = (%r10), dst1 = (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = (%r15), src2 = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The problem there is src2 being (%rbx). Now let's look at how we use them: "mov %0, %%"REG_c" \n\t" "mov %1, %%"REG_D" \n\t" "mov %2, %%"REG_d" \n\t" "mov %3, %%"REG_b" \n\t" // Clobbers %rbx / src2 / %5 here "xor %%"REG_a", %%"REG_a" \n\t" PREFETCH" (%%"REG_c") \n\t" PREFETCH" 32(%%"REG_c") \n\t" PREFETCH" 64(%%"REG_c") \n\t" CALL_MMXEXT_FILTER_CODE CALL_MMXEXT_FILTER_CODE CALL_MMXEXT_FILTER_CODE CALL_MMXEXT_FILTER_CODE "xor %%"REG_a", %%"REG_a" \n\t" "mov %5, %%"REG_c" \n\t" // %5 is read too late, we get %3 / filterPos instead "mov %6, %%"REG_D" \n\t" PREFETCH" (%%"REG_c") \n\t" PREFETCH" 32(%%"REG_c") \n\t" PREFETCH" 64(%%"REG_c") \n\t" CALL_MMXEXT_FILTER_CODE // Crash... The two marked instructions are intended to refer to different contents. CALL_MMXEXT_FILTER_CODE moves RBX to ESI and calls mmxextFilterCode: "movl(%%"REG_b"), %%esi \n\t"\ "call*%4\n\t"\ // Crash... That callee ultimately segfaults: => 0x7fffefa45000: movq (%rdx,%rax,1),%mm3 => 0x7fffefa45004: movd (%rcx,%rsi,1),%mm0 Program received signal SIGSEGV, Segmentation fault. 0x7fffefa45004 in ?? () => 0x7fffefa45004: movd (%rcx,%rsi,1),%mm0 I'm trying to fix this segfault. also ideally changes to this code should be tested against gcc/clang/icc > of various versions with and without PIC, 32 and 64 bit > this code is more tricky than than the average so theres a good > change changes to it will break some compiler > I understand that, but I may not be able to test as many environments as you like. I'm going to give testing it my best effort, but I can tell you up front that I'm only going to test gcc and clang, on an x86 Ubuntu machine. I don't have ICC, so I can't test that. ___ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 6 May 2015 at 18:03, Michael Niedermayer wrote: > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote: > > On 6 May 2015 at 15:06, Michael Niedermayer wrote: > > > > > Hi > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote: > > > > Newer versions of clang will allocate %rbx as one of the inline asm > > > inputs, > > > > > > Thats great > > > > > > > > > > even in PIC. This occurs when building ffmpeg with clang > > > -fsanitize=address > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off, > > > just > > > > include %bx in the clobber list regardless of whether PIC is on or > off. > > > > > > you cant include *bx in the clobber list with PIC > > > it breaks compilers that are less great, like gcc > > > > > > > Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry! > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > ‘ff_hyscale_fast_mmxext’: > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC > register > > > clobbered by ‘%ebx’ in ‘asm’ > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > ‘ff_hcscale_fast_mmxext’: > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC > register > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > > > also what exactly are you trying to fix ? > > > or rather what exactly goes how exactly wrong with the code as it is > > > if rbx is used ? > > > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives > evaluated > > in PIC x86-64, the inline constraints work out to: > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos), > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2) > > ,"m" (ebxsave) > > ,"m"(retsave) > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D > > > > so clang looks at that and decides that it can pick src1 = (%r10), dst1 = > > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = (%r15), > src2 > > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The problem > > there is src2 being (%rbx). > > > > Now let's look at how we use them: > > > > "mov %0, %%"REG_c" \n\t" > > "mov %1, %%"REG_D" \n\t" > > "mov %2, %%"REG_d" \n\t" > > "mov %3, %%"REG_b" \n\t" // Clobbers %rbx / > src2 > > / %5 here > > "xor %%"REG_a", %%"REG_a" \n\t" > > PREFETCH" (%%"REG_c") \n\t" > > PREFETCH" 32(%%"REG_c") \n\t" > > PREFETCH" 64(%%"REG_c") \n\t" > > > > CALL_MMXEXT_FILTER_CODE > > CALL_MMXEXT_FILTER_CODE > > CALL_MMXEXT_FILTER_CODE > > CALL_MMXEXT_FILTER_CODE > > "xor %%"REG_a", %%"REG_a" \n\t" > > "mov %5, %%"REG_c" \n\t" // %5 is read too > late, > > we get %3 / filterPos instead > > "mov %6, %%"REG_D" \n\t" > > PREFETCH" (%%"REG_c") \n\t" > > PREFETCH" 32(%%"REG_c") \n\t" > > PREFETCH" 64(%%"REG_c") \n\t" > > > > CALL_MMXEXT_FILTER_CODE // Crash... > > > > The two marked instructions are intended to refer to different contents. > > CALL_MMXEXT_FILTER_CODE moves RBX to ESI and calls mmxextFilterCode: > > > > "movl(%%"REG_b"), %%esi \n\t"\ > > "call*%4\n\t"\ // Crash... > > > > That callee ultimately segfaults: > > > > => 0x7fffefa45000: movq (%rdx,%rax,1),%mm3 > > => 0x7fffefa45004: movd (%rcx,%rsi,1),%mm0 > > Program received signal SIGSEGV, Segmentation fault. > > 0x7fffefa45004 in ?? () > > => 0x7fffefa45004: movd
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 8 May 2015 at 12:06, Michael Niedermayer wrote: > On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote: > > On 6 May 2015 at 18:03, Michael Niedermayer wrote: > > > > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote: > > > > On 6 May 2015 at 15:06, Michael Niedermayer > wrote: > > > > > > > > > Hi > > > > > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote: > > > > > > Newer versions of clang will allocate %rbx as one of the inline > asm > > > > > inputs, > > > > > > > > > > Thats great > > > > > > > > > > > > > > > > even in PIC. This occurs when building ffmpeg with clang > > > > > -fsanitize=address > > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or > off, > > > > > just > > > > > > include %bx in the clobber list regardless of whether PIC is on > or > > > off. > > > > > > > > > > you cant include *bx in the clobber list with PIC > > > > > it breaks compilers that are less great, like gcc > > > > > > > > > > > > > Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry! > > > > > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > ‘ff_hyscale_fast_mmxext’: > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC > > > register > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > ‘ff_hcscale_fast_mmxext’: > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC > > > register > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > > > > > > > > > also what exactly are you trying to fix ? > > > > > or rather what exactly goes how exactly wrong with the code as it > is > > > > > if rbx is used ? > > > > > > > > > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives > > > evaluated > > > > in PIC x86-64, the inline constraints work out to: > > > > > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos), > > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2) > > > > ,"m" (ebxsave) > > > > ,"m"(retsave) > > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D > > > > > > > > so clang looks at that and decides that it can pick src1 = (%r10), > dst1 = > > > > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = > (%r15), > > > src2 > > > > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The > problem > > > > there is src2 being (%rbx). > > > > > > > > Now let's look at how we use them: > > > > > > > > "mov %0, %%"REG_c" \n\t" > > > > "mov %1, %%"REG_D" \n\t" > > > > "mov %2, %%"REG_d" \n\t" > > > > "mov %3, %%"REG_b" \n\t" // Clobbers %rbx / > > > src2 > > > > / %5 here > > > > "xor %%"REG_a", %%"REG_a" \n\t" > > > > PREFETCH" (%%"REG_c") \n\t" > > > > PREFETCH" 32(%%"REG_c") \n\t" > > > > PREFETCH" 64(%%"REG_c") \n\t" > > > > > > > > CALL_MMXEXT_FILTER_CODE > > > > CALL_MMXEXT_FILTER_CODE > > > > CALL_MMXEXT_FILTER_CODE > > > > CALL_MMXEXT_FILTER_CODE > > > > "xor %%"REG_a", %%"REG_a" \n\t" > > > > "mov %5, %%"REG_c" \n\t" // %5 is read too > > > late, > > > > we get %3 / filterPos instead > > > > "mov %6, %%"REG_D" \n
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 8 May 2015 at 14:28, Michael Niedermayer wrote: > On Fri, May 08, 2015 at 12:43:20PM -0700, Nick Lewycky wrote: > > On 8 May 2015 at 12:06, Michael Niedermayer wrote: > > > > > On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote: > > > > On 6 May 2015 at 18:03, Michael Niedermayer > wrote: > > > > > > > > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote: > > > > > > On 6 May 2015 at 15:06, Michael Niedermayer > > > wrote: > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote: > > > > > > > > Newer versions of clang will allocate %rbx as one of the > inline > > > asm > > > > > > > inputs, > > > > > > > > > > > > > > Thats great > > > > > > > > > > > > > > > > > > > > > > even in PIC. This occurs when building ffmpeg with clang > > > > > > > -fsanitize=address > > > > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is > on or > > > off, > > > > > > > just > > > > > > > > include %bx in the clobber list regardless of whether PIC is > on > > > or > > > > > off. > > > > > > > > > > > > > > you cant include *bx in the clobber list with PIC > > > > > > > it breaks compilers that are less great, like gcc > > > > > > > > > > > > > > > > > > > Doh!! I did check for this, but only tried x86-64, not x86-32. > Sorry! > > > > > > > > > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > > > ‘ff_hyscale_fast_mmxext’: > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC > > > > > register > > > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > > > ‘ff_hcscale_fast_mmxext’: > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC > > > > > register > > > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > > > > > > > > > > > > > > > also what exactly are you trying to fix ? > > > > > > > or rather what exactly goes how exactly wrong with the code as > it > > > is > > > > > > > if rbx is used ? > > > > > > > > > > > > > > > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives > > > > > evaluated > > > > > > in PIC x86-64, the inline constraints work out to: > > > > > > > > > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos), > > > > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2) > > > > > > ,"m" (ebxsave) > > > > > > ,"m"(retsave) > > > > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D > > > > > > > > > > > > so clang looks at that and decides that it can pick src1 = > (%r10), > > > dst1 = > > > > > > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = > > > (%r15), > > > > > src2 > > > > > > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The > > > problem > > > > > > there is src2 being (%rbx). > > > > > > > > > > > > Now let's look at how we use them: > > > > > > > > > > > > "mov %0, %%"REG_c" \n\t" > > > > > > "mov %1, %%"REG_D" \n\t" > > > > > > "mov %2, %%"REG_d" \n\t" > > > > > > "mov %3, %%"REG_b" \n\t" // Clobbers > %rbx / > > > > > src2 > > > > > > / %5 h
Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx
On 11 May 2015 at 16:18, Michael Niedermayer wrote: > On Mon, May 11, 2015 at 03:55:40PM -0700, Nick Lewycky wrote: > > On 8 May 2015 at 14:28, Michael Niedermayer wrote: > > > > > On Fri, May 08, 2015 at 12:43:20PM -0700, Nick Lewycky wrote: > > > > On 8 May 2015 at 12:06, Michael Niedermayer > wrote: > > > > > > > > > On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote: > > > > > > On 6 May 2015 at 18:03, Michael Niedermayer > > > wrote: > > > > > > > > > > > > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote: > > > > > > > > On 6 May 2015 at 15:06, Michael Niedermayer < > michae...@gmx.at> > > > > > wrote: > > > > > > > > > > > > > > > > > Hi > > > > > > > > > > > > > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky > wrote: > > > > > > > > > > Newer versions of clang will allocate %rbx as one of the > > > inline > > > > > asm > > > > > > > > > inputs, > > > > > > > > > > > > > > > > > > Thats great > > > > > > > > > > > > > > > > > > > > > > > > > > > > even in PIC. This occurs when building ffmpeg with clang > > > > > > > > > -fsanitize=address > > > > > > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC > is > > > on or > > > > > off, > > > > > > > > > just > > > > > > > > > > include %bx in the clobber list regardless of whether > PIC is > > > on > > > > > or > > > > > > > off. > > > > > > > > > > > > > > > > > > you cant include *bx in the clobber list with PIC > > > > > > > > > it breaks compilers that are less great, like gcc > > > > > > > > > > > > > > > > > > > > > > > > > Doh!! I did check for this, but only tried x86-64, not > x86-32. > > > Sorry! > > > > > > > > > > > > > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3 > > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > > > > > ‘ff_hyscale_fast_mmxext’: > > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: > error: PIC > > > > > > > register > > > > > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function > > > > > > > > > ‘ff_hcscale_fast_mmxext’: > > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: > error: PIC > > > > > > > register > > > > > > > > > clobbered by ‘%ebx’ in ‘asm’ > > > > > > > > > > > > > > > > > > > > > > > > > > > also what exactly are you trying to fix ? > > > > > > > > > or rather what exactly goes how exactly wrong with the > code as > > > it > > > > > is > > > > > > > > > if rbx is used ? > > > > > > > > > > > > > > > > > > > > > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor > directives > > > > > > > evaluated > > > > > > > > in PIC x86-64, the inline constraints work out to: > > > > > > > > > > > > > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" > (filterPos), > > > > > > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2) > > > > > > > > ,"m" (ebxsave) > > > > > > > > ,"m"(retsave) > > > > > > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D > > > > > > > > > > > > > > > > so clang looks at that and decides that it can pick src1 = > > > (%r10), > > > > >