[FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.

2017-11-16 Thread Nick Lewycky
I initially discovered a signed integer overflow on this line. Since
this value is updated in multiple threads, I use an atomic update and
as it happens atomic addition is defined to wrap around. However,
there's still a potential bug in that the error_count may wrap around
and equal zero again causing problems down the line.

---
 libavcodec/mpeg12dec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index d5bc5f21b2..b7c3b5106e 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -28,6 +28,7 @@
 #define UNCHECKED_BITSTREAM_READER 1
 #include 

+#include "libavutil/atomic.h"
 #include "libavutil/attributes.h"
 #include "libavutil/imgutils.h"
 #include "libavutil/internal.h"
@@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx,
AVFrame *picture,
&s2->thread_context[0], NULL,
s->slice_count, sizeof(void *));
 for (i = 0; i < s->slice_count; i++)
-s2->er.error_count +=
s2->thread_context[i]->er.error_count;
+
avpriv_atomic_int_add_and_fetch(&s2->er.error_count,
s2->thread_context[i]->er.error_count);
 }

 ret = slice_end(avctx, picture);
-- 
2.15.0.448.gf294e3d99a-goog


smime.p7s
Description: S/MIME Cryptographic Signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.

2017-11-16 Thread Nick Lewycky
Sorry! Let's try an attachment then.

On 16 November 2017 at 14:36, Michael Niedermayer
 wrote:
> On Thu, Nov 16, 2017 at 12:41:32PM -0800, Nick Lewycky wrote:
>> I initially discovered a signed integer overflow on this line. Since
>> this value is updated in multiple threads, I use an atomic update and
>> as it happens atomic addition is defined to wrap around. However,
>> there's still a potential bug in that the error_count may wrap around
>> and equal zero again causing problems down the line.
>>
>> ---
>>  libavcodec/mpeg12dec.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
>> index d5bc5f21b2..b7c3b5106e 100644
>> --- a/libavcodec/mpeg12dec.c
>> +++ b/libavcodec/mpeg12dec.c
>> @@ -28,6 +28,7 @@
>>  #define UNCHECKED_BITSTREAM_READER 1
>>  #include 
>>
>> +#include "libavutil/atomic.h"
>>  #include "libavutil/attributes.h"
>>  #include "libavutil/imgutils.h"
>>  #include "libavutil/internal.h"
>> @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx,
>> AVFrame *picture,
>> &s2->thread_context[0], NULL,
>> s->slice_count, sizeof(void *));
>>  for (i = 0; i < s->slice_count; i++)
>> -s2->er.error_count +=
>> s2->thread_context[i]->er.error_count;
>> +
>> avpriv_atomic_int_add_and_fetch(&s2->er.error_count,
>> s2->thread_context[i]->er.error_count);
>>  }
>
> This patch is corrupted by newlines
>
> [...]
>
> --
> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
>
> Dictatorship naturally arises out of democracy, and the most aggravated
> form of tyranny and slavery out of the most extreme liberty. -- Plato
>
> ___
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
From cd8ed0ee35853ae089df3d904846879ce4f00c4a Mon Sep 17 00:00:00 2001
From: Nick Lewycky 
Date: Thu, 16 Nov 2017 11:50:38 -0800
Subject: [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated
 in multiple threads.

---
 libavcodec/mpeg12dec.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
index d5bc5f21b2..b7c3b5106e 100644
--- a/libavcodec/mpeg12dec.c
+++ b/libavcodec/mpeg12dec.c
@@ -28,6 +28,7 @@
 #define UNCHECKED_BITSTREAM_READER 1
 #include 
 
+#include "libavutil/atomic.h"
 #include "libavutil/attributes.h"
 #include "libavutil/imgutils.h"
 #include "libavutil/internal.h"
@@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext *avctx, AVFrame *picture,
&s2->thread_context[0], NULL,
s->slice_count, sizeof(void *));
 for (i = 0; i < s->slice_count; i++)
-s2->er.error_count += s2->thread_context[i]->er.error_count;
+avpriv_atomic_int_add_and_fetch(&s2->er.error_count, s2->thread_context[i]->er.error_count);
 }
 
 ret = slice_end(avctx, picture);
-- 
2.15.0.448.gf294e3d99a-goog



smime.p7s
Description: S/MIME Cryptographic Signature
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] libavcodec/mpeg12dec: Use atomic addition for value updated in multiple threads.

2017-11-19 Thread Nick Lewycky
Sorry-- what should I do now? Wait for another patch to go in first then
rebase on top of it? Attempt to migrate error_count to C11 atomics myself?
If I'm migrating, is there a primer on how ffmpeg uses C11 atomics? For
instance, given an atomic_int, should I assign to it with equality,
atomic_store, or atomic_store_explicit and if picking atomic_store_explicit
how clever should I be when picking memory orderings?

Also, ERContext also has a non-volatile non-atomic int error_occurred.
Sometimes we update the error_count without adjusting error_occurred. Is
the idea that error_count is shared across threads and error_count is
checked at the end? Given that error_count could wrap around and equal
zero, should I make it atomic too, and set it to 1 every time we set
error_count to non-zero?

I've attached a patch with my initial attempt to use C11 atomics.

Nick

On 17 November 2017 at 12:49, Michael Niedermayer 
wrote:

> On Fri, Nov 17, 2017 at 05:10:17PM -0300, James Almer wrote:
> > On 11/17/2017 4:20 PM, James Almer wrote:
> > > On 11/17/2017 4:16 PM, Michael Niedermayer wrote:
> > >> On Thu, Nov 16, 2017 at 03:07:54PM -0800, Nick Lewycky wrote:
> > >>> Sorry! Let's try an attachment then.
> > >>>
> > >>> On 16 November 2017 at 14:36, Michael Niedermayer
> > >>>  wrote:
> > >>>> On Thu, Nov 16, 2017 at 12:41:32PM -0800, Nick Lewycky wrote:
> > >>>>> I initially discovered a signed integer overflow on this line.
> Since
> > >>>>> this value is updated in multiple threads, I use an atomic update
> and
> > >>>>> as it happens atomic addition is defined to wrap around. However,
> > >>>>> there's still a potential bug in that the error_count may wrap
> around
> > >>>>> and equal zero again causing problems down the line.
> > >>>>>
> > >>>>> ---
> > >>>>>  libavcodec/mpeg12dec.c | 3 ++-
> > >>>>>  1 file changed, 2 insertions(+), 1 deletion(-)
> > >>>>>
> > >>>>> diff --git a/libavcodec/mpeg12dec.c b/libavcodec/mpeg12dec.c
> > >>>>> index d5bc5f21b2..b7c3b5106e 100644
> > >>>>> --- a/libavcodec/mpeg12dec.c
> > >>>>> +++ b/libavcodec/mpeg12dec.c
> > >>>>> @@ -28,6 +28,7 @@
> > >>>>>  #define UNCHECKED_BITSTREAM_READER 1
> > >>>>>  #include 
> > >>>>>
> > >>>>> +#include "libavutil/atomic.h"
> > >>>>>  #include "libavutil/attributes.h"
> > >>>>>  #include "libavutil/imgutils.h"
> > >>>>>  #include "libavutil/internal.h"
> > >>>>> @@ -2476,7 +2477,7 @@ static int decode_chunks(AVCodecContext
> *avctx,
> > >>>>> AVFrame *picture,
> > >>>>> &s2->thread_context[0], NULL,
> > >>>>> s->slice_count, sizeof(void
> *));
> > >>>>>  for (i = 0; i < s->slice_count; i++)
> > >>>>> -s2->er.error_count +=
> > >>>>> s2->thread_context[i]->er.error_count;
> > >>>>> +
> > >>>>> avpriv_atomic_int_add_and_fetch(&s2->er.error_count,
> > >>>>> s2->thread_context[i]->er.error_count);
> > >>>>>  }
> > >>>>
> > >>>> This patch is corrupted by newlines
> > >>>>
> > >>>> [...]
> > >>>>
> > >>>> --
> > >>>> Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC7
> 87040B0FAB
> > >>>>
> > >>>> Dictatorship naturally arises out of democracy, and the most
> aggravated
> > >>>> form of tyranny and slavery out of the most extreme liberty. --
> Plato
> > >>>>
> > >>>> ___
> > >>>> ffmpeg-devel mailing list
> > >>>> ffmpeg-devel@ffmpeg.org
> > >>>> http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> > >>>>
> > >>
> > >>>  mpeg12dec.c |3 ++-
> > >>>  1 file changed, 2 insertions(+), 1 deletion(-)
> > >>> eed6baa2f9ae5b6fcd227e4b7fa0e87e9ca55b64
> 0001-libavcodec-mpeg12dec-Use-atomic-addition-for-value-u.patch
> &g

[FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-06 Thread Nick Lewycky
Newer versions of clang will allocate %rbx as one of the inline asm inputs,
even in PIC. This occurs when building ffmpeg with clang -fsanitize=address
-O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off, just
include %bx in the clobber list regardless of whether PIC is on or off.

Patch attached, please review and apply on my behalf.

Nick

PS. While looking at this code, I have a couple more comments:
  a) What is supposed to be at -8(%rsp) that the asm saves and restores?
Doesn't this depend on whether the compiler is using %sp or %bp? A few of
us compiler engineers couldn't figure out what this is trying to do (save a
return value to %rax? but it returns void? or to save the return address of
the function? but it's never clobbered?)
  b) There's an inline comment suggesting that one variable in the inline
asm is "buf2". There is no "buf2" (did you mean "dst2" or "src2"?)
diff --git a/libswscale/x86/hscale_fast_bilinear_simd.c b/libswscale/x86/hscale_fast_bilinear_simd.c
index 103793d..bfb23e2 100644
--- a/libswscale/x86/hscale_fast_bilinear_simd.c
+++ b/libswscale/x86/hscale_fast_bilinear_simd.c
@@ -198,26 +198,15 @@ void ff_hyscale_fast_mmxext(SwsContext *c, int16_t *dst,
 int16_t *filter= c->hLumFilter;
 void*mmxextFilterCode = c->lumMmxextFilterCode;
 int i;
-#if defined(PIC)
-uint64_t ebxsave;
-#endif
 #if ARCH_X86_64
 uint64_t retsave;
 #endif
 
 __asm__ volatile(
-#if defined(PIC)
-"mov   %%"REG_b", %5\n\t"
-#if ARCH_X86_64
-"mov   -8(%%rsp), %%"REG_a" \n\t"
-"mov   %%"REG_a", %6\n\t"
-#endif
-#else
 #if ARCH_X86_64
 "mov   -8(%%rsp), %%"REG_a" \n\t"
 "mov   %%"REG_a", %5\n\t"
 #endif
-#endif
 "pxor  %%mm7, %%mm7 \n\t"
 "mov  %0, %%"REG_c" \n\t"
 "mov  %1, %%"REG_D" \n\t"
@@ -256,30 +245,16 @@ void ff_hyscale_fast_mmxext(SwsContext *c, int16_t *dst,
 CALL_MMXEXT_FILTER_CODE
 CALL_MMXEXT_FILTER_CODE
 
-#if defined(PIC)
-"mov  %5, %%"REG_b" \n\t"
-#if ARCH_X86_64
-"mov  %6, %%"REG_a" \n\t"
-"mov   %%"REG_a", -8(%%rsp) \n\t"
-#endif
-#else
 #if ARCH_X86_64
 "mov  %5, %%"REG_a" \n\t"
 "mov   %%"REG_a", -8(%%rsp) \n\t"
 #endif
-#endif
 :: "m" (src), "m" (dst), "m" (filter), "m" (filterPos),
"m" (mmxextFilterCode)
-#if defined(PIC)
-  ,"m" (ebxsave)
-#endif
 #if ARCH_X86_64
   ,"m"(retsave)
 #endif
-: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
-#if !defined(PIC)
- ,"%"REG_b
-#endif
+: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D ,"%"REG_b
 );
 
 for (i=dstWidth-1; (i*xInc)>>16 >=srcW-1; i--)
@@ -294,26 +269,15 @@ void ff_hcscale_fast_mmxext(SwsContext *c, int16_t *dst1, int16_t *dst2,
 int16_t *filter= c->hChrFilter;
 void*mmxextFilterCode = c->chrMmxextFilterCode;
 int i;
-#if defined(PIC)
-DECLARE_ALIGNED(8, uint64_t, ebxsave);
-#endif
 #if ARCH_X86_64
 DECLARE_ALIGNED(8, uint64_t, retsave);
 #endif
 
 __asm__ volatile(
-#if defined(PIC)
-"mov  %%"REG_b", %7 \n\t"
-#if ARCH_X86_64
-"mov  -8(%%rsp), %%"REG_a"  \n\t"
-"mov  %%"REG_a", %8 \n\t"
-#endif
-#else
 #if ARCH_X86_64
 "mov  -8(%%rsp), %%"REG_a"  \n\t"
 "mov  %%"REG_a", %7 \n\t"
 #endif
-#endif
 "pxor %%mm7, %%mm7  \n\t"
 "mov %0, %%"REG_c"  \n\t"
 "mov %1, %%"REG_D"  \n\t"
@@ -340,30 +304,16 @@ void ff_hcscale_fast_mmxext(SwsContext *c, int16_t *dst1, int16_t *dst2,
 CALL_MMXEXT_FILTER_CODE
 CALL_MMXEXT_FILTER_CODE
 
-#if defined(PIC)
-"mov %7, %%"REG_b"\n\t"
-#if ARCH_X86_64
-"mov %8, %%"REG_a"  \n\t"
-"mov  %%"REG_a", -8(%%rsp)  \n\t"
-#endif
-#else
 #if ARCH_X86_64
 "mov %7, %%"REG_a"  \n\t"
 "mov  %%"REG_a", -8(%%rsp)  \n\t"
 #endif
-#endif
 :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos),
"m" (mmxextFilterCode), "m" (src2), "m"(dst2)
-#if defined(PIC)
-  ,"m" (ebxsave)
-#endif
 #if ARCH_X86_64
   ,"m"(retsave)
 #endif
-: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
-#if !defined(PIC)
- ,"%"REG_b
-#endif
+: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D ,"%"REG_b
 );
 
 for (i=dstWidth-1; (i*xInc)>>16 >=srcW-1; i--) {
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-06 Thread Nick Lewycky
On 6 May 2015 at 11:58, Carl Eugen Hoyos  wrote:

> Nick Lewycky  google.com> writes:
>
> > This occurs when building ffmpeg with clang
> > -fsanitize=address -O1 -fPIE.
>
> What is the usecase for -O1?
>

It's very important for asan. ASan instruments every memory access (outside
inline asm, for now) and if we don't run the optimizer then every local
variable is a stack allocation with loads and stores, and all those loads
and stores get instrumented leading to both crazy slowdown and binary size
increase.

Did you check performance with and without your patch?
>

No.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-06 Thread Nick Lewycky
On 6 May 2015 at 13:08, Carl Eugen Hoyos  wrote:

> Nick Lewycky  google.com> writes:
>
> > On 6 May 2015 at 11:58, Carl Eugen Hoyos:
> >
> > > Nick Lewycky  google.com> writes:
> > >
> > > > This occurs when building ffmpeg with clang
> > > > -fsanitize=address -O1 -fPIE.
> > >
> > > What is the usecase for -O1?
> >
> > It's very important for asan. ASan instruments
> > every memory access (outside inline asm, for now)
> > and if we don't run the optimizer then every local
> > variable is a stack allocation with loads and
> > stores, and all those loads and stores get
> > instrumented leading to both crazy slowdown and
> > binary size increase.
>
> What I meant was (and I believe I wasn't clear):
> What is the usecase for -O1 as opposed to default
> optimization?
> Or in other words: Is the issue with clang also
> reproducible with -O2 and -O3?
>

Oh I see. Yes, it does.

I happened to use clang -O1 because that's our normal "unoptimized" ASan
configuration where I first found the bug. I assume it doesn't happen at
-O0 but I haven't checked, it does happen at higher optimization levels too.

> > Did you check performance with and without
> > > your patch?
> >
> > No.
>
> I believe such a performance test is a precondition
> for an asm patch (but I am not the maintainer).
>

I'm new here, I don't even know to test ffmpeg performance. I'm happy to
run something if you have a suggestion?

My two arguments are that correctness trumps performance (otherwise ffmpeg
could be "return 0" -- don't tempt me, I'm already a compiler optimization
engineer ;-) ), and that this is exactly what ffmpeg already did with PIC
disabled.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-06 Thread Nick Lewycky
On 6 May 2015 at 15:06, Michael Niedermayer  wrote:

> Hi
>
> On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote:
> > Newer versions of clang will allocate %rbx as one of the inline asm
> inputs,
>
> Thats great
>
>
> > even in PIC. This occurs when building ffmpeg with clang
> -fsanitize=address
> > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off,
> just
> > include %bx in the clobber list regardless of whether PIC is on or off.
>
> you cant include *bx in the clobber list with PIC
> it breaks compilers that are less great, like gcc
>

Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry!

gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> ‘ff_hyscale_fast_mmxext’:
> ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC register
> clobbered by ‘%ebx’ in ‘asm’
> ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> ‘ff_hcscale_fast_mmxext’:
> ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC register
> clobbered by ‘%ebx’ in ‘asm’
>
>
> also what exactly are you trying to fix ?
> or rather what exactly goes how exactly wrong with the code as it is
> if rbx is used ?
>

Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives evaluated
in PIC x86-64, the inline constraints work out to:

:: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos),
   "m" (mmxextFilterCode), "m" (src2), "m"(dst2)
  ,"m" (ebxsave)
  ,"m"(retsave)
: "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D

so clang looks at that and decides that it can pick src1 = (%r10), dst1 =
(%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = (%r15), src2
= (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The problem
there is src2 being (%rbx).

Now let's look at how we use them:

"mov %0, %%"REG_c"  \n\t"
"mov %1, %%"REG_D"  \n\t"
"mov %2, %%"REG_d"  \n\t"
"mov %3, %%"REG_b"  \n\t"  // Clobbers %rbx / src2
/ %5 here
"xor  %%"REG_a", %%"REG_a"  \n\t"
PREFETCH"   (%%"REG_c") \n\t"
PREFETCH" 32(%%"REG_c") \n\t"
PREFETCH" 64(%%"REG_c") \n\t"

CALL_MMXEXT_FILTER_CODE
CALL_MMXEXT_FILTER_CODE
CALL_MMXEXT_FILTER_CODE
CALL_MMXEXT_FILTER_CODE
"xor  %%"REG_a", %%"REG_a"  \n\t"
"mov %5, %%"REG_c"  \n\t"  // %5 is read too late,
we get %3 / filterPos instead
"mov %6, %%"REG_D"  \n\t"
PREFETCH"   (%%"REG_c") \n\t"
PREFETCH" 32(%%"REG_c") \n\t"
PREFETCH" 64(%%"REG_c") \n\t"

CALL_MMXEXT_FILTER_CODE  // Crash...

The two marked instructions are intended to refer to different contents.
CALL_MMXEXT_FILTER_CODE moves RBX to ESI and calls mmxextFilterCode:

"movl(%%"REG_b"), %%esi \n\t"\
"call*%4\n\t"\  // Crash...

That callee ultimately segfaults:

=> 0x7fffefa45000:  movq   (%rdx,%rax,1),%mm3
=> 0x7fffefa45004:  movd   (%rcx,%rsi,1),%mm0
Program received signal SIGSEGV, Segmentation fault.
0x7fffefa45004 in ?? ()
=> 0x7fffefa45004:  movd   (%rcx,%rsi,1),%mm0

I'm trying to fix this segfault.

also ideally changes to this code should be tested against gcc/clang/icc
> of various versions with and without PIC, 32 and 64 bit
> this code is more tricky than than the average so theres a good
> change changes to it will break some compiler
>

I understand that, but I may not be able to test as many environments as
you like. I'm going to give testing it my best effort, but I can tell you
up front that I'm only going to test gcc and clang, on an x86 Ubuntu
machine. I don't have ICC, so I can't test that.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel


Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-08 Thread Nick Lewycky
On 6 May 2015 at 18:03, Michael Niedermayer  wrote:

> On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote:
> > On 6 May 2015 at 15:06, Michael Niedermayer  wrote:
> >
> > > Hi
> > >
> > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote:
> > > > Newer versions of clang will allocate %rbx as one of the inline asm
> > > inputs,
> > >
> > > Thats great
> > >
> > >
> > > > even in PIC. This occurs when building ffmpeg with clang
> > > -fsanitize=address
> > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or off,
> > > just
> > > > include %bx in the clobber list regardless of whether PIC is on or
> off.
> > >
> > > you cant include *bx in the clobber list with PIC
> > > it breaks compilers that are less great, like gcc
> > >
> >
> > Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry!
> >
> > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > ‘ff_hyscale_fast_mmxext’:
> > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC
> register
> > > clobbered by ‘%ebx’ in ‘asm’
> > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > ‘ff_hcscale_fast_mmxext’:
> > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC
> register
> > > clobbered by ‘%ebx’ in ‘asm’
> > >
> > >
> > > also what exactly are you trying to fix ?
> > > or rather what exactly goes how exactly wrong with the code as it is
> > > if rbx is used ?
> > >
> >
> > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives
> evaluated
> > in PIC x86-64, the inline constraints work out to:
> >
> > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos),
> >"m" (mmxextFilterCode), "m" (src2), "m"(dst2)
> >   ,"m" (ebxsave)
> >   ,"m"(retsave)
> > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
> >
> > so clang looks at that and decides that it can pick src1 = (%r10), dst1 =
> > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode = (%r15),
> src2
> > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The problem
> > there is src2 being (%rbx).
> >
> > Now let's look at how we use them:
> >
> > "mov %0, %%"REG_c"  \n\t"
> > "mov %1, %%"REG_D"  \n\t"
> > "mov %2, %%"REG_d"  \n\t"
> > "mov %3, %%"REG_b"  \n\t"  // Clobbers %rbx /
> src2
> > / %5 here
> > "xor  %%"REG_a", %%"REG_a"  \n\t"
> > PREFETCH"   (%%"REG_c") \n\t"
> > PREFETCH" 32(%%"REG_c") \n\t"
> > PREFETCH" 64(%%"REG_c") \n\t"
> >
> > CALL_MMXEXT_FILTER_CODE
> > CALL_MMXEXT_FILTER_CODE
> > CALL_MMXEXT_FILTER_CODE
> > CALL_MMXEXT_FILTER_CODE
> > "xor  %%"REG_a", %%"REG_a"  \n\t"
> > "mov %5, %%"REG_c"  \n\t"  // %5 is read too
> late,
> > we get %3 / filterPos instead
> > "mov %6, %%"REG_D"  \n\t"
> > PREFETCH"   (%%"REG_c") \n\t"
> > PREFETCH" 32(%%"REG_c") \n\t"
> > PREFETCH" 64(%%"REG_c") \n\t"
> >
> > CALL_MMXEXT_FILTER_CODE  // Crash...
> >
> > The two marked instructions are intended to refer to different contents.
> > CALL_MMXEXT_FILTER_CODE moves RBX to ESI and calls mmxextFilterCode:
> >
> > "movl(%%"REG_b"), %%esi \n\t"\
> > "call*%4\n\t"\  // Crash...
> >
> > That callee ultimately segfaults:
> >
> > => 0x7fffefa45000:  movq   (%rdx,%rax,1),%mm3
> > => 0x7fffefa45004:  movd   (%rcx,%rsi,1),%mm0
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x7fffefa45004 in ?? ()
> > => 0x7fffefa45004:  movd

Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-08 Thread Nick Lewycky
On 8 May 2015 at 12:06, Michael Niedermayer  wrote:

> On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote:
> > On 6 May 2015 at 18:03, Michael Niedermayer  wrote:
> >
> > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote:
> > > > On 6 May 2015 at 15:06, Michael Niedermayer 
> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote:
> > > > > > Newer versions of clang will allocate %rbx as one of the inline
> asm
> > > > > inputs,
> > > > >
> > > > > Thats great
> > > > >
> > > > >
> > > > > > even in PIC. This occurs when building ffmpeg with clang
> > > > > -fsanitize=address
> > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is on or
> off,
> > > > > just
> > > > > > include %bx in the clobber list regardless of whether PIC is on
> or
> > > off.
> > > > >
> > > > > you cant include *bx in the clobber list with PIC
> > > > > it breaks compilers that are less great, like gcc
> > > > >
> > > >
> > > > Doh!! I did check for this, but only tried x86-64, not x86-32. Sorry!
> > > >
> > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > ‘ff_hyscale_fast_mmxext’:
> > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC
> > > register
> > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > ‘ff_hcscale_fast_mmxext’:
> > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC
> > > register
> > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > >
> > > > >
> > > > > also what exactly are you trying to fix ?
> > > > > or rather what exactly goes how exactly wrong with the code as it
> is
> > > > > if rbx is used ?
> > > > >
> > > >
> > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives
> > > evaluated
> > > > in PIC x86-64, the inline constraints work out to:
> > > >
> > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos),
> > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2)
> > > >   ,"m" (ebxsave)
> > > >   ,"m"(retsave)
> > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
> > > >
> > > > so clang looks at that and decides that it can pick src1 = (%r10),
> dst1 =
> > > > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode =
> (%r15),
> > > src2
> > > > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The
> problem
> > > > there is src2 being (%rbx).
> > > >
> > > > Now let's look at how we use them:
> > > >
> > > > "mov %0, %%"REG_c"  \n\t"
> > > > "mov %1, %%"REG_D"  \n\t"
> > > > "mov %2, %%"REG_d"  \n\t"
> > > > "mov %3, %%"REG_b"  \n\t"  // Clobbers %rbx /
> > > src2
> > > > / %5 here
> > > > "xor  %%"REG_a", %%"REG_a"  \n\t"
> > > > PREFETCH"   (%%"REG_c") \n\t"
> > > > PREFETCH" 32(%%"REG_c") \n\t"
> > > > PREFETCH" 64(%%"REG_c") \n\t"
> > > >
> > > > CALL_MMXEXT_FILTER_CODE
> > > > CALL_MMXEXT_FILTER_CODE
> > > > CALL_MMXEXT_FILTER_CODE
> > > > CALL_MMXEXT_FILTER_CODE
> > > > "xor  %%"REG_a", %%"REG_a"  \n\t"
> > > > "mov %5, %%"REG_c"  \n\t"  // %5 is read too
> > > late,
> > > > we get %3 / filterPos instead
> > > > "mov %6, %%"REG_D"  \n

Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-11 Thread Nick Lewycky
On 8 May 2015 at 14:28, Michael Niedermayer  wrote:

> On Fri, May 08, 2015 at 12:43:20PM -0700, Nick Lewycky wrote:
> > On 8 May 2015 at 12:06, Michael Niedermayer  wrote:
> >
> > > On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote:
> > > > On 6 May 2015 at 18:03, Michael Niedermayer 
> wrote:
> > > >
> > > > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote:
> > > > > > On 6 May 2015 at 15:06, Michael Niedermayer 
> > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky wrote:
> > > > > > > > Newer versions of clang will allocate %rbx as one of the
> inline
> > > asm
> > > > > > > inputs,
> > > > > > >
> > > > > > > Thats great
> > > > > > >
> > > > > > >
> > > > > > > > even in PIC. This occurs when building ffmpeg with clang
> > > > > > > -fsanitize=address
> > > > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC is
> on or
> > > off,
> > > > > > > just
> > > > > > > > include %bx in the clobber list regardless of whether PIC is
> on
> > > or
> > > > > off.
> > > > > > >
> > > > > > > you cant include *bx in the clobber list with PIC
> > > > > > > it breaks compilers that are less great, like gcc
> > > > > > >
> > > > > >
> > > > > > Doh!! I did check for this, but only tried x86-64, not x86-32.
> Sorry!
> > > > > >
> > > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > > > ‘ff_hyscale_fast_mmxext’:
> > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5: error: PIC
> > > > > register
> > > > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > > > ‘ff_hcscale_fast_mmxext’:
> > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5: error: PIC
> > > > > register
> > > > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > > > >
> > > > > > >
> > > > > > > also what exactly are you trying to fix ?
> > > > > > > or rather what exactly goes how exactly wrong with the code as
> it
> > > is
> > > > > > > if rbx is used ?
> > > > > > >
> > > > > >
> > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor directives
> > > > > evaluated
> > > > > > in PIC x86-64, the inline constraints work out to:
> > > > > >
> > > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m" (filterPos),
> > > > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2)
> > > > > >   ,"m" (ebxsave)
> > > > > >   ,"m"(retsave)
> > > > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
> > > > > >
> > > > > > so clang looks at that and decides that it can pick src1 =
> (%r10),
> > > dst1 =
> > > > > > (%r8), filter = (%r11), filterPos = (%r12), mmxextFilterCode =
> > > (%r15),
> > > > > src2
> > > > > > = (%rbx), dst2 = (%r14), ebxsave = (%r13), retsave = (%r9). The
> > > problem
> > > > > > there is src2 being (%rbx).
> > > > > >
> > > > > > Now let's look at how we use them:
> > > > > >
> > > > > > "mov %0, %%"REG_c"  \n\t"
> > > > > > "mov %1, %%"REG_D"  \n\t"
> > > > > > "mov %2, %%"REG_d"  \n\t"
> > > > > > "mov %3, %%"REG_b"  \n\t"  // Clobbers
> %rbx /
> > > > > src2
> > > > > > / %5 h

Re: [FFmpeg-devel] [PATCH] ff_h[yc]scale_fast_mmext always clobbers %rbx

2015-05-12 Thread Nick Lewycky
On 11 May 2015 at 16:18, Michael Niedermayer  wrote:

> On Mon, May 11, 2015 at 03:55:40PM -0700, Nick Lewycky wrote:
> > On 8 May 2015 at 14:28, Michael Niedermayer  wrote:
> >
> > > On Fri, May 08, 2015 at 12:43:20PM -0700, Nick Lewycky wrote:
> > > > On 8 May 2015 at 12:06, Michael Niedermayer 
> wrote:
> > > >
> > > > > On Fri, May 08, 2015 at 10:50:49AM -0700, Nick Lewycky wrote:
> > > > > > On 6 May 2015 at 18:03, Michael Niedermayer 
> > > wrote:
> > > > > >
> > > > > > > On Wed, May 06, 2015 at 04:08:09PM -0700, Nick Lewycky wrote:
> > > > > > > > On 6 May 2015 at 15:06, Michael Niedermayer <
> michae...@gmx.at>
> > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > On Wed, May 06, 2015 at 11:52:59AM -0700, Nick Lewycky
> wrote:
> > > > > > > > > > Newer versions of clang will allocate %rbx as one of the
> > > inline
> > > > > asm
> > > > > > > > > inputs,
> > > > > > > > >
> > > > > > > > > Thats great
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > even in PIC. This occurs when building ffmpeg with clang
> > > > > > > > > -fsanitize=address
> > > > > > > > > > -O1 -fPIE. Because the asm does clobber %bx whether PIC
> is
> > > on or
> > > > > off,
> > > > > > > > > just
> > > > > > > > > > include %bx in the clobber list regardless of whether
> PIC is
> > > on
> > > > > or
> > > > > > > off.
> > > > > > > > >
> > > > > > > > > you cant include *bx in the clobber list with PIC
> > > > > > > > > it breaks compilers that are less great, like gcc
> > > > > > > > >
> > > > > > > >
> > > > > > > > Doh!! I did check for this, but only tried x86-64, not
> x86-32.
> > > Sorry!
> > > > > > > >
> > > > > > > > gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
> > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > > > > > ‘ff_hyscale_fast_mmxext’:
> > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:205:5:
> error: PIC
> > > > > > > register
> > > > > > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c: In function
> > > > > > > > > ‘ff_hcscale_fast_mmxext’:
> > > > > > > > > ../libswscale/x86/hscale_fast_bilinear_simd.c:276:5:
> error: PIC
> > > > > > > register
> > > > > > > > > clobbered by ‘%ebx’ in ‘asm’
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > also what exactly are you trying to fix ?
> > > > > > > > > or rather what exactly goes how exactly wrong with the
> code as
> > > it
> > > > > is
> > > > > > > > > if rbx is used ?
> > > > > > > > >
> > > > > > > >
> > > > > > > > Ok, let's look at ff_hcscale_fast_mmext. Preprocessor
> directives
> > > > > > > evaluated
> > > > > > > > in PIC x86-64, the inline constraints work out to:
> > > > > > > >
> > > > > > > > :: "m" (src1), "m" (dst1), "m" (filter), "m"
> (filterPos),
> > > > > > > >"m" (mmxextFilterCode), "m" (src2), "m"(dst2)
> > > > > > > >   ,"m" (ebxsave)
> > > > > > > >   ,"m"(retsave)
> > > > > > > > : "%"REG_a, "%"REG_c, "%"REG_d, "%"REG_S, "%"REG_D
> > > > > > > >
> > > > > > > > so clang looks at that and decides that it can pick src1 =
> > > (%r10),
> > > > >