Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-07 Thread James Almer
On 07/11/14 2:56 PM, Michael Niedermayer wrote:
> On Fri, Nov 07, 2014 at 01:19:22PM -0300, James Almer wrote:
>> On 07/11/14 6:05 AM, Christophe Gisquet wrote:
>>> Hi,
>>>
>>> 2014-11-06 23:04 GMT+01:00 James Almer :
>>>> No, the function checks for alignment and jumps to a branch that uses

Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-07 Thread Michael Niedermayer
On Fri, Nov 07, 2014 at 01:19:22PM -0300, James Almer wrote:
> On 07/11/14 6:05 AM, Christophe Gisquet wrote:
> > Hi,
> >
> > 2014-11-06 23:04 GMT+01:00 James Almer :
> >> No, the function checks for alignment and jumps to a branch that uses
> >> movdqu if needed.
> >> ff_int32_to_float_a_avx als

Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-07 Thread James Almer
On 07/11/14 6:05 AM, Christophe Gisquet wrote:
> Hi,
>
> 2014-11-06 23:04 GMT+01:00 James Almer :
>> No, the function checks for alignment and jumps to a branch that uses movdqu
>> if needed.
>> ff_int32_to_float_a_avx also uses ymm regs and this same macro.
>
> OK, so nothing new here, same 32-

Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-07 Thread Christophe Gisquet
Hi,

2014-11-06 23:04 GMT+01:00 James Almer :
> No, the function checks for alignment and jumps to a branch that uses movdqu
> if needed.
> ff_int32_to_float_a_avx also uses ymm regs and this same macro.

OK, so nothing new here, same 32-bytes alignment.

> when "mulps m0, m1, [mem]" would work ju
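
The behaviour described in the quoted reply (check alignment, otherwise jump to a branch that uses movdqu) can be modelled roughly as follows. This is only a Python sketch, not the NASM macro from the patch: `is_aligned`, `pick_branch` and the example addresses are hypothetical, the 32-byte figure is taken from the ymm/alignment remarks in the thread, and checking both source and destination is an assumption.

```python
# Hypothetical model of the dispatch described in the thread; the real code is
# x86 assembly in libswresample/x86/audio_convert.asm.
YMM_ALIGN = 32  # bytes; a ymm register is 32 bytes wide


def is_aligned(addr: int, align: int = YMM_ALIGN) -> bool:
    """Return True when addr is a multiple of align (align is a power of two)."""
    return addr & (align - 1) == 0


def pick_branch(src_addr: int, dst_addr: int) -> str:
    """Pick the aligned path only if both buffers are aligned (assumption)."""
    if is_aligned(src_addr) and is_aligned(dst_addr):
        return "aligned"    # aligned loads and stores are safe here
    return "unaligned"      # fall back to unaligned moves such as movdqu


print(pick_branch(0x1000, 0x2000))  # both addresses are multiples of 32
print(pick_branch(0x1000, 0x2010))  # destination is only 16-byte aligned
```

With this model the first call takes the aligned path and the second falls back to the unaligned one, which matches the point made in the reply: callers do not need a stricter alignment guarantee, because the function itself decides which branch to run.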

Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-06 Thread James Almer
On 06/11/14 6:35 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-11-06 21:48 GMT+01:00 James Almer :
>> 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
>> 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips
>
> A couple of naïve questions (I haven't checked):
> Does it

Re: [FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-06 Thread Christophe Gisquet
Hi,

2014-11-06 21:48 GMT+01:00 James Almer :
> 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
> 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

A couple of naïve questions (I haven't checked):
Does it increase the alignment requirement? If yes, should it be not

[FFmpeg-devel] [PATCH] x86/swr: add ff_float_to_int32_a_avx2

2014-11-06 Thread James Almer
13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips
8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips

Signed-off-by: James Almer
---
x86inc.asm doesn't seem to handle cmpps or its aliases properly when using avx.

 libswresample/x86/audio_convert.asm| 10 -
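
For readers comparing the two benchmark lines, the relative gain can be worked out as below. This is a small Python sketch, not part of the patch: the helper names are hypothetical, the two figures are copied from the message, and the slightly different run and skip counts are ignored.

```python
# Figures copied from the benchmark lines in the patch message.
# A decicycle is a tenth of a CPU cycle.
SSE2_DECICYCLES = 13797
AVX2_DECICYCLES = 8603


def speedup(old: int, new: int) -> float:
    """How many times faster the new timing is than the old one."""
    return old / new


def reduction(old: int, new: int) -> float:
    """Fraction of the old timing that the new version saves."""
    return 1.0 - new / old


print(f"speedup:   {speedup(SSE2_DECICYCLES, AVX2_DECICYCLES):.2f}x")
print(f"reduction: {reduction(SSE2_DECICYCLES, AVX2_DECICYCLES):.1%}")
```

On these numbers the AVX2 version is roughly 1.6 times as fast as the SSE2 one, i.e. it needs a bit over a third fewer cycles per call.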