On 08/02/15 8:21 AM, Christophe Gisquet wrote:
> Hi,
> 
> 2015-02-07 23:06 GMT+01:00 James Almer <jamr...@gmail.com>:
>> Signed-off-by: James Almer <jamr...@gmail.com>
>> ---
>>  libavcodec/x86/hevc_sao.asm   | 40 ++++++++++++++++++++++++++++++++++++----
>>  libavcodec/x86/hevcdsp_init.c | 24 ++++++++++++------------
>>  2 files changed, 48 insertions(+), 16 deletions(-)
> 
> Passes fate here for Win32/64, but that's probably not the platforms
> you interested in hearing feedback from.
> Looks OK to commit.
> 
> A few questions though:
> 
>> +    %assign MMSIZE mmsize
> 
> Why do that? Not a big deal: it's only for my education, if there's
> something I'm missing.

For width 48, the COMPUTE macro is last run after an INIT_XMM cpuname, so 
mmsize becomes 
16 and in the avx2 version the instructions would access the wrong data in 
stack.
Doing %assign MMSIZE mmsize at the beginning of the function and using it here 
makes sure 
it's always 32 in avx2.
sse2 is unaffected by this, of course.

And the reason I'm using INIT_XMM in the middle of the function for the avx2 
width 48 case 
is because i couldn't find a nice and clean way to use the xm* reg aliases with 
the COMPUTE 
macros.

> 
>> +    pcmpeqw           m4, %2, [rsp+MMSIZE*0]
>> +    pcmpeqw           m5, %2, [rsp+MMSIZE*1]
>> +    pcmpeqw           m6, %2, [rsp+MMSIZE*2]
>> +    pcmpeqw           %2, [rsp+MMSIZE*3]
>> +    pand              m4, [rsp+MMSIZE*4]
>> +    pand              m5, [rsp+MMSIZE*5]
>> +    pand              m6, [rsp+MMSIZE*6]
> [...]
>> -cglobal hevc_sao_band_filter_%1_8, 6, 6, 15, dst, src, dststride, 
>> srcstride, offset, left
>> +cglobal hevc_sao_band_filter_%1_8, 6, 6, 15, 8*mmsize*ARCH_X86_32, dst, 
>> src, dststride, srcstride, offset, left
>>      HEVC_SAO_BAND_FILTER_INIT 8
> 
> Why do you need room for 8 regs, and not 7?

Remnant from before i realized i could keep m7 untouched. I'll change it.

> 
> Setting this to 7 and doing the required changes to remove MMSIZE pass
> here on Win32.

Again, MMSIZE is for avx2 and you're probably running the sse2 version, where 
it's always 
the same as mmsize.

> 
>> -                SAO_BAND_INIT(10, sse2);
>>                  SAO_EDGE_INIT(10, sse2);
>>              }
>> +            SAO_BAND_INIT(10, sse2);
> 
> On a side note, you haven't ported >8 bits edge filter to x86_32. I
> guess because it wouldn't play real time anyway?

I was going to give edge >8bit a try next.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel

Reply via email to