Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread Christophe Gisquet
Hi,
2014-08-23 17:48 GMT+02:00 Mickaël Raulet:
> For avx2 I have some to push to the trunk, I did merge it yesterday with
> all recent changes. But I don't remember what those tables look like.
Well, my point was hypothetical, but I guess this means some conflicts are to be expected when either

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread Mickaël Raulet
For avx2 I have some to push to the trunk, I did merge it yesterday with all recent changes. But I don't remember what those tables look like. For 10 and 12 bits, ssse3 should slow down the decoding since it uses 4 more instructions in the loop. On Saturday, 23 August 2014, Christophe Gisquet wrote

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread James Almer
On 23/08/14 12:20 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-23 17:16 GMT+02:00 James Almer:
>>> What do you mean by duplicated? That tables for 10 and 12 are?
> [...]
>> I was talking about the opt suffix since both the ssse3 and sse4 tables will
>> be the same.
>
> Oh ok, in case we have

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread Christophe Gisquet
Hi,
2014-08-23 17:16 GMT+02:00 James Almer:
>> What do you mean by duplicated? That tables for 10 and 12 are?
[...]
> I was talking about the opt suffix since both the ssse3 and sse4 tables will
> be the same.
Oh ok, in case we have to instantiate sse4 versions. Because at the moment there are o

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread James Almer
On 23/08/14 12:11 PM, Christophe Gisquet wrote:
> Hi,
>
> 2014-08-23 16:52 GMT+02:00 James Almer:
>>> -QPEL_TABLE 8, 8, b, sse4
>>> -QPEL_TABLE 10, 4, w, sse4
>>> -QPEL_TABLE 12, 4, w, sse4
>>> +QPEL_TABLE 8, 8, b, ssse3
>>> +QPEL_TABLE 10, 4, w, ssse3
>>> +QPEL_TABLE 12, 4, w, ssse3
>>
>> Do t

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread Christophe Gisquet
Hi,
2014-08-23 16:52 GMT+02:00 James Almer:
>> -QPEL_TABLE 8, 8, b, sse4
>> -QPEL_TABLE 10, 4, w, sse4
>> -QPEL_TABLE 12, 4, w, sse4
>> +QPEL_TABLE 8, 8, b, ssse3
>> +QPEL_TABLE 10, 4, w, ssse3
>> +QPEL_TABLE 12, 4, w, ssse3
>
> Do these need to be duplicated? You could just remove the suffix a

Re: [FFmpeg-devel] [PATCH 2/2] x86: hevc_mc: convert to ssse3

2014-08-23 Thread James Almer
On 23/08/14 10:22 AM, Christophe Gisquet wrote:
> The only sse4 instruction is pextrw, which is used on rather minor
> functions for small blocks. Therefore use whichever GPR is available
> to extract the output word.
>
> Before (sse4), for block_w == 6:
> 4627 decicycles in epel_uni, 16377 runs,