Re: [FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-06 Thread Ronald S. Bultje
Hi,
On Mon, Jun 5, 2017 at 8:02 AM, Ronald S. Bultje wrote:
> On Mon, Jun 5, 2017 at 7:23 AM, James Darnley wrote:
>
>> I forgot to mention in my cover letter that although the dct test
>> passes, fate does not. As I mentioned on IRC, changing them causes
>> errors elsewhere in fate. I am cur

Re: [FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-05 Thread Ronald S. Bultje
Hi,
On Mon, Jun 5, 2017 at 7:23 AM, James Darnley wrote:
> I forgot to mention in my cover letter that although the dct test
> passes, fate does not. As I mentioned on IRC, changing them causes
> errors elsewhere in fate. I am currently looking into this problem and
> I'm sure I will speak to

Re: [FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-05 Thread James Darnley
To answer the couple of questions that were asked over the weekend. Rostislav, about the performance. I can see how to force a particular IDCT implementation for real world decoding (the -idct option) but the MPEG2 HD sample I've been working with mostly uses the "idct add" function which doesn't

[FFmpeg-devel] [WIP] [PATCH 0/6] sse2/xmm version of 8-bit simple_idct

2017-06-02 Thread James Darnley
Two ideas here. The first 3 patches alter the old mmx code so that it can use xmm registers. It still only uses half the available width and adds a few shuffles, meaning it isn't an ideal solution. Though it is exact compared with the mmx version. Seems to be moderately faster on Skylake despite