Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
2015-10-13 17:44 GMT+02:00 Christophe Gisquet:
> But I'll check.

Indeed not bit-exact to faani and C simple idct:
stddev:0.00 PSNR:163.48 MAXDIFF:1
This would at least result in fate no longer passing as this stands. I don't think it's worth the speed difference. No change on dct-test r

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
2015-10-13 15:43 GMT+02:00 Michael Niedermayer:
> On Tue, Oct 13, 2015 at 01:33:07PM +0200, Christophe Gisquet wrote:
>> Hi,
>>
>> 2015-10-13 13:10 GMT+02:00 Michael Niedermayer:
>> > hmm, I am a bit concerned that adding the rounder (which effectively is
>> > 0.5) causes an overflow, that would if

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Michael Niedermayer
On Tue, Oct 13, 2015 at 01:33:07PM +0200, Christophe Gisquet wrote:
> Hi,
>
> 2015-10-13 13:10 GMT+02:00 Michael Niedermayer:
> > hmm, I am a bit concerned that adding the rounder (which effectively is
> > 0.5) causes an overflow, that would if I am not mistaken imply that
> > things are very close

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
Hi,

2015-10-13 13:10 GMT+02:00 Michael Niedermayer:
> hmm, I am a bit concerned that adding the rounder (which effectively is
> 0.5) causes an overflow, that would if I am not mistaken imply that
> things are very close to overflowing already without it

It's true, but the immediate cause here is th

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Michael Niedermayer
On Tue, Oct 13, 2015 at 09:01:44AM +0200, Christophe Gisquet wrote:
> Hi,
>
> 2015-10-13 2:26 GMT+02:00 Michael Niedermayer:
> > On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
> >> When the input of a pass has 15 or 16 bits of precision (in particular
> >> the column pass), t

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-13 Thread Christophe Gisquet
Hi,

2015-10-13 2:26 GMT+02:00 Michael Niedermayer:
> On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
>> When the input of a pass has 15 or 16 bits of precision (in particular
>> the column pass), the addition of a bias to W4 may lead to overflows
>> in the input to pmaddwd.
>>

Re: [FFmpeg-devel] [PATCH 5/9] x86: simple_idct10_template: fix overflow in pass

2015-10-12 Thread Michael Niedermayer
On Mon, Oct 12, 2015 at 07:37:46PM +0200, Christophe Gisquet wrote:
> When the input of a pass has 15 or 16 bits of precision (in particular
> the column pass), the addition of a bias to W4 may lead to overflows
> in the input to pmaddwd.
>
> This requires postponing the adding of the bias to afte
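
For readers skimming the thread, the overflow being discussed can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not the code from the patch: pmaddwd multiplies pairs of signed 16-bit words and sums the products into 32 bits, so any rounding bias folded into a 16-bit operand beforehand has to fit in 16 bits, whereas a bias added to the 32-bit result has far more headroom. The helper names (madd16, wrap16, bias_before, bias_after) and the sample values are hypothetical and chosen only to make the wraparound visible.

```c
#include <stdint.h>

/* Scalar model of one pmaddwd lane: two signed 16x16 products summed in 32 bits. */
static int32_t madd16(int16_t a0, int16_t b0, int16_t a1, int16_t b1)
{
    return (int32_t)a0 * b0 + (int32_t)a1 * b1;
}

/* Wrap a 32-bit value to signed 16 bits, as a packed 16-bit add would. */
static int16_t wrap16(int32_t v)
{
    int32_t u = (int32_t)((uint32_t)v & 0xFFFFu);
    return (int16_t)(u >= 0x8000 ? u - 0x10000 : u);
}

/* Hypothetical "bias first" variant: the bias is folded into the 16-bit
 * operand, so the sum can wrap before the multiply ever sees it. */
static int32_t bias_before(int32_t x, int16_t coef, int32_t bias)
{
    return madd16(wrap16(x + bias), coef, 0, 0);
}

/* Hypothetical "bias later" variant: multiply the unbiased operand, then
 * add the (scaled) bias in the 32-bit domain. */
static int32_t bias_after(int32_t x, int16_t coef, int32_t bias)
{
    return madd16(wrap16(x), coef, 0, 0) + bias * (int32_t)coef;
}
```

With an input close to the top of the signed 16-bit range, for example x = 32760, coef = 16384 and bias = 16 (all illustrative values), bias_before wraps the operand to a negative number and returns a result with the wrong sign, while bias_after returns the value of (x + bias) * coef.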