On Thu, Dec 17, 2015 at 07:47:08PM +0100, Michael Niedermayer wrote:
> On Thu, Dec 17, 2015 at 04:54:31PM +0100, Matthieu Bouron wrote:
> > On Tue, Dec 15, 2015 at 06:22:43PM +0100, Michael Niedermayer wrote:
> > > On Tue, Dec 15, 2015 at 05:46:09PM +0100, Matthieu Bouron wrote:
> > > > From: Matthieu Bouron <matthieu.bou...@stupeflix.com>
> > > >
> > > > ---
> > > >
> > > > Hi,
> > > >
> > > > This commit is likely to break fate on arm since the current C code path
> > > > seems to use less precision.
> > > >
> > > > How should I proceed to fix it ?
> > >
> > > hmm
> > > can the precission of the C code path and any asm impl of it under
> > > bitexact (if they exist), be changed to higher precission without
> > > speedloss?
> > > if so that would be an option
> >
> > We are currently facing 4 cases (with this patch applied)
> >
> > * [1] ARM +ACCURATE_RND: uses neon, 13bit coefficients and 32bit
> >   precision overall
> > * [2] ARM -ACCURATE_RND: uses neon, 6bit coefficients and 16bit
> >   precision overall
>
> > * [3] X86 +ACCURATE_RND: uses a C code path with lookup tables
>
> which LUT do you mean here ?

The table filled by ff_yuv2rgb_c_init_tables. Not sure if it's used though,
I haven't looked at the C code that actually does the conversion (yet).

> >
> > * [4] X86 -ACCURATE_RND: uses MMX+MMXEXT with apparently 13bit
> >   coefficients (libswscale/yuv2rgb.c around line 800).
> >
> > Notes:
> > * The 4 outputs are different with [3] being ugly (ghosting/non-sharp
> >   edges).
> >
> > * [1] and [4] (13bit coefficient accuracy) should be the same but have
> >   slight differences.
> >
> > Questions:
> >
>
> > * What is the meaning of the ACCURATE_RND flag ?
>
> it should enable accurate rounding
>
>
> > * Does [3] use some kind of interpolation instead of duplicating
> >   chroma lines ? Its output seems inferior to the other code paths.
>
> are you sure that is true for real images?
> its easy to end up with wrong conclusions with synthetic inputs
> unless you want to use the scaler only for such inputs.
>
> either way line duplication is likely not optimal for real images
> iam not made of constant color blocks that are aligned to some cammeras
> 2x2 samples
>
>
> > * Is [3] the output that should be taken as reference ?
>
> id say, the reference is reality, making the output as close as a
> image of the new resolution would be if it had been taken that way
>
>
> > * Should we use BITEXACT instead of ACCURATE_RND to determine the
> >   precision used ?
>
> BITEXACT is to avoid platform differences and allow regression tests
>
> if all else is equal it would be best if C and asm matches, and if
> C is bad then it should be improved

Here are the C, MMX and NEON outputs from a photo:
http://0x5c.me/yuv2rgb/photos

The C and NEON outputs look almost the same. The MMX one has slightly
different "colors" overall.

Figuring out what the C code is actually doing and making the neon asm
match its output is likely to take some time.

Would you be OK if, on the ARM platform, +ACCURATE_RND uses the C code path
(so fate is not broken), and -ACCURATE_RND uses the neon code path with a
precision of 16bit (IMHO, speed is preferred over the slight quality gain of
the 32bit version on this platform) ?

This behaviour will affect yuv420p but also nv12 and nv21 inputs.

This is a kind of temporary (and lame) solution until I have some time to
work on it.

Matthieu

[...]
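
For reference, here is a rough sketch of why the 6bit and 13bit coefficient
variants discussed above cannot produce identical output. It is not the
swscale or neon code. It assumes that "N-bit coefficients" means N fractional
bits of fixed point, it uses BT.601 limited-range constants for the red
channel only, and every function name in it is made up for the illustration.
The 16bit versus 32bit intermediate precision is not modelled.

```python
# Illustrative sketch only: this is NOT the libswscale implementation.
# Assumption: "N-bit coefficients" means N fractional bits of fixed point.
# Assumption: BT.601 limited-range constants, red channel only.

Y_SCALE = 255.0 / 219.0          # luma gain
RV = 1.402 * 255.0 / 224.0       # contribution of V to red


def clip8(x):
    """Clamp a value to the 8-bit range."""
    return max(0, min(255, x))


def quantize(coeff, frac_bits):
    """Round a float coefficient to fixed point with frac_bits fractional bits."""
    return int(round(coeff * (1 << frac_bits)))


def red_fixed(y, v, frac_bits):
    """Red channel from integer coefficients, rounding to nearest on the shift."""
    acc = quantize(Y_SCALE, frac_bits) * (y - 16)
    acc += quantize(RV, frac_bits) * (v - 128)
    acc += 1 << (frac_bits - 1)  # rounding bias added before the shift
    return clip8(acc >> frac_bits)


def red_float(y, v):
    """Floating-point reference for the same formula."""
    return clip8(int(round(Y_SCALE * (y - 16) + RV * (v - 128))))


def max_abs_error(frac_bits):
    """Largest deviation from the float reference over the legal input range."""
    return max(
        abs(red_fixed(y, v, frac_bits) - red_float(y, v))
        for y in range(16, 236)
        for v in range(16, 241)
    )


if __name__ == "__main__":
    for bits in (6, 13):
        print(bits, "fractional bits -> max error", max_abs_error(bits))
```

Running the script prints the worst-case deviation for each coefficient
width, which gives a feel for the size of the "slight quality gain"
being traded for speed.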