On Thu, Nov 19, 2015 at 11:48:53AM +0100, Clément Bœsch wrote:
> From: Matthieu Bouron <matthieu.bou...@stupeflix.com>
>
> Signed-off-by: Matthieu Bouron <matthieu.bou...@stupeflix.com>
> Signed-off-by: Clément Bœsch <clem...@stupeflix.com>
>
> ---
> The function takes about 29ms with a 1080p source (testsrc2) on a
> cortex-a8. Though, 16ms (more than half the time!) is spent in the vst2
> call. Any suggestion on how to speed this up?
>
> Also, the reference code seems to cause some kind of ringing, while our
> ASM doesn't:
> http://b.pkh.me/nv12-rgba-ref.png
> http://b.pkh.me/nv12-rgba-neon.png

What exactly did you test here? There are several codepaths for RGB
output; one uses LUTs, and not all of them use full-resolution chroma.

>
> Last, we noticed that the y_offset is scaled to 1<<9 for some reason we
> couldn't figure out. Hopefully we're doing it correctly here.

[...]

> +.macro compute_half_line dst half_y ofmt
> +    vmovl.u8        q7, \half_y         @ 8px of Y
> +    vdup.16         q5, r9
> +    vsub.s16        q7, q5
> +    vmull.s16       q1, d14, d0         @ q1 = (srcY - y_offset) * y_coeff (left)
> +    vmull.s16       q2, d15, d0         @ q2 = (srcY - y_offset) * y_coeff (right)

If you do something like (srcY) * y_coeff - y_offset2, you could keep a
bit more precision in the requested brightness correction. OTOH, maybe
you want to be bitexact with some existing codepath.

Either way, your patch passes FATE with ARM QEMU here, so I have no
objections if you also tested it and it works. But maybe others have
more comments about the asm ...

[...]

--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

The misfortune of the wise is better than the prosperity of the fool.
-- Epicurus
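
A minimal C sketch of the precision point about (srcY) * y_coeff - y_offset2, not taken from the patch or from swscale. It assumes a hypothetical 13-bit fixed-point coefficient (COEFF_SHIFT) and illustrative values for the offset (16.4, i.e. a brightness correction with a fractional part) and coefficient (1.164); the real swscale constants, and whatever scaling y_offset2 would use in the NEON code, may differ. Variant A mirrors the order of operations in the quoted compute_half_line (subtract, then multiply); variant B multiplies first and subtracts an offset that was pre-multiplied by the coefficient.

```c
#include <stdint.h>

/* Hypothetical fixed-point precision for the luma coefficient. */
#define COEFF_SHIFT 13

/* Round a non-negative double to the nearest integer (avoids libm). */
static int32_t round_pos(double x)
{
    return (int32_t)(x + 0.5);
}

static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Variant A: subtract an offset rounded to srcY's integer scale, then
 * multiply. Any fractional part of the offset is lost before the multiply. */
static int32_t luma_sub_then_mul(int y, int32_t off_int, int32_t coeff)
{
    return (y - off_int) * coeff;
}

/* Variant B: multiply first, then subtract an offset pre-multiplied by the
 * coefficient (a hypothetical stand-in for y_offset2), so more of the
 * offset's fractional part survives. */
static int32_t luma_mul_then_sub(int y, int32_t off_scaled, int32_t coeff)
{
    return y * coeff - off_scaled;
}

/* Worst-case error of both variants over all 8-bit luma values, measured
 * against the exact (y - off) * coeff, using the same quantized coeff so
 * that only the offset's rounding is compared. */
static void max_offset_error(double off, double coeff_f,
                             double *err_a, double *err_b)
{
    int32_t coeff      = round_pos(coeff_f * (1 << COEFF_SHIFT));
    int32_t off_int    = round_pos(off);
    int32_t off_scaled = round_pos(off * coeff);

    *err_a = 0.0;
    *err_b = 0.0;
    for (int y = 0; y < 256; y++) {
        double exact = (y - off) * coeff;
        double ea = abs_d(luma_sub_then_mul(y, off_int, coeff) - exact);
        double eb = abs_d(luma_mul_then_sub(y, off_scaled, coeff) - exact);
        if (ea > *err_a) *err_a = ea;
        if (eb > *err_b) *err_b = eb;
    }
}
```

With these illustrative values, variant A is off by 0.4 * coeff (about 3814 units of the scaled result) for every input, while variant B stays within half a unit. In NEON terms, B would presumably mean subtracting a 32-bit constant after the vmull.s16 instead of a 16-bit one before it; the trade-off, as noted above, is that it may no longer be bitexact with an existing codepath.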
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel