On Mon, Jul 13, 2015 at 11:39:15PM -0300, James Almer wrote:
> On 12/07/15 8:33 PM, Ronald S. Bultje wrote:
> > +INIT_XMM sse4
> > +cglobal ssim_end_line, 3, 3, 6, sum0, sum1, w
> > +    pxor              m0, m0
> > +.loop:
> > +    mova              m1, [sum0q+mmsize*0]
> > +    mova              m2, [sum0q+mmsize*1]
> > +    mova              m3, [sum0q+mmsize*2]
> > +    mova              m4, [sum0q+mmsize*3]
> > +    paddd             m1, [sum1q+mmsize*0]
> > +    paddd             m2, [sum1q+mmsize*1]
> > +    paddd             m3, [sum1q+mmsize*2]
> > +    paddd             m4, [sum1q+mmsize*3]
> > +    paddd             m1, m2
> > +    paddd             m2, m3
> > +    paddd             m3, m4
> > +    paddd             m4, [sum0q+mmsize*4]
> > +    paddd             m4, [sum1q+mmsize*4]
> > +    TRANSPOSE4x4D      1, 2, 3, 4, 5
> > +
> > +    ; m1 = fs1, m2 = fs2, m3 = fss, m4 = fs12
> > +    pslld             m3, 6
> > +    pslld             m4, 6
> > +    pmulld            m5, m1, m2        ; fs1 * fs2
> > +    pmulld            m1, m1            ; fs1 * fs1
> > +    pmulld            m2, m2            ; fs2 * fs2
>
> If these values are guaranteed to be always positive then this could also
> be implemented with pmuludq to get an sse2 version working. Although I'm
> not sure if it's worth doing. It will be six pmuludq and an awful lot of
> shuffling and unpacking when the speed up of the sse4 version is already
> only ~2x the C version.
>
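For reference, the pmuludq route mentioned above could look roughly like the sketch below, written as C intrinsics rather than x86inc asm. It is only an illustration, not part of the patch: the helper name mul32x4_lo is made up, and the scalar fallback exists only so the snippet builds on non-SSE2 targets. One pmulld becomes two pmuludq plus the shifts, shuffles and unpack shown, which is where the "six pmuludq and an awful lot of shuffling" for the three multiplies comes from.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper (not in the patch): low 32 bits of each of four
 * 32-bit products, i.e. what SSE4.1 pmulld yields, using only SSE2. */
static void mul32x4_lo(const uint32_t a[4], const uint32_t b[4],
                       uint32_t out[4])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* pmuludq multiplies dword lanes 0 and 2 only, giving two
     * 64-bit products. */
    __m128i even = _mm_mul_epu32(va, vb);

    /* Shift lanes 1 and 3 down into lanes 0 and 2, multiply again. */
    __m128i odd = _mm_mul_epu32(_mm_srli_epi64(va, 32),
                                _mm_srli_epi64(vb, 32));

    /* Gather the low dword of each 64-bit product into lanes 0 and 1. */
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));

    /* Interleave back into the original order: e0, o0, e1, o1. */
    _mm_storeu_si128((__m128i *)out, _mm_unpacklo_epi32(even, odd));
#else
    /* Scalar fallback with the same wrapping 32-bit semantics. */
    for (int i = 0; i < 4; i++)
        out[i] = a[i] * b[i];
#endif
}
```

Only the low 32 bits of each product are kept, so the result is a plain wrapping 32-bit multiply per lane.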
> This was already oked (Same with the psnr sse2 code), so it should be
> pushed already.

/me wonders a little bit why no one else applied it yet, but

applied

thanks

[...]
--
Michael     GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB

No human being will ever know the Truth, for even if they happen to say it
by chance, they would not even know they had done so. -- Xenophanes
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel