On Mon, Aug 04, 2014 at 05:08:21PM +0200, Michael Niedermayer wrote:
[...]
> > > > +    const float xd_2 = 1.306562964876380*xc_2 +
> > > > 0.541196100146197*xc_3;
> > > > +    const float xd_3 = 0.541196100146197*xc_2 -
> > > > 1.306562964876380*xc_3;
> > > > +    const float x1_9 = 0.707106781186547*xb_2 -
> > > > 0.707106781186547*xd_3;
> > > > +    const float x1_a = 0.707106781186547*xb_2 +
> > > > 0.707106781186547*xd_3;
> > > > +    const float x1_b = 0.707106781186547*xb_1 +
> > > > 0.707106781186547*xd_1;
> > > > +    const float x1_c = 0.707106781186547*xb_1 -
> > > > 0.707106781186547*xd_1;
> > > > +    const float x1_d = 0.707106781186547*xb_3 -
> > > > 0.707106781186547*xd_2;
> > > > +    const float x1_e = 0.707106781186547*xb_3 +
> > > > 0.707106781186547*xd_2;
> > > > +    dst[ 0*dst_stridea] = 0.25*x5_0;
> > > > +    dst[ 1*dst_stridea] = 0.25*xb_0;
> > > > +    dst[ 2*dst_stridea] = 0.25*x7_0;
> > > > +    dst[ 3*dst_stridea] = 0.25*x1_9;
> > > > +    dst[ 4*dst_stridea] = 0.25*x5_2;
> > > > +    dst[ 5*dst_stridea] = 0.25*x1_a;
> > > > +    dst[ 6*dst_stridea] = 0.25*x3_5;
> > > > +    dst[ 7*dst_stridea] = 0.25*x1_b;
> > > > +    dst[ 8*dst_stridea] = 0.25*x5_1;
> > > > +    dst[ 9*dst_stridea] = 0.25*x1_c;
> > > > +    dst[10*dst_stridea] = 0.25*x3_6;
> > > > +    dst[11*dst_stridea] = 0.25*x1_d;
> > > > +    dst[12*dst_stridea] = 0.25*x5_3;
> > > > +    dst[13*dst_stridea] = 0.25*x1_e;
> > > > +    dst[14*dst_stridea] = 0.25*x7_2;
> > > > +    dst[15*dst_stridea] = 0.25*xd_0;
> > > >
> > > many of these multiplies look like they can be merged into other
> > > multiplies
> > >
> > > for example see:
> > >
> > >
> > > const float xd_2 = 1.306562964876380*xc_2 + 0.541196100146197*xc_3;
> > > const float xb_3 = 0.541196100146197*xa_2 - 1.306562964876380*xa_3;
> > > const float x1_d = 0.707106781186547*xb_3 - 0.707106781186547*xd_2;
> > > const float x1_e = 0.707106781186547*xb_3 + 0.707106781186547*xd_2;
> > > dst[11*dst_stridea] = 0.25*x1_d;
> > > dst[13*dst_stridea] = 0.25*x1_e;
> > >
> > > vs.
> > >
> > > const float xd_2 = (0.25*0.707106781186547*1.306562964876380)*xc_2 +
> > > (0.25*0.707106781186547*0.541196100146197)*xc_3;
> > > const float xb_3 = (0.25*0.707106781186547*0.541196100146197)*xa_2 -
> > > (0.25*0.707106781186547*1.306562964876380)*xa_3;
> > > dst[11*dst_stridea] = xb_3 - xd_2;
> > > dst[13*dst_stridea] = xb_3 + xd_2;
> >
> > also more generally
> > if you have 2 stages of butterflies each with 4 multiplies and 2 adds
> > in each butterfly
> >
> > a----\-/--\---/----------a'
> >       X    \ /
> > b----/-\----------\---/--b'
> >            / \     \ /
> > c----\-/--/---\----------c'
> >       X            / \
> > d----/-\----------/---\--d'
> >
> >
> > of additions
> > the first stage can scale their output arbitrarily for free by
> > changing the respective coefficients
> > the second stage can use any scaled input for free by adjusting their
> > coefficients similarly, this gives you 4 free parameters in the
> > example above which can be
> > chosen so as to make some coefficients trivial like +-1.0
> > this also works across 2D (I)DCTs or with other things before or
> > after the (i)dct which can absorb such rescaling
>
> also i suggest that the patch is applied before time is
> spent optimizing it further,
> also the moving around of multiplies can probably affect numerical
> stability if overdone
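
For reference, the merge suggested in the quoted example can be checked in
isolation. The following is only a minimal sketch, not the generated code from
the patch: the function names tail_original, tail_merged and
merged_vs_original_max_error are hypothetical, the inputs are arbitrary test
values, and it assumes xd_2 and xb_3 are consumed only by outputs 11 and 13,
as in the quoted lines.

```c
/* Constants copied from the quoted patch. */
#define C_SQRT1_2 0.707106781186547
#define C_A       1.306562964876380
#define C_B       0.541196100146197

/* Original form (hypothetical wrapper): 10 multiplies. */
static void tail_original(float xa_2, float xa_3, float xc_2, float xc_3,
                          float *out11, float *out13)
{
    const float xd_2 = C_A*xc_2 + C_B*xc_3;
    const float xb_3 = C_B*xa_2 - C_A*xa_3;
    const float x1_d = C_SQRT1_2*xb_3 - C_SQRT1_2*xd_2;
    const float x1_e = C_SQRT1_2*xb_3 + C_SQRT1_2*xd_2;
    *out11 = 0.25*x1_d;
    *out13 = 0.25*x1_e;
}

/* Merged form (hypothetical wrapper): the 0.25 and sqrt(1/2) factors are
 * folded into the rotation coefficients at compile time, leaving 4
 * multiplies. */
static void tail_merged(float xa_2, float xa_3, float xc_2, float xc_3,
                        float *out11, float *out13)
{
    const float xd_2 = (0.25*C_SQRT1_2*C_A)*xc_2 + (0.25*C_SQRT1_2*C_B)*xc_3;
    const float xb_3 = (0.25*C_SQRT1_2*C_B)*xa_2 - (0.25*C_SQRT1_2*C_A)*xa_3;
    *out11 = xb_3 - xd_2;
    *out13 = xb_3 + xd_2;
}

/* Largest absolute difference between the two forms over a small grid of
 * arbitrary inputs; a rough way to watch for rounding drift when moving
 * multiplies around. */
static float merged_vs_original_max_error(void)
{
    float worst = 0.0f;
    for (int i = -4; i <= 4; i++) {
        for (int j = -4; j <= 4; j++) {
            const float xa_2 = 0.25f*i,         xa_3 = 0.5f*j;
            const float xc_2 = 0.125f*(i + j),  xc_3 = 0.375f*(i - j);
            float o11, o13, m11, m13, d;
            tail_original(xa_2, xa_3, xc_2, xc_3, &o11, &o13);
            tail_merged(xa_2, xa_3, xc_2, xc_3, &m11, &m13);
            d = o11 - m11; if (d < 0) d = -d; if (d > worst) worst = d;
            d = o13 - m13; if (d < 0) d = -d; if (d > worst) worst = d;
        }
    }
    return worst;
}
```

Both forms compute the same values up to float rounding, so the return value
of merged_vs_original_max_error() should stay near float precision for inputs
of this magnitude. A check of this kind is one way to keep an eye on the
stability concern raised above when more multiplies are merged.
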
I factorized obvious cases as you suggested. I also made the generated code
more readable. Further optimizations are postponed.

Applied, thanks!

-- 
Clément B.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel