Hi Federico,

I don't know if that will help much, but:

> volk_32fc_magnitude_squared_32f(&mag_sq_b[0], &b[0], N); // mag_sq_b = |b|^2

Maybe doing it in-place, i.e.

> volk_32fc_magnitude_squared_32f((float*) &b[0], &b[0], N); // b = |b|^2

might be even faster; just don't forget that you're then treating the first half of b as floats instead of complexes.
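To make the "first half of b as floats" remark concrete, here is a minimal sketch in plain C++ rather than VOLK. The helper name magnitude_squared_in_place is hypothetical and not part of VOLK. The sketch assumes a plain scalar loop, which is safe because float i is only overwritten after complex i has been read; whether a given VOLK kernel tolerates overlapping input and output buffers is a separate question that this does not answer.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper (not a VOLK function): overwrites the first n floats
// of b's storage with |b[i]|^2 and returns a float view of that storage.
float* magnitude_squared_in_place(std::complex<float>* b, std::size_t n)
{
    assert(b != nullptr || n == 0);

    // An array of std::complex<float> may be accessed as an array of floats
    // laid out as re0, im0, re1, im1, ...
    float* f = reinterpret_cast<float*>(b);

    for (std::size_t i = 0; i < n; ++i) {
        // Read complex element i first ...
        const float re = f[2 * i];
        const float im = f[2 * i + 1];
        // ... then write float i, which lies in an element already consumed.
        f[i] = re * re + im * im;
    }
    return f;
}
```

After the call, only the first n floats (half of the original buffer) hold meaningful data; the remaining n floats are leftovers from the complex input.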
I just realized there's the _mm_rcp_ps SSE1 intrinsic... maybe that complex/complex VOLK kernel is closer than I thought.

Cheers,
Marcus

On 13.05.2016 20:59, Federico Larroca wrote:
> Thank you Andy. However, I only need the division, although this is
> indeed a good idea if more operations were needed.
>
> So far, I've applied the following lines with some significant savings
> (w.r.t. a loop):
>
> // c = a*conj(b)
> volk_32fc_x2_multiply_conjugate_32fc(&c[0], &a[0], &b[0], N);
> // mag_sq_b = |b|^2
> volk_32fc_magnitude_squared_32f(&mag_sq_b[0], &b[0], N);
> // inv_mag_sq_b = 1/|b|^2, since I've previously defined ones as an
> // array containing N ones.
> volk_32f_x2_divide_32f(&inv_mag_sq_b[0], &ones[0], &mag_sq_b[0], N);
> // out = c*inv_mag_sq_b = a*conj(b)/|b|^2 = a/b
> volk_32fc_32f_multiply_32fc(&out[0], &c[0], &inv_mag_sq_b[0], N);
>
> The idea of using VOLK's pow operator is significantly slower.
>
> I've also experienced interesting performance improvements by
> simplifying some for loops not amenable to VOLK, as suggested by
> Marcus. On the other hand, I'm crazy enough to try to implement a VOLK
> kernel that performs the division. I've just started, don't know if
> I'll be successful, but guess I'll learn something anyhow.
>
> best
> Federico
>
> 2016-05-13 15:14 GMT-03:00 Andy Walls <a...@silverblocksystems.net>:
>
>     On Thu, 2016-05-12 at 16:24 -0400, discuss-gnuradio-requ...@gnu.org
>     wrote:
>     > Date: Wed, 11 May 2016 16:09:56 -0300
>     > From: Federico Larroca
>     > To: discuss-gnuradio@gnu.org
>     > Subject: [Discuss-gnuradio] VOLK division between complexes
>     >
>     > Hello everyone,
>     > We are at the stage of optimizing our project (gr-isdbt). One of the
>     > most time-consuming blocks is OFDM synchronization, and in particular
>     > the equalization phase. This is simply the division between the input
>     > signal and the estimated channel gains (two modestly big arrays of
>     > ~5000 complexes for each OFDM symbol).
>     > Until now, this was performed by a for loop, so my plan was to change
>     > it for a volk function. However, there is no complex division in VOLK.
>     > So I've done a rather indirect operation using the property that a/b =
>     > a*conj(b)/|b|^2, resulting in six lines of code (a multiply conjugate,
>     > a magnitude squared, a deinterleave, a couple of float divisions and
>     > an interleave). Obviously the performance gain (measured with the
>     > Performance Monitor) is marginal (to be optimistic)...
>     > Does anyone have a better idea?
>
>     I have a different idea, but I doubt it is better. The transformation
>
>     w = Log (z) = ln|z| + jArg(z)
>
>     transforms multiplication, division, power and root operations into
>     addition, subtraction, multiplication and division operations
>     respectively.
>
>     So if c = Log(a), d = Log(b), then a/b = Exp(c-d) .
>
>     If along with your complex division, you also have a lot of additional
>     complex multiplication, power, and/or (real) root operations to perform,
>     then the transform *might* give you a savings. A savings would also be
>     more likely, if you don't need to invert the transformation at the end
>     (i.e. no need for z = Exp(w)).
>
>     Regards,
>     Andy
>
>     > Implementing a new kernel is simply out of my knowledge scope.
>     > Best
>     > Federico
>
>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> Discuss-gnuradio@gnu.org
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
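
For reference, the two routes to a/b discussed in this thread can be sketched in plain C++ without VOLK. This is only an illustration of the arithmetic, not the gr-isdbt code and not a VOLK kernel: divide_via_conjugate and divide_via_log are hypothetical helper names, the loops are scalar, and every element of b is assumed to be nonzero.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cfloat = std::complex<float>;

// Hypothetical helper: a/b = a*conj(b)/|b|^2, in the same four steps as the
// VOLK calls quoted in this thread (assumes every b[i] is nonzero).
std::vector<cfloat> divide_via_conjugate(const std::vector<cfloat>& a,
                                         const std::vector<cfloat>& b)
{
    assert(a.size() == b.size());
    std::vector<cfloat> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const cfloat c = a[i] * std::conj(b[i]);  // c = a*conj(b)
        const float mag_sq = std::norm(b[i]);     // |b|^2 (std::norm is the squared magnitude)
        const float inv_mag_sq = 1.0f / mag_sq;   // 1/|b|^2
        out[i] = c * inv_mag_sq;                  // a/b
    }
    return out;
}

// Hypothetical helper: a/b = Exp(Log(a) - Log(b)), the log-domain route
// (assumes every a[i] and b[i] is nonzero, since Log(0) is not finite).
std::vector<cfloat> divide_via_log(const std::vector<cfloat>& a,
                                   const std::vector<cfloat>& b)
{
    assert(a.size() == b.size());
    std::vector<cfloat> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        const cfloat c = std::log(a[i]);  // c = ln|a| + j*Arg(a)
        const cfloat d = std::log(b[i]);  // d = ln|b| + j*Arg(b)
        out[i] = std::exp(c - d);         // Exp(c - d) = a/b
    }
    return out;
}
```

Both helpers should agree with the built-in a[i] / b[i] up to floating-point rounding. The log route costs a log and an exp per element here, which fits the remark that it only pays off when many operations are chained in the log domain.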