Forgot to include the link to my benchmarking tool:

https://github.com/marcusmueller/table_vs_volk

I had to look intensely for your mail: Trek, please don't "hijack" other
threads by replying to them with a completely unrelated topic. When
starting a new topic, simply send an email to the mailing list without
using the "reply" functionality; otherwise, most people won't even see
it, because it's buried in a discussion thread irrelevant to them.

Best regards,
Marcus

On 07.04.2016 11:40, Marcus Müller wrote:
> Hi Trek,
>
> as Martin noted, yes, if you search the GNU Radio source tree for that
> file name, you'll find it. And also, yes, GNU Radio is Free Software,
> and one of the main credos of that is that you should be able to use
> everything from it for your own purposes (as long as you adhere to the
> freeness that the part you're using demands; for GNU Radio, that's
> GPLv3). However, to be honest, a linear approximation-based 8kB sine
> table might or might not be the right tool for your problem; usually,
> one would just think about what one needs and generate the sine table
> oneself, matching exactly the requirements at hand.
>
> Us being DSP nerds, I guess some of us are curious: what is your fixed
> point application? Are you planning to use this on some
> microcontroller, or some programmable logic device, or do you need a
> sin where you transform fixed point values (e.g. from an ADC) to
> floating point values? What is the algorithm you're building with that?
>
> However, are you /sure/ a sine table is the optimum for your specific
> problem?
> I'm not an overly big fan of uniform sine tables (they make a lot of
> sense on e.g. microcontrollers that don't have advanced math
> functions, and if you don't need the accuracy), but if you look at
> VOLK, you'll find things that are comparably fast, or in my case, even
> faster; using a benchmarking stub I've got lying around (didn't
> specify any compiler optimizations, i.e. gcc will not optimize):
>
> Doing 100000000 operations.
> fixed point
>  0.781710s wall, 0.780000s user + 0.000000s system = 0.780000s CPU (99.8%)
> standard libc float32 sin
>  2.700463s wall, 2.700000s user + 0.000000s system = 2.700000s CPU (100.0%)
> VOLK float32 sin
> Using Volk machine: avx2_64_mmx_orc
>  0.331708s wall, 0.330000s user + 0.000000s system = 0.330000s CPU (99.5%)
> dummy memory bandwidth test: copy out- to input
>  0.404707s wall, 0.400000s user + 0.000000s system = 0.400000s CPU (98.8%)
> dummy memory bandwidth test: copy in- to output
>  0.406990s wall, 0.410000s user + 0.000000s system = 0.410000s CPU (100.7%)
>
> Volk of course only makes sense if you can arrange your algorithms so
> that you get a lot of sin input values contiguously in memory.
>
> Four observations:
>
>  1. At least without compiler optimizations, this sine-table
>     implementation is only about 3.5 times faster than the standard
>     libc sin, not even counting the fact that you'd have to first
>     come up with the proper input scaling. Unless your program is
>     really dominated by sin() performance, this might not even be
>     worth considering. A general hint: run "perf record -a
>     yourprogram"; "perf report" to find out where your PC spent its
>     time.
>  2. The VOLK routine is more than twice as fast as the fixed point
>     implementation and, being a six-summand Taylor series
>     approximation, probably more accurate.
>  3. Enabling compiler optimizations (CFLAGS=-Ofast make) will probably
>     double the speed of sin (my experience), and severely cut the
>     time that the fixed point implementation takes, probably slightly
>     below the time of Volk (which will not change measurably). That's
>     because the compiler will inline everything in the fixed point
>     routine. Whether that slight advantage then will be worth the
>     accuracy loss is up to you.
>  4. VOLK's sin is faster than float-wise copy (here, without compiler
>     optimizations); what seems paradoxical shows that making extensive
>     use of memory alignment and SIMD brings you much closer to the
>     memory bandwidth barrier. Knowing my machine, I now have a guess
>     for the performance of the fixed point sin table approach under
>     heavy compiler optimization: it will take around ¼ of the time one
>     of the dummy copies takes; that's how fast you get with 4-float32
>     SIMD here, assuming this is really only bandwidth-limited. Trying
>     this verifies my suspicion!
>
> As you can see, the question of which approach is fast really depends
> on what your compiler does, what SIMD instructions you can make use of
> (VOLK's sin only has optimizations for SSE4.1, I think) and how your
> data lies in memory.
>
> Best regards,
> Marcus
>
> On 07.04.2016 05:26, Trek Liu wrote:
>> What is the purpose of this file? There is zero documentation in this
>> file, is it ever being used?
>> I am looking for a sin/cos table for speed optimization, is there one
>> inside gnuradio?
>>
>> Thanks.
>>
>>
>> _______________________________________________
>> Discuss-gnuradio mailing list
>> Discuss-gnuradio@gnu.org
>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>