On 11/08/2011 07:40 PM, Nowlan, Sean wrote:
> 3 quick questions - first, does the cmake setup automatically turn on
> gcc optimizations, i.e., with "-O3"? Second, is there anything to be
> gained (or lost) by turning on "-ftree-vectorize" and
> "-funsafe-math-optimizations"? Finally, is the gcc on E100 really
> CodeSourcery's arm-none-eabi-gcc (or an upstream GNU version
> thereof)?
>
CMake will automatically build in release mode, which gives you -O3.
Other important flags need to be specified; you can do this in one fell
swoop with a toolchain file. One is checked into the cmake/Toolchains
directory; see its comments for usage.

-josh

> Thanks, Sean
>
> -----Original Message----- From: Nick Foster [mailto:n...@ettus.com]
> Sent: Tuesday, November 08, 2011 4:10 PM To: Nowlan, Sean;
> j...@ettus.com Subject: Re: [Discuss-gnuradio] Complex Short/INT16
> type
>
> On Tue, Nov 8, 2011 at 12:50 PM, Nowlan, Sean
> <sean.now...@gtri.gatech.edu> wrote:
>> So, what needs to be done? I noticed that there are already hooks
>> for NEON in the volk library but no implementation (or very
>> little... don't remember exactly).
>
> Josh is putting together a little example that uses Volk in
> Gnuradio's core blocks (add, subtract, etc.). This will eventually
> (hopefully) become the replacement for much of the functionality in
> gnuradio-core. We've been talking about this for a long time, and it
> should provide a pretty major speedup on all platforms, but
> especially those for which the compiler sucks (ARM being the worst
> offender). Josh's example should provide a framework for you to work
> with while we get Volk fully integrated into Gnuradio "for real".
>
> You can also always use Volk functions in your own custom dsp blocks
> to speed them up. You can also just use Volk outside of Gnuradio if
> you like.
>
>>
>> My understanding of Orc is that it generates architecture-dependent
>> vector processor instructions from an Orc abstraction language. Is
>> integrating Orc into Volk for NEON as simple as linking into liborc
>> with a compile switch indicating that we want NEON output? Are the
>> smarts already built into the cmake build process?
>
> Orc is actually a little cooler than that -- it's a runtime-compiled
> architecture-independent vector assembly language. It's integrated as
> one alternative architecture for implementing Volk functions.
> Volk has been set up to automatically select the fastest
> implementation available for a given function at runtime, so for the
> user it's as simple as #include <volk/volk.h> and then
> volk_32f_x2_add_32f_a16(...) to implement an adder. Volk will
> automatically choose the fastest implementation at runtime the first
> time the function is invoked, after figuring out what architecture
> it's running on and what implementations are available for that given
> function. If an Orc version of a function is available, it will be
> automatically selected and the Orc code will runtime-compile to
> vectorized NEON. You don't have to link against liborc at all, just
> against libvolk. We don't have any native NEON in Volk -- we use Orc
> to provide coverage on NEON platforms. We've found that Orc tends to
> be around 90% as fast as good, hand-tuned assembly most of the time,
> and sometimes faster. The reason we don't just use Orc for everything
> is that it's usually possible to do a little better with careful
> optimization and compiler intrinsics, and we were "gifted" a large
> library of well-optimized SSE DSP routines to use.
>
>>
>> Can I drop Philip's _fff and _ccf filters into volk and hit "go?"
>> (I know there's more nuance to it, but if the combination of
>> integrating Orc code and NEON FIR filter code that's already
>> written gets me 90% of the way there, I'd be VERY happy!)
>
> You can, but the _fff and _ccf filters are already implemented and
> working in NEON. They were done by Phil before Volk was integrated,
> so they're written in assembly in the filter core. They are also
> automatically selected at runtime, so they should be "just working"
> for you already. Eventually we'll pull the assembly implementations
> out and put them into Volk.
>
> If you send me your flowgraph, I'll take a look at it on an E100 and
> see if I can get some things optimized.
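
For readers wondering what "choose the fastest implementation the first
time the function is invoked" can look like in practice, here is a
minimal C++ sketch of first-call dispatch through a function pointer.
It is not Volk's source: my_add_32f, add_generic, add_unrolled, and
tuned_kernel_available are hypothetical names, the "tuned" kernel is
just an unrolled loop standing in for an SSE or Orc-generated NEON
routine, and the capability check is hard-wired rather than probing the
CPU.

```cpp
#include <cstddef>

// Hypothetical names throughout; this is an illustration, not Volk code.
using add_fn = void (*)(float* out, const float* a, const float* b, std::size_t n);

// Portable fallback that works on any architecture.
static void add_generic(float* out, const float* a, const float* b, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Stand-in for a tuned kernel. A real one would use SIMD; this one is
// only unrolled by four, with a scalar loop for the leftover samples.
static void add_unrolled(float* out, const float* a, const float* b, std::size_t n)
{
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i] = a[i] + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}

// Assumption: real code would query CPU features here; this is hard-wired.
static bool tuned_kernel_available() { return true; }

// Counts how many times selection runs, purely for demonstration.
static int resolve_count = 0;

static void add_resolve(float* out, const float* a, const float* b, std::size_t n);

// Public entry point. It starts out pointing at the resolver.
static add_fn my_add_32f = add_resolve;

// First call lands here: pick a kernel, rebind the pointer, then forward.
static void add_resolve(float* out, const float* a, const float* b, std::size_t n)
{
    ++resolve_count;
    my_add_32f = tuned_kernel_available() ? add_unrolled : add_generic;
    my_add_32f(out, a, b, n);
}
```

Calling my_add_32f(out, a, b, n) the first time runs the resolver; every
later call goes straight to the chosen kernel, so the selection cost is
paid once.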
>
> --n
>
>>
>> Thanks, Sean ________________________________________ From: Nick
>> Foster [n...@ettus.com] Sent: Tuesday, November 08, 2011 1:27 PM
>> To: j...@ettus.com Cc: discuss-gnuradio@gnu.org; Nowlan, Sean
>> Subject: Re: [Discuss-gnuradio] Complex Short/INT16 type
>>
>> Sean, with all the talk about optimization for ARM, the first thing
>> I would do is start to integrate Volk with existing floating-point
>> blocks. Stock GCC is very, very bad at vectorizing for the NEON
>> SIMD unit -- even when hardware floating point is used in GCC, most
>> float instructions end up allocated to the VFP rather than the NEON
>> unit. You might find an easy 2x-3x improvement just by doing the
>> heavy lifting in Volk rather than in C++. All of the Orc functions
>> in Volk will work for NEON. There's no FIR filter in Orc right now
>> (need to get accumulators working properly in Orc), but Philip
>> Balister already wrote NEON FIR filter cores for the _fff and _ccf
>> FIR filters.
>>
>> This isn't to say that short complex wouldn't be a useful addition
>> to GR. Just that it's likely going to be more work than making use
>> of the existing floating-point hardware the E100 already has.
>>
>> This is work that needs to be done anyway to make ARM platforms as
>> useful as possible, and we (Josh, Phil, and I) are happy to help
>> you optimize your application for E100 if you give us details on
>> how your application works. We're putting together a "motivating
>> example" using Volk to show users how to Volkify their own blocks.
>>
>> --n
>>
>> On Tue, Nov 8, 2011 at 9:13 AM, Josh Blum <j...@ettus.com> wrote:
>>>
>>>
>>> On 11/07/2011 02:15 PM, Nowlan, Sean wrote:
>>>> Hi all -
>>>>
>>>> I'm getting limited by the slow ARM processor in the E100 and I
>>>> want to modify parts of gr-digital and gnuradio-core to support
>>>> complex short/INT16 types in the modulation schemes.
>>>> I suspect that it won't be as trivial as defining "typedef
>>>> std::complex<short> gr_complexs;" in
>>>> gnuradio-core/src/lib/runtime/gr_complex.h and doing a
>>>> find-and-replace in the relevant source files. There are
>>>> probably
>>>
>>> It may be that simple for some blocks. Like the symbol table in
>>> BPSK.
>>>
>>>> issues with dynamic range that I'll have to deal with in
>>>> addition to having to implement filters using fixed-point
>>>> math.
>>>>
>>>
>>> Often blocks will need to have scale factors. Fortunately, with a
>>> FIR filter, you get a free scale factor in the "filter taps".
>>>
>>>> Questions:
>>>>
>>>> 1) Do you think I'd save anything by doing all the
>>>> modulation & filtering in complex float32 and then converting
>>>> at the very end?
>>>
>>> It's good to make the conversion part of an operation that does
>>> something useful rather than doing it for the sake of
>>> converting. Like a filter that takes in floats and spits out
>>> shorts.
>>>
>>>> This will reduce the bandwidth requirement to the FPGA by two,
>>>> but I'm afraid the float math is the true limitation.
>>>>
>>>
>>> The format going into the FPGA is always integer. If you pass
>>> floats into the UHD, they are copy-converted from host buffer to
>>> memory mapped buffers.
>>>
>>>> 2) Why is there a gr_complex_to_interleaved_short block
>>>> but not a gr_complex_to_complex_short block? Would it be better
>>>> if I rolled my own or just hooked up a
>>>> gr_complex_to_interleaved_short block and then a deinterleave
>>>> block? Or alternatively, split the complex float vector into
>>>> two streams and feed them to a USRP sink block using
>>>> COMPLEX.INT16?
>>>>
>>> The interleaved short block is a strange hold-over from ancient
>>> times. I would ignore it. I think a block such as
>>> "gr_complex_to_complex_short" is a good idea.
>>>
>>>> 3) What specific parts of the modulation examples or
>>>> gnuradio-core do you think I need to change to support complex
>>>> short ints?
>>>>
>>>
>>> Probably some new sc16 filter blocks for the matched filters. I
>>> have mentioned the importance of volk before.
>>>
>>> The constellation stuff relies on this new constellation library
>>> in gr-digital. Perhaps Ben can lean in here and offer some advice
>>> on how to modify this for alternative data types.
>>>
>>> The recovery stuff in the BPSK is using Tom's new
>>> gri-control-loop to simplify writing things like FLLs, PLLs.
>>> That's a place to look; see how the timing recovery blocks make
>>> use of it.
>>>
>>> -Josh
>>>
>>> _______________________________________________ Discuss-gnuradio
>>> mailing list Discuss-gnuradio@gnu.org
>>> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
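
As a rough illustration of the gr_complex_to_complex_short idea and the
scale-factor point raised in the thread, here is a minimal C++ sketch
that scales complex float samples, rounds them, and saturates to 16
bits. Nothing here is an existing GNU Radio block: to_complex_short,
float_to_short_sat, and the default scale of 32767 are assumptions made
for illustration. std::complex over an integer type is used only as a
container for the two components, since the standard specifies its
arithmetic only for floating-point types.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <cstdint>
#include <vector>

using complex_float = std::complex<float>;
using complex_short = std::complex<std::int16_t>;  // container use only

// Scale one float, round to nearest, and clamp to the int16 range so
// that out-of-range input saturates instead of wrapping around.
inline std::int16_t float_to_short_sat(float x, float scale)
{
    const float scaled = std::round(x * scale);
    if (std::isnan(scaled)) return 0;            // avoid undefined cast
    if (scaled > 32767.0f) return 32767;
    if (scaled < -32768.0f) return -32768;
    return static_cast<std::int16_t>(scaled);
}

// Convert a buffer of complex floats to complex shorts.
// Assumption: input is nominally in [-1, 1], so 32767 maps it to full scale.
inline std::vector<complex_short> to_complex_short(
    const std::vector<complex_float>& in, float scale = 32767.0f)
{
    std::vector<complex_short> out;
    out.reserve(in.size());
    for (const auto& s : in) {
        out.emplace_back(float_to_short_sat(s.real(), scale),
                         float_to_short_sat(s.imag(), scale));
    }
    return out;
}
```

With the default scale, a sample of (0.5, -0.5) becomes (16384, -16384),
and anything beyond full scale clips to 32767 or -32768. In a real
flowgraph the same scaling could be folded into the filter taps, as
suggested in the thread, so the conversion rides along with useful work
instead of costing a separate pass.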