I'm not feeling the same guilt, Philip, so I'll just go ahead and "complain" about ARM :D
So, the ARM/Thumb instruction sets don't come with a modulo instruction; hence, `a%b` is very likely implemented as a - ⌊a/b⌋·b. Integer division often takes multiple cycles, anywhere from 4 up to 20 (IIRC, but that might've been on an A7). NEON, the ARM SIMD instruction set extension, afaik doesn't even *have* a vectorized division operation. So this takes *at least* four cycles for the `a/b`, one for the `·b` and one for the `a-`, and then you use the result to address an array, so there's another multiply/accumulate happening there. That's a minimum of seven cycles just to know where to store the result of whatever you do in `= …`.

Of course, take this with a grain of salt; compilers aren't stupid :)

So, the first thing I'd try is to make your counter/index variable `int i` an `unsigned int i`, and the same for your size variable `arr_size`. (I'd still argue that the compiler *should* be able to infer that `i` can only take non-negative values, but we don't know that it does.) I'd even go as far as to say that `size_t arr_size` is the "optimal" choice, because that way you tell your compiler to use an integer type that can hold the largest array size your architecture can deal with (`size_t` is defined in stddef.h and is an unsigned type, btw).

I'm sometimes a bit zealous about making sure that size and counter variables are unsigned: that way, your compiler will warn you whenever you compare a counter to a signed int, which can only count "half as far", and, in function signatures, it will tell you early on when there might be a corner case where you e.g. call `malloc(-1 * sizeof(int))` or so.

As Philip said, you can improve on that if you know something about `arr_size`. For example, if it's a power of 2, then `i % arr_size` equals `i & (arr_size - 1)`.
There's also beauty in having an outer loop that simply runs from 0 to `arr_size` and an inner loop that iterates over all the `i` that are in the same residue class mod `arr_size`: if you do it that way, your stores will be linear in memory, and that's something you want, to avoid unnecessarily updating cache lines multiple times.

Best regards,
Marcus

On 07.09.2017 04:21, Philip Balister via USRP-users wrote:
> On 09/06/2017 10:17 PM, Philip Balister via USRP-users wrote:
>>
>> On 09/06/2017 07:07 PM, Taliver Heath wrote:
>>> I had the same issues -- the big performance eater in my case was anything
>>> that was doing modulo in a tight loop.
>>>
>>> So, if you have something like:
>>>
>>> for ( int i = 0; i < 1000; i++) {
>>> array[i % arr_size] = ...
>>> }
>> Yeah, basically the % operator has a poor implementation. I found this
> OK, feel guilty saying poor implementation :)
>
> % calculates the remainder after a division. So far two number, without
> any idea how you got there, I'm sure it is fine. If you are using it to
> detect a counter going past a number, you are doing it wrong :)
>
> Philip
>
>> out yeara ago. Really glad it made it to the list. Just count and reset
>> back to zero when you overflow.
>>
>> Philip
>>
>>> You'll take a pretty big hit.
>>>
>>> On Wed, Sep 6, 2017 at 4:00 PM, Tom Bereknyei via USRP-users <
>>> usrp-users@lists.ettus.com> wrote:
>>>
>>>> We ran into a similar issue. Big things that helped us was to move high
>>>> rate dsp calculations to RFNoC.
>>>>
>>>> I've also had luck with volk_profile. It seems to help with some
>>>> workloads.
>>>> On Wed, Sep 6, 2017 at 16:53 Philip Balister via USRP-users <
>>>> usrp-users@lists.ettus.com> wrote:
>>>>
>>>>> On 09/06/2017 04:38 PM, Marcus Müller via USRP-users wrote:
>>>>>> Hi Mr Hamilton,
>>>>>>
>>>>>> So, what you'd want to optimize first depends on what needs the most
>>>>>> optimization. Your x86 program might be a good place to start looking
>>>>>> into what the bottleneck is.
>>>>>> If you're running Linux on your x86, I can
>>>>>> heartily recommend `perf`, which is a tool that lets you display live,
>>>>>> record and analyze the points in your code where the program spends most
>>>>>> time.
>>>>> "perf top" gives results pretty quickly.
>>>>>
>>>>> It also sounds like you aren't using both cpu's to the full extent.
>>>>> Maybe there is just one block doing all the work?
>>>>>
>>>>> Also, looking at using rfnoc to do high rate functions to reduce
>>>>> calculations that need doing on the arm is a good plan.
>>>>>
>>>>> Philip
>>>>>
>>>>>> In general, modern x86 have way larger memory bandwidth and larger CPU
>>>>>> caches, so that alone can become critical, but also things like more
>>>>>> capable SIMD instructions and less hardware-handling overhead.
>>>>>>
>>>>>> I don't know whether this helped you much, but I hope it's a start,
>>>>>> best regards,
>>>>>>
>>>>>> Marcus Müller
>>>>>>
>>>>>> On 09/06/2017 10:06 PM, S Hamilton via USRP-users wrote:
>>>>>>> We're moving an application that we had running on pc hardware with
>>>>>>> the Ettus B210, to the embedded arm E310. On the pc side we were at
>>>>>>> 80% idle cpu when running (intel i5-4570). With armv7 we're down to
>>>>>>> 30% idle, with one of the cores @100% so it's not keeping up.
>>>>>>> Are there any arm specific optimizations that are recommended or
>>>>>>> gotchas.
>>>>>>> We are using the release4 version of the SDK and firmware.
>>>>>>>
>>>>>>> We'd also like to use the complex_to_mag_approx RFNOC block. Is there
>>>>>>> any sample code around to look at.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> USRP-users mailing list
>>>>>>> USRP-users@lists.ettus.com
>>>>>>> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>>>> --
>>>> Maj Tom Bereknyei
>>>> Defense Digital Service
>>>> t...@dds.mil
>>>> (571) 225-1630