Dave,

On Tue, Nov 30, 2010 at 2:15 AM, Michael Hope <michael.h...@linaro.org> wrote:
> On Tue, Nov 30, 2010 at 12:37 AM, Dave Martin <dave.mar...@linaro.org> wrote:
>> On Sun, Nov 28, 2010 at 10:28 PM, Michael Hope <michael.h...@linaro.org>
>> wrote:
>>> I sat down and measured the power consumption of the NEON unit on an
>>> OMAP3.  Method and results are here:
>>>  https://wiki.linaro.org/MichaelHope/Sandbox/NEONPower
>>>
>>> The board takes 2.37 W and the NEON unit adds an extra 120 mW.
>>> Assuming the core takes 1 W, then the code needs to run 12 % faster
>>> with NEON on to be a net power win.
>>>
>>> Note that the results are inaccurate but valid enough.
>>
>> Just to play devil's advocate... the results will differ, perhaps
>> significantly, between SoCs of course.
>>
>> In terms of the amount of energy required to perform a particular
>> operation (i.e., at the microbenchmark level) I agree with your
>> conclusion.  However, in practice I suspect this isn't enough.  I'm
>> not familiar with exactly when NEON is likely to get turned on and
>> off, but you need to factor in the behaviour of the OS: if you
>> accelerate a DSP operation which is used a few dozen times per
>> timeslice, NEON will be used for only a tiny proportion of the time it
>> is switched on, because once NEON is on, it probably stays on at least
>> until the next interrupt, and probably until the next task switch.
>> With the kernel configured for dynamic timer tick, this can get even
>> more exaggerated, since the rescheduling frequency may drop.
>>
>> The real benefits, in performance and power, therefore come in
>> operations which dominate the run-time of a particular process, such
>> as intensive image handling or codec operations.  NEON in
>> widely-dispersed but sporadically used features (such as
>> general-purpose library code) could be expected to come at a net power
>> cost.  If you use NEON for memcpy for example, you will basically
>> never be able to turn the NEON unit off.  That's unlikely to be a win
>> overall, since even if you now optimise all the code in the system for
>> NEON, you're unlikely to see a significant performance boost; NEON
>> simply isn't designed for accelerating general-purpose code.
>>
>> The correct decision for how to optimise a given piece of code seems
>> to depend on the SoC and the runtime load profile.  And while you can
>> usefully predict that at build-time for a media player or dedicated
>> media stack components, it's pretty much impossible to do so with
>> general-purpose libraries... unless there's a cunning strategy I
>> haven't thought of.
>>
>> Ideally, processes whose load varies significantly over time and
>> between different use cases (such as Xorg) would be able to select
>> between NEON-ised and non-NEON-ised implementations dynamically, based
>> on the current load.  But I guess we're some distance away from being
>> able to achieve that... ?
>
> I agree.  I've been wondering if this is more of a power management
> topic as what you've described there is basically the same as what the
> CPU frequency governor does in deciding the best way to achieve a
> workload.  Perhaps this can also turn into hints to executing code re:
> what instruction set to use.
>
> There might be an argument for explicit control as well.  Say you're
> decoding an AAC stream and using 20 % CPU - it might be more efficient
> to acquire and release the NEON unit from within the decoder to start
> it up faster and release it as soon as the job is done.
>
> Could a kernel developer describe how the NEON unit is controlled?  My
> understanding is:
>  * NEON is generally off
>  * Executing a NEON instruction causes an instruction trap, which kicks
>    the kernel, which starts the unit up
>  * The kernel only saves the NEON registers if the code uses them
>
> I'm not sure about:
>  * Does NEON remain on as long as that process is executing?  Does it
>    get turned off on task switch, or perhaps after a timeout?

On OMAP3, NEON is a separate power domain, and it can transition to a low-power state on its own based on its activity (managed by the PRCM hardware). However, the NEON power domain has a wake-up dependency on the MPU, which means NEON is woken up whenever the MPU comes out of standby.
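
For reference, the 12 % break-even figure quoted earlier in the thread can be reproduced with a few lines of arithmetic. This is only a rough sketch: the 1 W core power is Michael's assumption, the 120 mW comes from his measurement, and the function names and the simple "fraction of run-time accelerated" model are my own illustration of Dave's point rather than anything measured.

```python
def break_even_speedup(core_w, neon_w):
    """Speed-up at which energy with NEON on equals energy with NEON off.

    Energy is power * time.  With NEON off the job costs core_w * t; with
    NEON on it costs (core_w + neon_w) * t / s.  The two are equal when
    s = (core_w + neon_w) / core_w.
    """
    return (core_w + neon_w) / core_w


def energy_ratio(core_w, neon_w, speedup, accelerated_fraction):
    """Energy with NEON on divided by energy with NEON off.

    Assumed model (hypothetical): NEON stays powered for the whole run,
    but only `accelerated_fraction` of the original run-time is sped up
    by `speedup`.  A result below 1.0 is a net energy win.
    """
    relative_time = (1.0 - accelerated_fraction) + accelerated_fraction / speedup
    return (core_w + neon_w) / core_w * relative_time


if __name__ == "__main__":
    core_w, neon_w = 1.0, 0.12  # assumed core power, measured NEON overhead

    s = break_even_speedup(core_w, neon_w)
    print(f"break-even speed-up: {s:.2f}x")

    # Codec-style load: nearly all of the run-time benefits from NEON.
    print(f"90% accelerated at 2x: {energy_ratio(core_w, neon_w, 2.0, 0.9):.3f}")

    # Library-style load: NEON is on, but rarely doing useful work.
    print(f"10% accelerated at 2x: {energy_ratio(core_w, neon_w, 2.0, 0.1):.3f}")
```

With these numbers the codec-style case comes out well below 1.0 and the library-style case above it, which matches the argument that sporadic NEON use in general-purpose code is likely to cost power.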

>  * VFP uses the same register set.  Does a floating point instruction
>    also turn the NEON coprocessor on?

Yes, I suppose so, since the VFP engine is part of the NEON unit.

Vishwa

>
> -- Michael
>
> _______________________________________________
> linaro-dev mailing list
> linaro-dev@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev