On Sun, Nov 28, 2010 at 10:28 PM, Michael Hope <michael.h...@linaro.org> wrote:
> I sat down and measured the power consumption of the NEON unit on an
> OMAP3.  Method and results are here:
>  https://wiki.linaro.org/MichaelHope/Sandbox/NEONPower
>
> The board takes 2.37 W and the NEON unit adds an extra 120 mW.
> Assuming the core takes 1 W, then the code needs to run 12 % faster
> with NEON on to be a net power win.
>
> Note that the results are inaccurate but valid enough.
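
Just to spell out the arithmetic behind that 12 % figure: treating
energy as power * time for a fixed amount of work, and taking the 1 W
core / 120 mW NEON numbers above at face value, a rough C sketch of the
break-even calculation:

#include <stdio.h>

int main(void)
{
    /* Figures quoted above (assumed, not re-measured): core power
     * with NEON off, and the extra power the NEON unit draws.     */
    const double p_core = 1.0;   /* W, assumed core power           */
    const double p_neon = 0.12;  /* W, extra drawn by the NEON unit */

    /* Energy for a fixed workload is power * time, so NEON saves
     * energy when (p_core + p_neon) * t_neon < p_core * t_plain,
     * i.e. when the NEON version runs at least this much faster:  */
    double required_speedup = (p_core + p_neon) / p_core;

    printf("NEON code must be >= %.0f%% faster to break even\n",
           (required_speedup - 1.0) * 100.0);   /* prints 12% */
    return 0;
}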

Just to play devil's advocate... the results will differ, perhaps
significantly, between SoCs of course.

In terms of the amount of energy required to perform a particular
operation (i.e., at the microbenchmark level) I agree with your
conclusion.  However, in practice I suspect this isn't enough.  I'm
not familiar with exactly when NEON is likely to get turned on and
off, but you need to factor in the behaviour of the OS: if you
accelerate a DSP operation which runs a few dozen times per timeslice,
NEON will only be doing useful work for a tiny proportion of the time
it is powered on, because once NEON is on, it probably stays on at
least until the next interrupt, and probably until the next task
switch.  With the kernel configured for dynamic timer tick, this effect
can be even more pronounced, since the rescheduling frequency may drop.

The real benefits, in performance and power, therefore come in
operations which dominate the run-time of a particular process, such
as intensive image handling or codec operations.  Using NEON in widely
dispersed but sporadically exercised code (such as general-purpose
library routines) could be expected to come at a net power cost.  If
you use NEON for memcpy, for example, you will basically never be able
to turn the NEON unit off.  That's unlikely to be a win overall, since
even if you optimise all the code in the system for NEON, you're
unlikely to see a significant performance boost: NEON simply isn't
designed for accelerating general-purpose code.

The correct decision for how to optimise a given piece of code seems
to depend on the SoC and the runtime load profile.  And while you can
usefully predict that at build time for a media player or for dedicated
media-stack components, it's pretty much impossible to do so for
general-purpose libraries... unless there's a cunning strategy I
haven't thought of.
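
To illustrate what I mean by deciding at build time: for a dedicated
media component the choice can simply be baked in by the compiler,
along these lines.  The code is purely illustrative, and __ARM_NEON__
is the macro GCC defines when building with -mfpu=neon, as far as I
recall:

#include <stddef.h>
#include <stdint.h>

#ifdef __ARM_NEON__
#include <arm_neon.h>

/* NEON build: scale eight 16-bit samples per iteration. */
void scale_samples(int16_t *buf, size_t n, int16_t gain)
{
    size_t i;
    int16x8_t g = vdupq_n_s16(gain);
    for (i = 0; i + 8 <= n; i += 8) {
        int16x8_t v = vld1q_s16(buf + i);
        vst1q_s16(buf + i, vmulq_s16(v, g));
    }
    for (; i < n; i++)          /* leftover tail */
        buf[i] = (int16_t)(buf[i] * gain);
}
#else
/* Plain C build for targets without NEON. */
void scale_samples(int16_t *buf, size_t n, int16_t gain)
{
    size_t i;
    for (i = 0; i < n; i++)
        buf[i] = (int16_t)(buf[i] * gain);
}
#endif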

Ideally, processes whose load varies significantly over time and
between different use cases (such as Xorg) would be able to select
between NEON-ised and non-NEON-ised implementations dynamically, based
on the current load.  But I guess we're some distance away from being
able to achieve that... ?
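
For completeness, the mechanism itself is cheap: indirect through a
function pointer and let whatever monitors the load flip it.  A minimal
sketch in C, with entirely made-up names (including how the switch gets
triggered):

#include <stddef.h>
#include <stdint.h>

/* Two builds of the same routine: a plain C version and a NEON one.
 * Names and bodies are illustrative placeholders only. */
static void blend_row_c(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = (uint8_t)((dst[i] + src[i]) / 2);
}

static void blend_row_neon(uint8_t *dst, const uint8_t *src, size_t n)
{
    /* A real version would use NEON loads/stores; reuse the C body
     * here to keep the sketch self-contained. */
    blend_row_c(dst, src, n);
}

/* The active implementation.  Start with the plain one so the NEON
 * unit can stay powered down until something decides it's worth it. */
static void (*blend_row)(uint8_t *, const uint8_t *, size_t) = blend_row_c;

/* Hypothetical hook: the process itself, or a daemon watching the
 * load, calls this to switch implementations at runtime. */
void set_neon_enabled(int enable)
{
    blend_row = enable ? blend_row_neon : blend_row_c;
}

/* Callers always go through the pointer, so the choice stays dynamic. */
void blend(uint8_t *dst, const uint8_t *src, size_t n)
{
    blend_row(dst, src, n);
}

The hard part is of course the policy that decides when to flip it, not
the plumbing.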

Cheers
---Dave
