Dave,

On Tue, Nov 30, 2010 at 2:15 AM, Michael Hope <michael.h...@linaro.org> wrote:
> On Tue, Nov 30, 2010 at 12:37 AM, Dave Martin <dave.mar...@linaro.org> wrote:
>> On Sun, Nov 28, 2010 at 10:28 PM, Michael Hope <michael.h...@linaro.org> wrote:
>>> I sat down and measured the power consumption of the NEON unit on an
>>> OMAP3.  Method and results are here:
>>>  https://wiki.linaro.org/MichaelHope/Sandbox/NEONPower
>>>
>>> The board takes 2.37 W and the NEON unit adds an extra 120 mW.
>>> Assuming the core takes 1 W, then the code needs to run 12 % faster
>>> with NEON on to be a net power win.
>>>
>>> Note that the results are inaccurate but valid enough.
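
The break-even figure quoted above can be reproduced with a few lines of
arithmetic. This is only a sketch: the 1 W core figure is the assumption
stated in the quoted text, the 120 mW figure is the measurement reported
there, and the function name is made up for illustration.

```python
# Figures taken from the quoted message; the 1 W core power is an assumption there.
CORE_POWER_W = 1.0
NEON_EXTRA_W = 0.120


def break_even_speedup(core_w, neon_extra_w):
    """Speedup at which energy with NEON equals energy without it.

    Energy is power * time, so break-even is where
    (core_w + neon_extra_w) * t_neon == core_w * t_plain.
    """
    return (core_w + neon_extra_w) / core_w


speedup = break_even_speedup(CORE_POWER_W, NEON_EXTRA_W)
runtime_cut = 1.0 - 1.0 / speedup

print(f"break-even speedup: {speedup:.2f}x")
print(f"equivalent run-time reduction: {runtime_cut:.1%}")
```

Put differently, "12 % faster" here means 1.12x the throughput, which
corresponds to a run time roughly 10.7 % shorter.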
>>
>> Just to play devil's advocate... the results will differ, perhaps
>> significantly, between SoCs of course.
>>
>> In terms of the amount of energy required to perform a particular
>> operation (i.e., at the microbenchmark level) I agree with your
>> conclusion.  However, in practice I suspect this isn't enough.  I'm
>> not familiar with exactly when NEON is likely to get turned on and
>> off, but you need to factor in the behaviour of the OS: if you
>> accelerate a DSP operation which is used a few dozen times per
>> timeslice, NEON will be doing useful work for only a tiny proportion
>> of the time it is powered, because once NEON is on, it probably stays
>> on at least until the next interrupt, and probably until the next
>> task switch.  With the
>> kernel configured for dynamic timer tick, this can get even more
>> exaggerated, since the rescheduling frequency may drop.
>>
>> The real benefits, in performance and power, therefore come in
>> operations which dominate the run-time of a particular process, such
>> as intensive image handling or codec operations.  NEON in
>> widely-dispersed but sporadically used features (such as
>> general-purpose library code) could be expected to come at a net power
>> cost.  If you use NEON for memcpy for example, you will basically
>> never be able to turn the NEON unit off.  That's unlikely to be a win
>> overall, since even if you now optimise all the code in the system for
>> NEON, you're unlikely to see a significant performance boost-- NEON
>> simply isn't designed for accelerating general-purpose code.
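
The argument above can be made concrete with a rough energy model. It is
a sketch under stated assumptions, not a measurement: NEON is assumed to
stay powered for the whole run once it has been touched, idle power is
ignored, and the 4x speedup on the accelerated portion is a made-up
figure. Only the 12 % power overhead comes from the numbers earlier in
the thread.

```python
def energy_ratio(p_neon_rel, frac_accel, speedup):
    """Energy with NEON divided by energy without it (< 1.0 is a win).

    p_neon_rel: NEON power as a fraction of core power (0.12 in this thread).
    frac_accel: fraction of the baseline run time spent in code NEON can speed up.
    speedup:    speedup NEON gives on that fraction (hypothetical).

    Assumes NEON stays powered for the entire run once it has been used.
    """
    # Amdahl-style run time relative to the non-NEON baseline.
    rel_time = (1.0 - frac_accel) + frac_accel / speedup
    # Power is higher for the whole (shorter) run.
    return (1.0 + p_neon_rel) * rel_time


for frac in (0.05, 0.5, 0.95):
    ratio = energy_ratio(0.12, frac, 4.0)
    verdict = "win" if ratio < 1.0 else "loss"
    print(f"accelerated fraction {frac:.2f}: energy ratio {ratio:.3f} ({verdict})")
```

Under these assumptions, code that spends only a few percent of its time
in NEON-friendly routines loses energy overall, while code dominated by
them wins comfortably.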
>>
>> The correct decision for how to optimise a given piece of code seems
>> to depend on the SoC and the runtime load profile.  And while you can
>> usefully predict that at build-time for a media player or dedicated
>> media stack components, it's pretty much impossible to do so with
>> general-purpose libraries... unless there's a cunning strategy I
>> haven't thought of.
>>
>> Ideally, processes whose load varies significantly over time and
>> between different use cases (such as Xorg) would be able to select
>> between NEON-ised and non-NEON-ised implementations dynamically, based
>> on the current load.  But I guess we're some distance away from being
>> able to achieve that... ?
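
One way to picture such dynamic selection is a threshold on how much of
the recent run time falls in NEON-friendly code. The sketch below is
purely illustrative: it assumes NEON stays powered for the whole run once
used, takes the 12 % power overhead from earlier in the thread, uses a
hypothetical 4x speedup, and returns labels rather than real
implementations. How a process would actually measure its load fraction
is left open.

```python
def min_useful_fraction(p_neon_rel, speedup):
    """Smallest accelerated fraction of run time at which NEON saves energy.

    Derived from (1 + p) * (1 - f + f / s) < 1, solved for f.
    Requires speedup > 1.
    """
    return p_neon_rel / ((1.0 + p_neon_rel) * (1.0 - 1.0 / speedup))


def choose_impl(load_fraction, p_neon_rel=0.12, speedup=4.0):
    """Pick an implementation label from the recent accelerated-load fraction."""
    threshold = min_useful_fraction(p_neon_rel, speedup)
    return "neon" if load_fraction > threshold else "scalar"


print(f"threshold: {min_useful_fraction(0.12, 4.0):.3f}")
for load in (0.05, 0.5):
    print(f"load {load:.2f} -> {choose_impl(load)}")
```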
>
> I agree.  I've been wondering if this is more of a power management
> topic as what you've described there is basically the same as what the
> CPU frequency governor does in deciding the best way to achieve a
> workload.  Perhaps this can also turn into hints to executing code re:
> what instruction set to use.
>
> There might be an argument for explicit control as well.  Say you're
> decoding an AAC stream and using 20 % CPU - it might be more efficient
> to acquire and release the NEON unit from within the decoder to start
> it up faster and release it as soon as the job is done.
>
> Could a kernel developer describe how the NEON unit is controlled?  My
> understanding is:
>  * NEON is generally off
>  * Executing a NEON instruction causes an instruction trap, which kicks
> the kernel, which starts the unit up
>  * The kernel only saves the NEON registers if the code uses them
>
> I'm not sure about:
>  * Does NEON remain on as long as that process is executing?  Does it
> get turned off on task switch, or perhaps after a timeout?
On OMAP3, NEON is a separate power domain and can transition to a
low-power state on its own, based on its activity (this is managed by
the PRCM hardware).  However, the NEON power domain has a wake-up
dependency on the MPU, which means NEON is woken whenever the MPU comes
out of standby.

>  * VFP uses the same register set.  Does a floating point instruction
> also turn the NEON coprocessor on?
Yes, I suppose so, since the VFP engine is part of the NEON unit.

Vishwa
>
> -- Michael
>
> _______________________________________________
> linaro-dev mailing list
> linaro-dev@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-dev
>
