Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Jon Medhurst (Tixy)
On Thu, 2012-12-13 at 13:52 -0800, Christoffer Dall wrote:
> On Friday, December 7, 2012, Jon Medhurst (Tixy) wrote:
> 
> > To enable CPU hotplug we need to provide some boot code at the reset
> > vector which survives after the kernel has booted without being
> > overwritten. We achieve this by getting the linker script to place
> > the code in boot.S at address zero. This now means we can delete the
> > code that relocates the secondary CPU pen code to "a location less
> > likely to be overridden".
> >
> 
> not knowing these dev boards's memory maps I assume that the sysflags
> are
> initialized to 0

That's what the TRM says and what is observed in practice.

>  and that the first part of the physical memory space is
> reserved for secure access?

Wasn't quite sure what you meant by that, so I looked at the RTSM manual
for 'security' and spotted that if you enable the 'secure_memory'
option, then the NOR flash at address 0 is "Secure RO, aborts on
non-secure accesses". So if we need to support that option (do we?) then
my patch is broken.

> Otherwise I can't see any issues with this code.
> 
> Acked-by: Christoffer Dall 
> 
Thanks for taking the time to review this.

-- 
Tixy

> >
> > We then modify the boot protocol slightly to allow hot-plugging of any
> > CPU, including CPU #0, when the system is already booted. This is done
> > by checking if SYS_FLAGS is already set before the normal check for
> > CPU0 and the boot-or-wait decision made.
> >
> > This patch is based on work by Nicolas Pitre.
> >
> > Signed-off-by: Nicolas Pitre
> > Signed-off-by: Jon Medhurst
> > ---
> >  boot.S  |   41 -
> >  model.lds.S |3 ++-
> >  2 files changed, 26 insertions(+), 18 deletions(-)
> >
> > diff --git a/boot.S b/boot.S
> > index dd453e3..fb56693 100644
> > --- a/boot.S
> > +++ b/boot.S
> > @@ -12,6 +12,8 @@
> > .arch_extension virt
> > .text
> >
> > +   b   start   @ Must be first instruction for power on reset vector
> > +
> >  .macro enter_hyp
> > @ We assume we're entered in Secure Supervisor mode. To
> > @ get to Hyp mode we have to pass through Monitor mode
> > @@ -119,27 +121,32 @@ start:
> > orr r0, r0, r1
> > mcr p15, 0, r0, c1, c1, 2
> >
> > -   @ Check CPU nr again
> > -   mrc p15, 0, r0, c0, c0, 5   @ MPIDR (ARMv7 only)
> > -   bfc r0, #24, #8 @ CPU number, taking multicluster into account
> > -   cmp r0, #0  @ primary CPU?
> > -   beq 2f
> > -
> > -   @
> > -   @ Secondary CPUs (following the RealView SMP booting protocol)
> > -   @
> > -   enter_hyp
> > -
> > -   ldr r1, =fs_start - 0x100
> > -   adr r2, 1f
> > -   ldmia   r2, {r3 - r7}   @ move the code to a location
> > -   stmia   r1, {r3 - r7}   @ less likely to be overridden
> > +   /*
> > +* If SYS_FLAGS is already set, this is a warm boot and we blindly
> > +* branch to the indicated address right away, irrespective of the
> > +* CPU we are.
> > +*/
> >  #ifdef VEXPRESS
> > ldr r0, =0x1c010030 @ VE SYS_FLAGS register
> >  #else
> > ldr r0, =0x1030 @ RealView SYS_FLAGS register
> >  #endif
> > -   mov pc, r1  @ branch to the relocated code
> > +   ldr r1, [r0]
> > +   cmp r1, #0
> > +   beq 1f
> > +   enter_hyp
> > +   bx  r1
> > +1:
> > +   /*
> > +* Otherwise this is a cold boot.  In this case it depends if
> > +* we are the primary CPU or not.  The primary CPU boots the system
> > +* while the secondaries wait for the primary to set SYS_FLAGS.
> > +*/
> > +   mrc p15, 0, r1, c0, c0, 5
> > +   bics r1, #0xff00
> > +   beq 2f
> > +
> > +   enter_hyp
> >  1:
> >  #ifdef VEXPRESS
> > wfe
> > @@ -147,7 +154,7 @@ start:
> > ldr r1, [r0]
> > cmp r1, #0
> > beq 1b
> > -   mov pc, r1  @ branch to the given address
> > +   bx  r1  @ branch to the given address
> >  #endif
> >
> >  2:
> > diff --git a/model.lds.S b/model.lds.S
> > index 793df89..f37824e 100644
> > --- a/model.lds.S
> > +++ b/model.lds.S
> > @@ -27,7 +27,8 @@ STACKTOP = 0xff00;
> >
> >  SECTIONS
> >  {
> > - . = PHYS_OFFSET;
> > + . = 0;
> > + .boot : { boot.o }
> >
> >   . = PHYS_OFFSET + 0x8000 - 0x40;
> >
> > --
> > 1.7.10.4
> >
> >
> >
> > ___
> > kvmarm mailing list
> > kvm...@lists.cs.columbia.edu 
> > https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
> >
> 
> 


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev

Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-14 Thread Vincent Guittot
On 14 December 2012 02:46, Alex Shi  wrote:
> On 12/13/2012 11:48 PM, Vincent Guittot wrote:
>> On 13 December 2012 15:53, Vincent Guittot  
>> wrote:
>>> On 13 December 2012 15:25, Alex Shi  wrote:
 On 12/13/2012 06:11 PM, Vincent Guittot wrote:
> On 13 December 2012 03:17, Alex Shi  wrote:
>> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
>>> During the creation of sched_domain, we define a pack buddy CPU for 
>>> each CPU
>>> when one is available. We want to pack at all levels where a group of 
>>> CPU can
>>> be power gated independently from others.
>>> On a system that can't power gate a group of CPUs independently, the 
>>> flag is
>>> set at all sched_domain level and the buddy is set to -1. This is the 
>>> default
>>> behavior.
>>> On a dual clusters / dual cores system which can power gate each core 
>>> and
>>> cluster independently, the buddy configuration will be :
>>>
>>>   | Cluster 0   | Cluster 1   |
>>>   | CPU0 | CPU1 | CPU2 | CPU3 |
>>> ---
>>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
>>>
>>> Small tasks tend to slip out of the periodic load balance so the best
>>> place to choose to migrate them is during their wake up. The decision
>>> is in O(1) as we only check against one buddy CPU
>>
>> Just a little worry about the scalability on a big machine, like a
>> 4-socket NUMA machine with 8 cores and HT: the buddy cpu in the whole
>> system needs to care about 64 LCPUs, while in your case cpu0 just cares
>> about 4 LCPUs. That makes a difference to the task distribution decision.
>
> The buddy CPU should probably not be the same for all 64 LCPUs; it
> depends on where it's worth packing small tasks

 Do you have further ideas for buddy cpu on such example?
>>>
>>> yes, I have several ideas which were not really relevant for small
>>> system but could be interesting for larger system
>>>
>>> We keep the same algorithm in a socket but we could either use another
>>> LCPU in the targeted socket (conf0) or chain the socket (conf1)
>>> instead of packing directly in one LCPU
>>>
>>> The scheme below tries to summarise the idea:
>>>
>>> Socket  | socket 0 | socket 1   | socket 2   | socket 3   |
>>> LCPU| 0 | 1-15 | 16 | 17-31 | 32 | 33-47 | 48 | 49-63 |
>>> buddy conf0 | 0 | 0| 1  | 16| 2  | 32| 3  | 48|
>>> buddy conf1 | 0 | 0| 0  | 16| 16 | 32| 32 | 48|
>>> buddy conf2 | 0 | 0| 16 | 16| 32 | 32| 48 | 48|
>>>
>>> But, I don't know how this can interact with NUMA load balance and the
>>> better might be to use conf3.
>>
>> I mean conf2 not conf3
>
> So, it has 4 levels (0/16/32/...) for socket 3 and 0 levels for socket 0;
> it is unbalanced across the different sockets.

That's the target, because we decided to pack the small tasks in
socket 0 when we parsed the topology at boot.
We no longer have to loop over sched_domain or sched_group to find
the best LCPU when a small task wakes up.
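
Concretely, once the per-CPU buddy table has been filled at boot, the
wake-up side reduces to something like the following sketch (illustrative
only -- per_cpu_buddy[] and select_pack_cpu() are made-up names, not the
code in the patch):

#define NR_CPUS 64			/* example size */

/* Filled once while parsing the topology at boot; -1 means "no packing". */
int per_cpu_buddy[NR_CPUS];

static int select_pack_cpu(int waking_cpu)
{
	int buddy = per_cpu_buddy[waking_cpu];

	if (buddy < 0)
		return waking_cpu;	/* no power-gateable group: keep default */

	return buddy;			/* O(1): one array lookup, no domain walk */
}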

>
> And the ground level has just one buddy for 16 LCPUs - 8 cores; that's
> not a good design. Consider my previous examples: if there are 4 or 8
> tasks in one socket, you have just 2 choices: spread them across all cores,
> or pack them into one LCPU. Actually, moving them onto just 2 or 4 cores
> may be a better solution, but the design misses this.

You speak about tasks without any notion of load. This patch only cares
about small tasks and a light LCPU load, and it falls back to the default
behavior in any other situation. So if there are 4 or 8 small tasks, they
will migrate to socket 0 after 1 and up to 3 migrations (it depends
on the conf and the LCPU they come from).

Then, if too many small tasks wake up simultaneously on the same LCPU,
the default load balance will spread them across the core/cluster/socket.
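
Roughly, the whole wake-up decision is the following sketch (made-up
helpers again: task_is_small(), buddy_is_busy() and default_select_cpu()
stand in for the actual load checks in the patch):

struct task_struct;					/* opaque here */

extern int per_cpu_buddy[];				/* as in the previous sketch */
int task_is_small(struct task_struct *p);		/* tracked load below a threshold */
int buddy_is_busy(int cpu);				/* buddy already heavily loaded */
int default_select_cpu(struct task_struct *p, int cpu);	/* normal wake-up balancing */

int pack_or_fallback(struct task_struct *p, int cpu)
{
	int buddy = per_cpu_buddy[cpu];

	if (buddy >= 0 && task_is_small(p) && !buddy_is_busy(buddy))
		return buddy;				/* pack the small task */

	return default_select_cpu(p, cpu);		/* default behavior otherwise */
}

When too many small tasks pile up, buddy_is_busy() starts failing and the
periodic load balance spreads them again, which is the fallback described
above.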

>
> Obviously, more and more cores is the trend for all kinds of CPUs; the
> buddy system seems hard pressed to keep up with this.
>
>

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-14 Thread Vincent Guittot
On 14 December 2012 08:45, Mike Galbraith  wrote:
> On Fri, 2012-12-14 at 14:36 +0800, Alex Shi wrote:
>> On 12/14/2012 12:45 PM, Mike Galbraith wrote:
>> >> > Do you have further ideas for buddy cpu on such example?
>> >>> > >
>> >>> > > Which kind of sched_domain configuration have you for such system ?
>> >>> > > and how many sched_domain level have you ?
>> >> >
>> >> > it is general X86 domain configuration. with 4 levels,
>> >> > sibling/core/cpu/numa.
>> > CPU is a bug that slipped into domain degeneration.  You should have
>> > SIBLING/MC/NUMA (chasing that down is on todo).
>>
>> Maybe.
>> the CPU/NUMA is different on domain flags, CPU has SD_PREFER_SIBLING.
>
> What I noticed during (an unrelated) bisection on a 40 core box was
> domains going from so..
>
> 3.4.0-bisect (virgin)
> [5.056214] CPU0 attaching sched-domain:
> [5.065009]  domain 0: span 0,32 level SIBLING
> [5.075011]   groups: 0 (cpu_power = 589) 32 (cpu_power = 589)
> [5.088381]   domain 1: span 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
> [5.107669]groups: 0,32 (cpu_power = 1178)  4,36 (cpu_power = 1178)  
> 8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
>  16,48 (cpu_power = 1177) 20,52 (cpu_power = 1178) 
> 24,56 (cpu_power = 1177) 28,60 (cpu_power = 1177)
>  64,72 (cpu_power = 1176) 68,76 (cpu_power = 1176)
> [5.162115]domain 2: span 0-79 level NODE
> [5.171927] groups: 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11773)
>
> 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11772)
>
> 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11773)
>
> 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11770)
>
> ..to so, which looks a little bent.  CPU and MC have identical spans, so
> CPU should have gone away, as it used to do.
>
> 3.6.0-bisect (virgin)
> [3.978338] CPU0 attaching sched-domain:
> [3.987125]  domain 0: span 0,32 level SIBLING
> [3.997125]   groups: 0 (cpu_power = 588) 32 (cpu_power = 589)
> [4.010477]   domain 1: span 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
> [4.029748]groups: 0,32 (cpu_power = 1177)  4,36 (cpu_power = 1177)  
> 8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
>  16,48 (cpu_power = 1178) 20,52 (cpu_power = 1178) 
> 24,56 (cpu_power = 1178) 28,60 (cpu_power = 1178)
>  64,72 (cpu_power = 1178) 68,76 (cpu_power = 1177)
> [4.084143]domain 2: span 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level CPU
> [4.103796] groups: 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
> [4.124373] domain 3: span 0-79 level NUMA
> [4.134369]  groups: 
> 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
> 
> 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11778)
> 
> 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74 ,78 (cpu_power = 11778)
> 
> 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11780)
>

Thanks, that's an interesting example of a NUMA topology.

Regarding your sched_domain difference:
On 3.4, SD_PREFER_SIBLING was set for both the MC and CPU levels thanks to
sd_balance_for_mc_power and sd_balance_for_package_power.
On 3.6, SD_PREFER_SIBLING is only set for the CPU level, and this flag
difference with the MC level prevents the destruction of the CPU
sched_domain during the degeneration.

We may need to set SD_PREFER_SIBLING for MC level
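
For reference, the degeneration test is roughly the following simplified
model (a sketch of what the sd_parent_degenerate() check amounts to, not
the kernel code):

#include <stdbool.h>

struct domain {
	unsigned long span;	/* bitmask of covered CPUs, simplified */
	unsigned long flags;	/* SD_* flags, e.g. SD_PREFER_SIBLING */
};

static bool parent_can_degenerate(const struct domain *child,
				  const struct domain *parent)
{
	if (child->span != parent->span)
		return false;		/* different spans: keep the parent */

	if (~child->flags & parent->flags)
		return false;		/* parent carries a flag the child lacks */

	return true;
}

With the 3.6 flags, CPU carries SD_PREFER_SIBLING while MC does not, so
the second test fails and the CPU level survives even though its span is
identical to the MC span, which is exactly what the dmesg above shows.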

Vincent

> -Mike
>

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-14 Thread Mike Galbraith
On Fri, 2012-12-14 at 14:36 +0800, Alex Shi wrote: 
> On 12/14/2012 12:45 PM, Mike Galbraith wrote:
> >> > Do you have further ideas for buddy cpu on such example?
> >>> > > 
> >>> > > Which kind of sched_domain configuration have you for such system ?
> >>> > > and how many sched_domain level have you ?
> >> > 
> >> > it is general X86 domain configuration. with 4 levels,
> >> > sibling/core/cpu/numa.
> > CPU is a bug that slipped into domain degeneration.  You should have
> > SIBLING/MC/NUMA (chasing that down is on todo).
> 
> Maybe.
> the CPU/NUMA is different on domain flags, CPU has SD_PREFER_SIBLING.

What I noticed during (an unrelated) bisection on a 40 core box was
domains going from so..

3.4.0-bisect (virgin)
[5.056214] CPU0 attaching sched-domain:
[5.065009]  domain 0: span 0,32 level SIBLING
[5.075011]   groups: 0 (cpu_power = 589) 32 (cpu_power = 589)
[5.088381]   domain 1: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
[5.107669]groups: 0,32 (cpu_power = 1178)  4,36 (cpu_power = 1178)  
8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
 16,48 (cpu_power = 1177) 20,52 (cpu_power = 1178) 
24,56 (cpu_power = 1177) 28,60 (cpu_power = 1177)
 64,72 (cpu_power = 1176) 68,76 (cpu_power = 1176)
[5.162115]domain 2: span 0-79 level NODE
[5.171927] groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11773)
   
1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11772)
   
2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78 (cpu_power = 11773)
   
3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11770)

..to so, which looks a little bent.  CPU and MC have identical spans, so
CPU should have gone away, as it used to do.

3.6.0-bisect (virgin)
[3.978338] CPU0 attaching sched-domain:
[3.987125]  domain 0: span 0,32 level SIBLING
[3.997125]   groups: 0 (cpu_power = 588) 32 (cpu_power = 589)
[4.010477]   domain 1: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level MC
[4.029748]groups: 0,32 (cpu_power = 1177)  4,36 (cpu_power = 1177)  
8,40 (cpu_power = 1178) 12,44 (cpu_power = 1178)
 16,48 (cpu_power = 1178) 20,52 (cpu_power = 1178) 
24,56 (cpu_power = 1178) 28,60 (cpu_power = 1178)
 64,72 (cpu_power = 1178) 68,76 (cpu_power = 1177)
[4.084143]domain 2: span 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 level CPU
[4.103796] groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)
[4.124373] domain 3: span 0-79 level NUMA
[4.134369]  groups: 
0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76 (cpu_power = 11777)

1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77 (cpu_power = 11778)

2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74 ,78 (cpu_power = 11778)

3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79 (cpu_power = 11780)

-Mike


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [RFC PATCH v2 3/6] sched: pack small tasks

2012-12-14 Thread Mike Galbraith
On Thu, 2012-12-13 at 22:25 +0800, Alex Shi wrote: 
> On 12/13/2012 06:11 PM, Vincent Guittot wrote:
> > On 13 December 2012 03:17, Alex Shi  wrote:
> >> On 12/12/2012 09:31 PM, Vincent Guittot wrote:
> >>> During the creation of sched_domain, we define a pack buddy CPU for each 
> >>> CPU
> >>> when one is available. We want to pack at all levels where a group of CPU 
> >>> can
> >>> be power gated independently from others.
> >>> On a system that can't power gate a group of CPUs independently, the flag 
> >>> is
> >>> set at all sched_domain level and the buddy is set to -1. This is the 
> >>> default
> >>> behavior.
> >>> On a dual clusters / dual cores system which can power gate each core and
> >>> cluster independently, the buddy configuration will be :
> >>>
> >>>   | Cluster 0   | Cluster 1   |
> >>>   | CPU0 | CPU1 | CPU2 | CPU3 |
> >>> ---
> >>> buddy | CPU0 | CPU0 | CPU0 | CPU2 |
> >>>
> >>> Small tasks tend to slip out of the periodic load balance so the best
> >>> place to choose to migrate them is during their wake up. The decision
> >>> is in O(1) as we only check against one buddy CPU
> >>
> >> Just a little worry about the scalability on a big machine, like a
> >> 4-socket NUMA machine with 8 cores and HT: the buddy cpu in the whole
> >> system needs to care about 64 LCPUs, while in your case cpu0 just cares
> >> about 4 LCPUs. That makes a difference to the task distribution decision.
> > 
> > The buddy CPU should probably not be the same for all 64 LCPUs; it
> > depends on where it's worth packing small tasks
> 
> Do you have further ideas for buddy cpu on such example?
> > 
> > Which kind of sched_domain configuration have you for such system ?
> > and how many sched_domain level have you ?
> 
> it is general X86 domain configuration. with 4 levels,
> sibling/core/cpu/numa.

CPU is a bug that slipped into domain degeneration.  You should have
SIBLING/MC/NUMA (chasing that down is on todo).

-Mike


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Christoffer Dall
On Friday, December 7, 2012, Jon Medhurst (Tixy) wrote:

> To enable CPU hotplug we need to provide some boot code at the reset
> vector which survives after the kernel has booted without being
> overwritten. We achieve this by getting the linker script to place
> the code in boot.S at address zero. This now means we can delete the
> code that relocates the secondary CPU pen code to "a location less
> likely to be overridden".
>

not knowing these dev boards's memory maps I assume that the sysflags are
initialized to 0 and that the first part of the physical memory space is
reserved for secure access?

Otherwise I can't see any issues with this code.

Acked-by: Christoffer Dall 


>
> We then modify the boot protocol slightly to allow hot-plugging of any
> CPU, including CPU #0, when the system is already booted. This is done
> by checking if SYS_FLAGS is already set before the normal check for CPU0
> and the boot-or-wait decision made.
>
> This patch is based on work by Nicolas Pitre.
>
> Signed-off-by: Nicolas Pitre
> Signed-off-by: Jon Medhurst
> ---
>  boot.S  |   41 -
>  model.lds.S |3 ++-
>  2 files changed, 26 insertions(+), 18 deletions(-)
>
> diff --git a/boot.S b/boot.S
> index dd453e3..fb56693 100644
> --- a/boot.S
> +++ b/boot.S
> @@ -12,6 +12,8 @@
> .arch_extension virt
> .text
>
> +   b   start   @ Must be first instruction for power on reset vector
> +
>  .macro enter_hyp
> @ We assume we're entered in Secure Supervisor mode. To
> @ get to Hyp mode we have to pass through Monitor mode
> @@ -119,27 +121,32 @@ start:
> orr r0, r0, r1
> mcr p15, 0, r0, c1, c1, 2
>
> -   @ Check CPU nr again
> -   mrc p15, 0, r0, c0, c0, 5   @ MPIDR (ARMv7 only)
> -   bfc r0, #24, #8 @ CPU number, taking multicluster into account
> -   cmp r0, #0  @ primary CPU?
> -   beq 2f
> -
> -   @
> -   @ Secondary CPUs (following the RealView SMP booting protocol)
> -   @
> -   enter_hyp
> -
> -   ldr r1, =fs_start - 0x100
> -   adr r2, 1f
> -   ldmia   r2, {r3 - r7}   @ move the code to a location
> -   stmia   r1, {r3 - r7}   @ less likely to be overridden
> +   /*
> +* If SYS_FLAGS is already set, this is a warm boot and we blindly
> +* branch to the indicated address right away, irrespective of the
> +* CPU we are.
> +*/
>  #ifdef VEXPRESS
> ldr r0, =0x1c010030 @ VE SYS_FLAGS register
>  #else
> ldr r0, =0x1030 @ RealView SYS_FLAGS register
>  #endif
> -   mov pc, r1  @ branch to the relocated code
> +   ldr r1, [r0]
> +   cmp r1, #0
> +   beq 1f
> +   enter_hyp
> +   bx  r1
> +1:
> +   /*
> +* Otherwise this is a cold boot.  In this case it depends if
> +* we are the primary CPU or not.  The primary CPU boots the system
> +* while the secondaries wait for the primary to set SYS_FLAGS.
> +*/
> +   mrc p15, 0, r1, c0, c0, 5
> +   bics r1, #0xff00
> +   beq 2f
> +
> +   enter_hyp
>  1:
>  #ifdef VEXPRESS
> wfe
> @@ -147,7 +154,7 @@ start:
> ldr r1, [r0]
> cmp r1, #0
> beq 1b
> -   mov pc, r1  @ branch to the given address
> +   bx  r1  @ branch to the given address
>  #endif
>
>  2:
> diff --git a/model.lds.S b/model.lds.S
> index 793df89..f37824e 100644
> --- a/model.lds.S
> +++ b/model.lds.S
> @@ -27,7 +27,8 @@ STACKTOP = 0xff00;
>
>  SECTIONS
>  {
> - . = PHYS_OFFSET;
> + . = 0;
> + .boot : { boot.o }
>
>   . = PHYS_OFFSET + 0x8000 - 0x40;
>
> --
> 1.7.10.4
>
>
>
> ___
> kvmarm mailing list
> kvm...@lists.cs.columbia.edu 
> https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
>
___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


[PATCH 0/2][V2] cpuidle - remove power from the select equation

2012-12-14 Thread Daniel Lezcano
The states can be assumed to be sorted in backward power order, so we get
rid of some lines of code, and this also fixes a bug with the dynamic
C-states.

Changelog:

V2:
 * added the optimization in the select menu governor

Daniel Lezcano (2):
  cpuidle - remove the power_specified field in the driver
  cpuidle - optimize the select function for the 'menu' governor

 drivers/cpuidle/cpuidle.c|   17 -
 drivers/cpuidle/driver.c |   25 -
 drivers/cpuidle/governors/menu.c |   20 
 include/linux/cpuidle.h  |2 +-
 4 files changed, 13 insertions(+), 51 deletions(-)

-- 
1.7.5.4


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


[PATCH 1/2][V2] cpuidle - remove the power_specified field in the driver

2012-12-14 Thread Daniel Lezcano
This patch follows the discussion about reinitializing the power usage
when a C-state is added/removed.

 https://lkml.org/lkml/2012/10/16/518

We realized the power usage field is never filled and, when it is
filled for tegra, the power_specified flag is not set, making all these
values get reset when the driver is initialized via the set_power_states
function.

Julius and I feel this is over-engineered; the power_specified
flag can simply be removed and we can continue assuming the states are
backward sorted.

The menu governor select function is simplified as the power is ordered.
Actually the condition is always true with the current code.

The cpuidle_play_dead function is also simplified by doing a reverse lookup
on the array.

The set_power_states function is removed as it does not make sense anymore.

As a consequence, this patch also fixes the bug where, on systems with
dynamic C-states, the power fields are not initialized.

Signed-off-by: Daniel Lezcano 
Cc: Julius Werner 
---
 drivers/cpuidle/cpuidle.c|   17 -
 drivers/cpuidle/driver.c |   25 -
 drivers/cpuidle/governors/menu.c |8 ++--
 include/linux/cpuidle.h  |2 +-
 4 files changed, 7 insertions(+), 45 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index 8df53dd..e1f6860 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -69,24 +69,15 @@ int cpuidle_play_dead(void)
 {
struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
-   int i, dead_state = -1;
-   int power_usage = -1;
+   int i;
 
if (!drv)
return -ENODEV;
 
/* Find lowest-power state that supports long-term idle */
-   for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
-   struct cpuidle_state *s = &drv->states[i];
-
-   if (s->power_usage < power_usage && s->enter_dead) {
-   power_usage = s->power_usage;
-   dead_state = i;
-   }
-   }
-
-   if (dead_state != -1)
-   return drv->states[dead_state].enter_dead(dev, dead_state);
+   for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i--)
+   if (drv->states[i].enter_dead)
+   return drv->states[i].enter_dead(dev, i);
 
return -ENODEV;
 }
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 3af841f..bb045f1 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -19,34 +19,9 @@ DEFINE_SPINLOCK(cpuidle_driver_lock);
 static void __cpuidle_set_cpu_driver(struct cpuidle_driver *drv, int cpu);
 static struct cpuidle_driver * __cpuidle_get_cpu_driver(int cpu);
 
-static void set_power_states(struct cpuidle_driver *drv)
-{
-   int i;
-
-   /*
-* cpuidle driver should set the drv->power_specified bit
-* before registering if the driver provides
-* power_usage numbers.
-*
-* If power_specified is not set,
-* we fill in power_usage with decreasing values as the
-* cpuidle code has an implicit assumption that state Cn
-* uses less power than C(n-1).
-*
-* With CONFIG_ARCH_HAS_CPU_RELAX, C0 is already assigned
-* an power value of -1.  So we use -2, -3, etc, for other
-* c-states.
-*/
-   for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++)
-   drv->states[i].power_usage = -1 - i;
-}
-
 static void __cpuidle_driver_init(struct cpuidle_driver *drv)
 {
drv->refcnt = 0;
-
-   if (!drv->power_specified)
-   set_power_states(drv);
 }
 
 static int __cpuidle_register_driver(struct cpuidle_driver *drv, int cpu)
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index bd40b94..fe343a0 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -312,7 +312,6 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
 {
struct menu_device *data = &__get_cpu_var(menu_devices);
int latency_req = pm_qos_request(PM_QOS_CPU_DMA_LATENCY);
-   int power_usage = -1;
int i;
int multiplier;
struct timespec t;
@@ -383,11 +382,8 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
if (s->exit_latency * multiplier > data->predicted_us)
continue;
 
-   if (s->power_usage < power_usage) {
-   power_usage = s->power_usage;
-   data->last_state_idx = i;
-   data->exit_us = s->exit_latency;
-   }
+   data->last_state_idx = i;
+   data->exit_us = s->exit_latency;
}
 
/* not deepest C-state chosen for low predicted residency */
diff --git a/include/linux

[PATCH 2/2][V2] cpuidle - optimize the select function for the 'menu' governor

2012-12-14 Thread Daniel Lezcano
As the power is backward sorted in the states array and we are looking for
the state consuming as little power as possible, instead of looking from
the beginning of the array, we look from the end. That should save us some
iterations in the loop each time we select a state at idle time.

Signed-off-by: Daniel Lezcano 
---
 drivers/cpuidle/governors/menu.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index fe343a0..05b8998 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -367,24 +367,24 @@ static int menu_select(struct cpuidle_driver *drv, struct 
cpuidle_device *dev)
 * Find the idle state with the lowest power while satisfying
 * our constraints.
 */
-   for (i = CPUIDLE_DRIVER_STATE_START; i < drv->state_count; i++) {
+   for (i = drv->state_count - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
struct cpuidle_state *s = &drv->states[i];
struct cpuidle_state_usage *su = &dev->states_usage[i];
 
if (s->disabled || su->disable)
continue;
-   if (s->target_residency > data->predicted_us) {
-   low_predicted = 1;
-   continue;
-   }
if (s->exit_latency > latency_req)
continue;
+   if (s->target_residency > data->predicted_us)
+   continue;
if (s->exit_latency * multiplier > data->predicted_us)
continue;
 
+   low_predicted = i - CPUIDLE_DRIVER_STATE_START;
data->last_state_idx = i;
data->exit_us = s->exit_latency;
-   }
+   break;
+   }
 
/* not deepest C-state chosen for low predicted residency */
if (low_predicted) {
-- 
1.7.5.4


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Jon Medhurst (Tixy)
On Fri, 2012-12-14 at 10:10 -0500, Christoffer Dall wrote:
> On Fri, Dec 14, 2012 at 3:49 AM, Jon Medhurst (Tixy)  wrote:
> > On Thu, 2012-12-13 at 13:52 -0800, Christoffer Dall wrote:
> >> On Friday, December 7, 2012, Jon Medhurst (Tixy) wrote:
> >>
> >> > To enable CPU hotplug we need to provide some boot code at the reset
> >> > vector which survives after the kernel has booted without being
> >> > overwritten. We achieve this by getting the linker script to place
> >> > the code in boot.S at address zero. This now means we can delete the
> >> > code that relocates the secondary CPU pen code to "a location less
> >> > likely to be overridden".
> >> >
> >>
> >> not knowing these dev boards's memory maps I assume that the sysflags
> >> are
> >> initialized to 0
> >
> > That's what the TRM says and what is observed in practice.
> >
> >>  and that the first part of the physical memory space is
> >> reserved for secure access?
> >
> > Wasn't quite sure what you meant by that, so I looked at the RTSM manual
> > for 'security' and spotted that if you enable the 'secure_memory'
> > option, then the NOR flash at address 0 is "Secure RO, aborts on
> > non-secure accesses". So if we need to support that option (do we?) then
> > my patch is broken.
> >
> we probably don't need this for vexpress in any situation that uses
> the boot-wrapper would be my guess, and we can always fix later then.
> 
> What I was trying to figure out was why you can get rid of the "a
> location less likely to be overridden" and why you're sure this memory
> stays unmodified - I thought that was because you were loading the
> code in the beginning of the physical memory space, which wouldn't be
> touched by the linux kernel (i.e. reserved for the secure side), but
> my guess may have been wrong and I was never a whiz with linker
> scripts.

In the RTSM manual, the first 64Mb of physical memory space is marked as
"NOR FLASH0", as it is on real hardware. This is flash memory which is
read-only under normal circumstances, and not anything Linux would go
poking about in, unless you configured an MTD device and used it for a
file system.

DRAM is in the top 2GB and is what Linux uses for memory, and where most
of the bootwrapper still lives. The only bit I've relocated to the
bottom of the address space is everything in boot.S which includes all
the bits we need to survive after Linux boots in order to handle a cpu
startup again after being powered down then up again.
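
For reference, the release side of that startup path is just a write to
SYS_FLAGS followed by an event; a hedged sketch (not taken from the
kernel sources, which go through the platform SMP/hotplug code) would be:

#include <stdint.h>

#define VE_SYS_FLAGS	0x1c010030u	/* VE SYS_FLAGS, as used in boot.S */

static void release_cpu(uint32_t entry_point_pa)
{
	/* Tell the waking CPU where to branch to. */
	*(volatile uint32_t *)(uintptr_t)VE_SYS_FLAGS = entry_point_pa;
	__asm__ volatile("dsb" ::: "memory");	/* make the write visible */
	__asm__ volatile("sev");		/* wake any CPU parked in wfe */
}

A warm-booted CPU coming out of reset reads SYS_FLAGS directly in boot.S
and branches, while cold-boot secondaries sitting in the wfe loop need
the sev to re-check it.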

Are you convinced, or do you still have concerns?

I must admit, my patch is based on someone else's work, and I've not seen
documentation which says that NOR flash can be written to simply by the
application being loaded by the models having code located at the
relevant address. I do know that this code works on both A15 and A15xA7
models to boot Linux and Android, and that the new support it provides
for power down and up cores works.

I'll try and see if I can find out if locating code at address zero is
officially supported...

-- 
Tixy



___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Alexander Spyridakis
On 14 December 2012 16:10, Christoffer Dall
wrote:

> What I was trying to figure out was why you can get rid of the "a
> location less likely to be overridden" and why you're sure this memory
> stays unmodified - I thought that was because you were loading the
> code in the beginning of the physical memory space, which wouldn't be
> touched by the linux kernel (i.e. reserved for the secure side), but
> my guess may have been wrong and I was never a whiz with linker
> scripts.
>

I have to comment on this that the relocation address "ldr r1,
=fs_start - 0x100" doesn't work for the board. Secondary CPUs will never
wake and I had to move the relocation part under the monitor offset instead
(not needed to relocate at all in the end if boot.o is put at the start of
memory, I guess...).
___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Marc Zyngier
On 14/12/12 15:54, Jon Medhurst (Tixy) wrote:
> On Fri, 2012-12-14 at 10:10 -0500, Christoffer Dall wrote:
>> On Fri, Dec 14, 2012 at 3:49 AM, Jon Medhurst (Tixy)  wrote:
>>> On Thu, 2012-12-13 at 13:52 -0800, Christoffer Dall wrote:
 On Friday, December 7, 2012, Jon Medhurst (Tixy) wrote:

> To enable CPU hotplug we need to provide some boot code at the reset
> vector which survives after the kernel has booted without being
> overwritten. We achieve this by getting the linker script to place
> the code in boot.S at address zero. This now means we can delete the
> code that relocates the secondary CPU pen code to "a location less
> likely to be overridden".
>

 not knowing these dev boards's memory maps I assume that the sysflags
 are
 initialized to 0
>>>
>>> That's what the TRM says and what is observed in practice.
>>>
  and that the first part of the physical memory space is
 reserved for secure access?
>>>
> >>> Wasn't quite sure what you meant by that, so I looked at the RTSM manual
>>> for 'security' and spotted that if you enable the 'secure_memory'
>>> option, then the NOR flash at address 0 is "Secure RO, aborts on
>>> non-secure accesses". So if we need to support that option (do we?) then
>>> my patch is broken.
>>>
>> we probably don't need this for vexpress in any situation that uses
>> the boot-wrapper would be my guess, and we can always fix later then.
>>
>> What I was trying to figure out was why you can get rid of the "a
>> location less likely to be overridden" and why you're sure this memory
>> stays unmodified - I thought that was because you were loading the
>> code in the beginning of the physical memory space, which wouldn't be
>> touched by the linux kernel (i.e. reserved for the secure side), but
>> my guess may have been wrong and I was never a whiz with linker
>> scripts.
> 
> In the RTSM manual, the first 64Mb of physical memory space is marked as
> "NOR FLASH0", as it is on real hardware. This is flash memory which is
> read-only under normal circumstances, and not anything Linux would go
> poking about in, unless you configured an MTD device and used it for a
> file system.
> 
> DRAM is in the top 2GB and is what Linux uses for memory, and where most
> of the bootwrapper still lives. The only bit I've relocated to the
> bottom of the address space is everything in boot.S which includes all
> the bits we need to survive after Linux boots in order to handle a cpu
> startup again after being powered down then up again.
> 
> Are you convinced, or do you still have concerns?
> 
> I must admit, my patch is based on someone elses work, and I've not seen
> documentation which says that NOR flash can be written to simply by the
> application being loaded by the models having code located at the
> relevant address. I do know, that this code works on both A15 and A15xA7
> models to boot Linux and Android, and that the new support it provides
> for power down and up cores works.
> 
> I'll try and see if I can find out if locating code at address zero is
> officially supported...

In the horrible hack that I use to boot both TC2 and RTSM, I use the
static RAM located at 0x2E00. It is only 64kB though.

M.
-- 
Jazz is not dead. It just smells funny...


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Jon Medhurst (Tixy)
On Fri, 2012-12-14 at 16:00 +, Marc Zyngier wrote:
> On 14/12/12 15:54, Jon Medhurst (Tixy) wrote:
> > On Fri, 2012-12-14 at 10:10 -0500, Christoffer Dall wrote:
> >> On Fri, Dec 14, 2012 at 3:49 AM, Jon Medhurst (Tixy)  
> >> wrote:
> >>> On Thu, 2012-12-13 at 13:52 -0800, Christoffer Dall wrote:
>  On Friday, December 7, 2012, Jon Medhurst (Tixy) wrote:
> 
> > To enable CPU hotplug we need to provide some boot code at the reset
> > vector which survives after the kernel has booted without being
> > overwritten. We achieve this by getting the linker script to place
> > the code in boot.S at address zero. This now means we can delete the
> > code that relocates the secondary CPU pen code to "a location less
> > likely to be overridden".
> >
> 
>  not knowing these dev boards's memory maps I assume that the sysflags
>  are
>  initialized to 0
> >>>
> >>> That's what the TRM says and what is observed in practice.
> >>>
>   and that the first part of the physical memory space is
>  reserved for secure access?
> >>>
> >>> Wasn't quite sure what you meant by that, so I looked at the RTSM manual
> >>> for 'security' and spotted that if you enable the 'secure_memory'
> >>> option, then the NOR flash at address 0 is "Secure RO, aborts on
> >>> non-secure accesses". So if we need to support that option (do we?) then
> >>> my patch is broken.
> >>>
> >> we probably don't need this for vexpress in any situation that uses
> >> the boot-wrapper would be my guess, and we can always fix later then.
> >>
> >> What I was trying to figure out was why you can get rid of the "a
> >> location less likely to be overridden" and why you're sure this memory
> >> stays unmodified - I thought that was because you were loading the
> >> code in the beginning of the physical memory space, which wouldn't be
> >> touched by the linux kernel (i.e. reserved for the secure side), but
> >> my guess may have been wrong and I was never a whiz with linker
> >> scripts.
> > 
> > In the RTSM manual, the first 64Mb of physical memory space is marked as
> > "NOR FLASH0", as it is on real hardware. This is flash memory which is
> > read-only under normal circumstances, and not anything Linux would go
> > poking about in, unless you configured an MTD device and used it for a
> > file system.
> > 
> > DRAM is in the top 2GB and is what Linux uses for memory, and where most
> > of the bootwrapper still lives. The only bit I've relocated to the
> > bottom of the address space is everything in boot.S which includes all
> > the bits we need to survive after Linux boots in order to handle a cpu
> > startup again after being powered down then up again.
> > 
> > Are you convinced, or do you still have concerns?
> > 
> > I must admit, my patch is based on someone elses work, and I've not seen
> > documentation which says that NOR flash can be written to simply by the
> > application being loaded by the models having code located at the
> > relevant address. I do know, that this code works on both A15 and A15xA7
> > models to boot Linux and Android, and that the new support it provides
> > for power down and up cores works.
> > 
> > I'll try and see if I can find out if locating code at address zero is
> > officially supported...
> 
> In the horrible hack that I use to boot both TC2 and RTSM, I use the
> static RAM located at 0x2E00. It is only 64kB though.

We also want to get some code located so it gets executed when the core
is reset (to support cpu hotplug), which is why locating the code at
address zero was used. Or is there some magic 'redirect reset address'
register?

Perhaps we need to create a separate NOR flash image to give to RTSM to
support this, or reconsider whether it's worth the effort to get RTSM
supporting cpu hotplug at all.

-- 
Tixy


___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev


Re: [kvmarm] [PATCH] bootwrapper: CPU hotplug aware boot protocol

2012-12-14 Thread Nicolas Pitre
On Fri, 14 Dec 2012, Marc Zyngier wrote:

> On 14/12/12 15:54, Jon Medhurst (Tixy) wrote:
> > I must admit, my patch is based on someone elses work, and I've not seen
> > documentation which says that NOR flash can be written to simply by the
> > application being loaded by the models having code located at the
> > relevant address. I do know, that this code works on both A15 and A15xA7
> > models to boot Linux and Android, and that the new support it provides
> > for power down and up cores works.
> > 
> > I'll try and see if I can find out if locating code at address zero is
> > officially supported...
> 
> In the horrible hack that I use to boot both TC2 and RTSM, I use the
> static RAM located at 0x2E00. It is only 64kB though.

That misses the point.

This patch has to hook some code at the address where the reset vector 
is, so that we can get control of a CPU when it is brought back out of 
reset.

So gents, please let's stop splitting hairs.  This is required for CPU
hotplug to work, and by extension the b.L switcher.


Nicolas

___
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev