date:20130131

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> On 01/31/2013 02:19 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 11:31 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> Please see below. :)
> >>
> >> On 01/31/2013 09:22 AM, Simon Jeons wrote:
> >>>
> >>> Sorry, I'm still confused. :(
> >>> Do we update node_states[N_NORMAL_MEMORY] to node_states[N_MEMORY], or
> >>> does node_states[N_NORMAL_MEMORY] represent 0...ZONE_MOVABLE?
> >>>
> >>> Which node_states is meant? node_states[N_NORMAL_MEMORY] or
> >>> node_states[N_MEMORY]?
> >>
> >> Are you asking what node_states[] is?
> >>
> >> node_states[] is an array of nodemasks:
> >>
> >>   extern nodemask_t node_states[NR_NODE_STATES];
> >>
> >> For example, node_states[N_NORMAL_MEMORY] represents which nodes have
> >> normal memory.
> >> If N_MEMORY == N_HIGH_MEMORY == N_NORMAL_MEMORY, node_states[N_MEMORY] is
> >> node_states[N_NORMAL_MEMORY]. So it represents which nodes have 0 ...
> >> ZONE_MOVABLE.
> >>
> >
> > Sorry, how can node_states[N_NORMAL_MEMORY] represent a node that has 0 ...
> > *ZONE_MOVABLE*? The comment on enum node_states says that
> > N_NORMAL_MEMORY just means the node has regular memory.
> >
> 
> Hi Simon,
> 
> Let me put it this way.
> 
> If we don't have CONFIG_HIGHMEM, N_HIGH_MEMORY == N_NORMAL_MEMORY. We don't
> have a separate macro to represent highmem because we don't have highmem.
> This is easy to understand, right?
> 
> Now think of it in the same way:
> If we don't have CONFIG_MOVABLE_NODE, N_MEMORY == N_HIGH_MEMORY ==
> N_NORMAL_MEMORY.
> This means we don't allow a node to have only movable memory; it does not
> mean we don't have movable memory.
> A node could have both normal memory and movable memory. So
> node_states[N_NORMAL_MEMORY] represents nodes that have 0 ... *ZONE_MOVABLE*.
> 
> I think the point is: CONFIG_MOVABLE_NODE means we allow a node to have
> only movable memory.
> So the absence of CONFIG_MOVABLE_NODE doesn't mean a node cannot have movable
> memory. It means the node cannot have only movable memory. It can have
> normal memory and movable memory.
> 
> 1) With CONFIG_MOVABLE_NODE:
> N_NORMAL_MEMORY: nodes that have normal memory.
>  normal memory only
>  normal and highmem
>  normal and highmem and movablemem
>  normal and movablemem
> N_MEMORY: nodes that have memory (any memory)
>  normal memory only
>  normal and highmem
>  normal and highmem and movablemem
>  normal and movablemem ------------- We can have movablemem.
>  highmem only
>  highmem and movablemem
>  movablemem only ------------------- We can have movablemem only. ***
> 
> 2) Without CONFIG_MOVABLE_NODE:
> N_MEMORY == N_NORMAL_MEMORY: (Here, I omit N_HIGH_MEMORY)
>  normal memory only
>  normal and highmem
>  normal and highmem and movablemem
>  normal and movablemem ------------- We can have movablemem.
>  No movablemem only ---------------- We cannot have movablemem only. ***
> 
> The semantics are not that clear here, so we can only try to understand
> them from the code where N_MEMORY is used. :)
> 
> That is my understanding of this.

Thanks for the clarification, it's very clear now. :)
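
The aliasing described above can be made concrete with a small user-space sketch. This is not the kernel's code: nodemask_t is reduced to a plain integer with one bit per node, MAX_NODES and the two helper functions are hypothetical, and the enum only assumes that N_HIGH_MEMORY and N_MEMORY fall back to the previous entry when CONFIG_HIGHMEM and CONFIG_MOVABLE_NODE are not defined.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64	/* assumed upper bound for this sketch */

/* Simplified stand-in for the kernel's nodemask_t: one bit per node. */
typedef unsigned long long nodemask_t;

enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,	/* no highmem: same slot */
#endif
#ifdef CONFIG_MOVABLE_NODE
	N_MEMORY,
#else
	N_MEMORY = N_HIGH_MEMORY,		/* no movable-only nodes: same slot */
#endif
	N_CPU,
	NR_NODE_STATES
};

static nodemask_t node_states[NR_NODE_STATES];

/* Record that node nid is in the given state. */
static void node_set_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < MAX_NODES);
	node_states[state] |= 1ULL << nid;
}

/* Report whether node nid is in the given state. */
static bool node_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return (node_states[state] >> nid) & 1ULL;
}
```

Built without either macro, setting a node in N_NORMAL_MEMORY also makes it visible through N_MEMORY, because both names index the same array slot. That is the sense in which node_states[N_NORMAL_MEMORY] covers every zone up to ZONE_MOVABLE in that configuration.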

> 
> Thanks. :)
> 
> 
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majord...@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: em...@kvack.org


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [GIT PULL 00/21] perf/core improvements and fixes

2013-01-31 Thread Ingo Molnar


* Arnaldo Carvalho de Melo  wrote:

> Hi Ingo,
> 
>   Please consider pulling.
> 
>   Namhyung, Jiri, the 'group report' patches are at acme/perf/group,
> will send a pull req later if it survives further testing.
> 
> - Arnaldo
> 
> The following changes since commit a2d28d0c198b65fac28ea6212f5f8edc77b29c27:
> 
>   Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2013-01-25 11:34:00 +0100)
> 
> are available in the git repository at:
> 
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux tags/perf-core-for-mingo
> 
> for you to fetch changes up to 5809fde040de2afa477a6c593ce2e8fd2c11d9d3:
> 
>   perf header: Fix double fclose() on do_write(fd, xxx) failure (2013-01-30 10:40:44 -0300)
> 
> 
> perf/core improvements and fixes:
> 
> . Fix some leaks in exit paths.
> 
> . Use memdup where applicable
> 
> . Remove some die() calls, allowing callers to handle exit paths
>   gracefully.
> 
> . Correct typo in tools Makefile, fix from Borislav Petkov.
> 
> . Add 'perf bench numa mem' NUMA performance measurement suite, from Ingo Molnar.
> 
> . Handle dynamic array's element size properly, fix from Jiri Olsa.
> 
> . Fix memory leaks on evsel->counts, from Namhyung Kim.
> 
> . Make numa benchmark optional, allowing the build on machines where the
>   required numa libraries are not present, fix from Peter Hurley.
> 
> . Add interval printing in 'perf stat', from Stephane Eranian.
> 
> . Fix compile warnings in tests/attr.c, from Sukadev Bhattiprolu.
> 
> . Fix double free, pclose instead of fclose, leaks and double fclose errors
>   found with the cppcheck tool, from Thomas Jarosch.
> 
> Signed-off-by: Arnaldo Carvalho de Melo 
> 
> 
> Arnaldo Carvalho de Melo (8):
>   perf tools: Stop using 'self' in strlist
>   perf tools: Stop using 'self' in map.[ch]
>   perf tools: Use memdup in map__clone
>   perf kmem: Use memdup()
>   perf header: Stop using die() calls when processing tracing data
>   perf ui browser: Free browser->helpline() on ui_browser__hide()
>   perf tests: Call machine__exit in the vmlinux matches kallsyms test
>   perf tests: Fix leaks on PERF_RECORD_* test
> 
> Borislav Petkov (1):
>   tools: Correct typo in tools Makefile
> 
> Ingo Molnar (1):
>   perf: Add 'perf bench numa mem' NUMA performance measurement suite
> 
> Jiri Olsa (1):
>   tools lib traceevent: Handle dynamic array's element size properly
> 
> Namhyung Kim (1):
>   perf evsel: Fix memory leaks on evsel->counts
> 
> Peter Hurley (1):
>   perf tools: Make numa benchmark optional
> 
> Stephane Eranian (2):
>   perf evsel: Add prev_raw_count field
>   perf stat: Add interval printing
> 
> Sukadev Bhattiprolu (1):
>   perf tools, powerpc: Fix compile warnings in tests/attr.c
> 
> Thomas Jarosch (5):
>   perf tools: Fix possible double free on error
>   perf sort: Use pclose() instead of fclose() on pipe stream
>   perf tools: Fix memory leak on error
>   perf header: Fix memory leak for the "Not caching a kptr_restrict'ed /proc/kallsyms" case
>   perf header: Fix double fclose() on do_write(fd, xxx) failure
> 
>  tools/Makefile                           |    2 +-
>  tools/lib/traceevent/event-parse.c       |   39 +-
>  tools/perf/Documentation/perf-stat.txt   |    4 +
>  tools/perf/Makefile                      |   13 +
>  tools/perf/arch/common.c                 |    1 +
>  tools/perf/bench/bench.h                 |    1 +
>  tools/perf/bench/numa.c                  | 1731 ++
>  tools/perf/builtin-bench.c               |   17 +
>  tools/perf/builtin-kmem.c                |    6 +-
>  tools/perf/builtin-stat.c                |  158 ++-
>  tools/perf/config/feature-tests.mak      |   11 +
>  tools/perf/tests/attr.c                  |    5 +
>  tools/perf/tests/open-syscall-all-cpus.c |    1 +
>  tools/perf/tests/perf-record.c           |   12 +-
>  tools/perf/tests/vmlinux-kallsyms.c      |    4 +-
>  tools/perf/ui/browser.c                  |    2 +
>  tools/perf/util/event.c                  |    4 +-
>  tools/perf/util/evsel.c                  |   31 +
>  tools/perf/util/evsel.h                  |    2 +
>  tools/perf/util/header.c                 |   25 +-
>  tools/perf/util/map.c                    |  118 +-
>  tools/perf/util/map.h                    |   24 +-
>  tools/perf/util/sort.c                   |    7 +-
>  tools/perf/util/strlist.c                |   54 +-
>  tools/perf/util/strlist.h                |   42 +-
>  25 files changed, 2154 insertions(+), 160 deletions(-)
>  create mode 100644 tools/perf/bench/numa.c

Pulled, thanks a lot Arnaldo!

Ingo
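
For readers unfamiliar with the "Use memdup" items in the changelog above: memdup is the usual name for "allocate a buffer and copy existing bytes into it" in one call, replacing an open-coded malloc() plus memcpy(). The sketch below is a generic version under the hypothetical name memdup_sketch; it is not taken from the perf tree, whose helper may differ in details.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate len bytes starting at src into freshly allocated memory.
 * Returns NULL if the allocation fails; the caller owns (and frees)
 * the returned buffer.
 */
static void *memdup_sketch(const void *src, size_t len)
{
	void *copy;

	assert(src != NULL);
	copy = malloc(len);
	if (copy)
		memcpy(copy, src, len);
	return copy;
}
```

A clone function for a fixed-size structure then reduces to a single call such as memdup_sketch(obj, sizeof(*obj)) followed by fixing up any fields that must not be shared.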

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Tang,
On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:

1. IIUC, there is a button on machines that support memory hot-remove.
What is the difference between pressing the button and echoing to /sys?
2. Since kernel memory is linearly mapped (I mean the direct mapping part),
why can't we put the kernel's direct-mapped memory into one memory device
and the other memory into the other devices? As you know, x86_64 doesn't need
highmem, so IIUC all kernel memory is linearly mapped in this case. Is my
idea feasible? If it is, x86_32 can't be handled in the same way,
since highmem (kmap/kmap_atomic/vmalloc) can map any address, so it's
hard to confine kernel memory to a single memory device.
3. In the current implementation, does memory hotplug need only the memory
subsystem and ACPI code, or does firmware also need to take part?
I hope you can explain in detail, thanks in advance. :)
4. What is the status of memory hotplug? Apart from not being able to remove
kernel memory, is everything else fully implemented?


> [...]


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Tang Chen


Hi Simon,

On 01/31/2013 04:48 PM, Simon Jeons wrote:

> Hi Tang,
> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>
> 1. IIUC, there is a button on machine which supports hot-remove memory,
> then what's the difference between press button and echo to /sys?


No important difference, I think. Since I don't have the machine you are
talking about, I cannot answer for sure. :)
AFAIK, pressing the button triggers the hotplug from hardware, and sysfs
is just another entry point. In the end, they run into the same code.
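
As a rough illustration of the sysfs entry point, the sketch below builds the path of one memory block's state file and writes "offline" to it, which is what `echo offline > .../state` does from a shell. The layout /sys/devices/system/memory/memoryN/state is the one described in the kernel's memory-hotplug documentation; the two helper names are hypothetical, and the write needs root on a kernel with memory hotplug enabled.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "/sys/devices/system/memory/memory<block>/state" into buf.
 * Returns the path length, or -1 if buf is too small.
 */
static int memory_block_state_path(char *buf, size_t len, unsigned int block)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, len, "/sys/devices/system/memory/memory%u/state",
		     block);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Ask the kernel to offline one memory block. Returns 0 on success. */
static int memory_block_offline(unsigned int block)
{
	char path[64];
	FILE *f;
	int ret = 0;

	if (memory_block_state_path(path, sizeof(path), block) < 0)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs("offline", f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)	/* the kernel reports failure on flush/close */
		ret = -1;
	return ret;
}
```

Whether the request originates here or from a hardware event, the reply above suggests both paths converge on the same offline code in the kernel.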


> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> why can't put kernel direct mapping memory into one memory device, and
> other memory into the other devices?


We cannot do that because we would lose NUMA performance.

If you know NUMA, you will understand the following example:

node0:                  node1:
   cpu0~cpu15              cpu16~cpu31
   memory0~memory511       memory512~memory1023

cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
If we put the direct mapping area in node0 and the movable area in node1, then
kernel code running on cpu16~cpu31 will have to access memory0~memory511.

This is a terrible performance drop.
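
The locality argument can be written down as a tiny model. The sketch assumes exactly the layout from the example (two nodes, 16 CPUs and 512 memory blocks per node); the constants and helper names are illustrative only and are not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Layout from the example: 2 nodes, 16 CPUs and 512 memory blocks each. */
#define CPUS_PER_NODE	16
#define BLOCKS_PER_NODE	512

/* Node that owns a given CPU in this model. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0);
	return cpu / CPUS_PER_NODE;
}

/* Node that owns a given memory block in this model. */
static int model_block_to_node(int block)
{
	assert(block >= 0);
	return block / BLOCKS_PER_NODE;
}

/* An access is local (fast) when CPU and memory sit on the same node. */
static bool access_is_local(int cpu, int block)
{
	return model_cpu_to_node(cpu) == model_block_to_node(block);
}
```

If all kernel memory lived in blocks 0 to 511, every kernel access from cpu16 to cpu31 would be remote in this model, which is the slowdown described above.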


> As you know x86_64 don't need
> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> idea available? If is correct, x86_32 can't implement in the same way
> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> hard to focus kernel memory on single memory device.


Sorry, I'm not quite familiar with x86_32 box.


> 3. In current implementation, if memory hotplug just need memory
> subsystem and ACPI codes support? Or also needs firmware take part in?
> Hope you can explain in details, thanks in advance. :)


We need firmware to take part, for example SRAT in the ACPI BIOS, or the
firmware-based memory migration mentioned by Liu Jiang.

So far, this is all I know. :)


> 4. What's the status of memory hotplug? Apart from can't remove kernel
> memory, other things are fully implementation?


I think the main work is done for now, but there are still bugs to fix,
and this functionality is not stable yet.

Thanks. :)

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Tang,
On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> Hi Simon,
> 
> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> > Hi Tang,
> > On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >
> > 1. IIUC, there is a button on machine which supports hot-remove memory,
> > then what's the difference between press button and echo to /sys?
> 
> No important difference, I think. Since I don't have the machine you are
> saying, I cannot surely answer you. :)
> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> is just another entrance. At last, they will run into the same code.
> 
> > 2. Since kernel memory is linear mapping(I mean direct mapping part),
> > why can't put kernel direct mapping memory into one memory device, and
> > other memory into the other devices?
> 
> We cannot do that because in that way, we will lose NUMA performance.
> 
> If you know NUMA, you will understand the following example:
> 
> node0:                  node1:
>    cpu0~cpu15              cpu16~cpu31
>    memory0~memory511       memory512~memory1023
> 
> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
> If we set direct mapping area in node0, and movable area in node1, then
> the kernel code running on cpu16~cpu31 will have to access 
> memory0~memory511.
> This is a terrible performance down.

So if NUMA is configured, kernel memory will not be linearly mapped anymore?
For example,

Node 0          Node 1

0 ~ 10G         11G~14G

Is kernel memory only on Node 0? Can part of the kernel memory also be on Node 1?

How big is the kernel direct mapping on x86_64? Is there a maximum limit?
It seems to be only around 896MB on x86_32.
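
The 896MB figure follows from simple arithmetic, sketched below under stated assumptions: a 32-bit x86 kernel with the default 3G/1G user/kernel split and a 128 MiB window at the top of the kernel range kept free for vmalloc, kmap and fixmap mappings. Both constants are assumed defaults and can be changed by configuration.

```c
/* Assumed defaults for 32-bit x86 with the 3G/1G user/kernel split. */
#define KERNEL_SPACE_MB		1024u	/* top 1 GiB of the 4 GiB address space */
#define VMALLOC_RESERVE_MB	128u	/* kept free for vmalloc, kmap, fixmap */

/* Size of the directly (linearly) mapped low memory, in MiB. */
static unsigned int lowmem_limit_mb(void)
{
	return KERNEL_SPACE_MB - VMALLOC_RESERVE_MB;
}
```

With these values the function yields 896, matching the number quoted in the question.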

> 
> >As you know x86_64 don't need
> > highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> > idea available? If is correct, x86_32 can't implement in the same way
> > since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> > hard to focus kernel memory on single memory device.
> 
> Sorry, I'm not quite familiar with x86_32 box.
> 
> > 3. In current implementation, if memory hotplug just need memory
> > subsystem and ACPI codes support? Or also needs firmware take part in?
> > Hope you can explain in details, thanks in advance. :)
> 
> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> based memory migration mentioned by Liu Jiang.

Is there any material about firmware-based memory migration?

> 
> So far, I only know this. :)
> 
> > 4. What's the status of memory hotplug? Apart from can't remove kernel
> > memory, other things are fully implementation?
> 
> I think the main job is done for now. And there are still bugs to fix.
> And this functionality is not stable.
> 
> Thanks. :)



Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier

2013-01-31 Thread Alexander Graf


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> VCPU's MMUCFG register initialization should not depend on KVM_CAP_SW_TLB
> ioctl call. Move it earlier into tlb initialization phase.

Quite the contrary. The fact that there is an mfspr() in e500_mmu.c already 
tells us that the code is broken. The TLB guest code should only depend on 
input from the SW_TLB configuration. It's completely orthogonal to the host 
capabilities.


Alex

> 
> Signed-off-by: Mihai Caraman 
> ---
> arch/powerpc/kvm/e500_mmu.c |4 ++--
> 1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 5c44759..bb1b2b0 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -692,8 +692,6 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
>  	vcpu_e500->gtlb_offset[0] = 0;
>  	vcpu_e500->gtlb_offset[1] = params.tlb_sizes[0];
> 
> -	vcpu->arch.mmucfg = mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> -
>  	vcpu->arch.tlbcfg[0] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
>  	if (params.tlb_sizes[0] <= 2048)
>  		vcpu->arch.tlbcfg[0] |= params.tlb_sizes[0];
> @@ -781,6 +779,8 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
>  	if (!vcpu_e500->g2h_tlb1_map)
>  		goto err;
> 
> +	vcpu->arch.mmucfg = mfspr(SPRN_MMUCFG) & ~MMUCFG_LPIDSIZE;
> +
>  	/* Init TLB configuration register */
>  	vcpu->arch.tlbcfg[0] = mfspr(SPRN_TLB0CFG) &
>  		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> -- 
> 1.7.4.1
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/5] KVM: PPC: e500: Emulate TLBnPS registers

2013-01-31 Thread Alexander Graf


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Emulate TLBnPS registers which are available in MMU Architecture Version
> (MAV) 2.0.
> 
> Signed-off-by: Mihai Caraman 
> ---
> arch/powerpc/include/asm/kvm_host.h |1 +
> arch/powerpc/kvm/e500.h |5 +
> arch/powerpc/kvm/e500_emulate.c |   10 ++
> arch/powerpc/kvm/e500_mmu.c |5 +
> 4 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 8a72d59..88fcfe6 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -501,6 +501,7 @@ struct kvm_vcpu_arch {
>  	spinlock_t wdt_lock;
>  	struct timer_list wdt_timer;
>  	u32 tlbcfg[4];
> +	u32 tlbps[4];
>  	u32 mmucfg;
>  	u32 epr;
>  	struct kvmppc_booke_debug_reg dbg_reg;
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index 41cefd4..b9f76d8 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -303,4 +303,9 @@ static inline unsigned int get_tlbmiss_tid(struct kvm_vcpu *vcpu)
>  #define get_tlb_sts(gtlbe)  (MAS1_TS)
>  #endif /* !BOOKE_HV */
> 
> +static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)

bool. Also rename it to "is_..." then.

> +{
> +	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2);
> +}
> +
>  #endif /* KVM_E500_H */
> diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
> index e78f353..5515dc5 100644
> --- a/arch/powerpc/kvm/e500_emulate.c
> +++ b/arch/powerpc/kvm/e500_emulate.c
> @@ -329,6 +329,16 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
>  		*spr_val = vcpu->arch.ivor[BOOKE_IRQPRIO_DBELL_CRIT];
>  		break;
>  #endif
> +	case SPRN_TLB0PS:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		*spr_val = vcpu->arch.tlbps[0];
> +		break;
> +	case SPRN_TLB1PS:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		*spr_val = vcpu->arch.tlbps[1];
> +		break;
>  	default:
>  		emulated = kvmppc_booke_emulate_mfspr(vcpu, sprn, spr_val);
>  	}
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index bb1b2b0..129299a 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -794,6 +794,11 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
>  	vcpu->arch.tlbcfg[1] |=
>  		vcpu_e500->gtlb_params[1].ways << TLBnCFG_ASSOC_SHIFT;
> 
> +	if (has_mmu_v2(vcpu)) {
> +		vcpu->arch.tlbps[0] = mfspr(SPRN_TLB0PS);
> +		vcpu->arch.tlbps[1] = mfspr(SPRN_TLB1PS);

So I suppose that means that user space doesn't tell us the possible TLB entry 
sizes through the SW_TLB config? Then we should add them there.

To not break untested code paths, we can still compare if the values user space 
asks for are identical to what physical hardware does. But eventually we 
shouldn't care.


Alex


Re: [PATCH 3/5] KVM: PPC: e500: Remove E.PT category from VCPUs

2013-01-31 Thread Alexander Graf


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Embedded.Page Table (E.PT) category in VMs requires indirect tlb entries
> emulation which is not supported yet. Configure TLBnCFG to remove E.PT
> category from VCPUs.
> 
> Signed-off-by: Mihai Caraman 

Please do this in a separate function that you call from these locations. That 
way the code is self-documenting on what it actually does.

Also add a comment to that function, which removes the E.PT related bits from 
TLBCFG, saying that our _guest_ mmu emulation currently doesn't handle E.PT.
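
A minimal sketch of such a helper follows. The function name is hypothetical and the TLBnCFG_IND value is an illustrative stand-in for the kernel's definition; the point is only that one commented function replaces four open-coded mask expressions.

```c
#include <stdint.h>

/* Illustrative stand-in for the kernel's "indirect entries" capability bit. */
#define TLBnCFG_IND	0x00020000u

/*
 * The guest MMU emulation does not handle E.PT (indirect TLB entries) yet,
 * so hide that capability in the TLBnCFG value the guest gets to see.
 */
static uint32_t tlbcfg_strip_ept(uint32_t tlbcfg)
{
	return tlbcfg & ~TLBnCFG_IND;
}
```

Each call site in the patch would then apply this helper to the TLBnCFG value instead of adding TLBnCFG_IND to its own mask.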


Alex

> ---
> arch/powerpc/kvm/e500_mmu.c |   10 ++
> 1 files changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 129299a..9a1f7b7 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -692,12 +692,14 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
>  	vcpu_e500->gtlb_offset[0] = 0;
>  	vcpu_e500->gtlb_offset[1] = params.tlb_sizes[0];
> 
> -	vcpu->arch.tlbcfg[0] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +	vcpu->arch.tlbcfg[0] &=
> +		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
>  	if (params.tlb_sizes[0] <= 2048)
>  		vcpu->arch.tlbcfg[0] |= params.tlb_sizes[0];
>  	vcpu->arch.tlbcfg[0] |= params.tlb_ways[0] << TLBnCFG_ASSOC_SHIFT;
> 
> -	vcpu->arch.tlbcfg[1] &= ~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +	vcpu->arch.tlbcfg[1] &=
> +		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
>  	vcpu->arch.tlbcfg[1] |= params.tlb_sizes[1];
>  	vcpu->arch.tlbcfg[1] |= params.tlb_ways[1] << TLBnCFG_ASSOC_SHIFT;
> 
> @@ -783,13 +785,13 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
> 
>  	/* Init TLB configuration register */
>  	vcpu->arch.tlbcfg[0] = mfspr(SPRN_TLB0CFG) &
> -		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
>  	vcpu->arch.tlbcfg[0] |= vcpu_e500->gtlb_params[0].entries;
>  	vcpu->arch.tlbcfg[0] |=
>  		vcpu_e500->gtlb_params[0].ways << TLBnCFG_ASSOC_SHIFT;
> 
>  	vcpu->arch.tlbcfg[1] = mfspr(SPRN_TLB1CFG) &
> -		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC);
> +		~(TLBnCFG_N_ENTRY | TLBnCFG_ASSOC | TLBnCFG_IND);
>  	vcpu->arch.tlbcfg[1] |= vcpu_e500->gtlb_params[1].entries;
>  	vcpu->arch.tlbcfg[1] |=
>  		vcpu_e500->gtlb_params[1].ways << TLBnCFG_ASSOC_SHIFT;
> -- 
> 1.7.4.1

Re: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register

2013-01-31 Thread Alexander Graf


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> EPTCFG register defined by E.PT is accessed unconditionally by Linux guests
> in the presence of MAV 2.0. Emulate EPTCFG register now.
> 
> Signed-off-by: Mihai Caraman 
> ---
> arch/powerpc/include/asm/kvm_host.h |1 +
> arch/powerpc/kvm/e500.h |6 ++
> arch/powerpc/kvm/e500_emulate.c |9 +
> arch/powerpc/kvm/e500_mmu.c |5 +
> 4 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 88fcfe6..f480b20 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
>  	u32 tlbcfg[4];
>  	u32 tlbps[4];
>  	u32 mmucfg;
> +	u32 eptcfg;

This too needs to be settable through SW_TLB.

>  	u32 epr;
>  	struct kvmppc_booke_debug_reg dbg_reg;
>  #endif
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index b9f76d8..983eb95 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -308,4 +308,10 @@ static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)
>  	return ((vcpu->arch.mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2);
>  }
> 
> +static inline unsigned int supports_page_tables(const struct kvm_vcpu *vcpu)

bool again. Can we generalize this a bit more? How about a small framework that 
allows us to differentiate across e.XX features?

if (has_feature(vcpu, FEATURE_E_PT))
   ...
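
A small sketch of what such a feature framework might look like is shown below. Everything in it is hypothetical: the enum, the FEATURE_MMU_V2 entry, the stand-in vcpu structure and its features field are invented for illustration, and only the has_feature(vcpu, FEATURE_E_PT) call shape comes from the suggestion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, one per e.XX category or MMU capability. */
enum vcpu_feature {
	FEATURE_MMU_V2	= 1u << 0,	/* MMU Architecture Version 2.0 */
	FEATURE_E_PT	= 1u << 1,	/* Embedded.Page Table category */
};

/* Stand-in for the vcpu structure; only the field this sketch needs. */
struct vcpu_sketch {
	uint32_t features;	/* bitwise OR of enum vcpu_feature values */
};

/* Single predicate replacing per-feature helpers; returns a real bool. */
static inline bool has_feature(const struct vcpu_sketch *vcpu,
			       enum vcpu_feature feature)
{
	return (vcpu->features & feature) != 0;
}
```

With something like this, the has_mmu_v2() and supports_page_tables() helpers from the patches would become has_feature() calls with different flags, and the feature mask itself could be filled in from what user space configures.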


> +{
> +	return ((vcpu->arch.tlbcfg[0] & TLBnCFG_IND)
> +		|| (vcpu->arch.tlbcfg[1] & TLBnCFG_IND));
> +}
> +
>  #endif /* KVM_E500_H */
> diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
> index 5515dc5..493e231 100644
> --- a/arch/powerpc/kvm/e500_emulate.c
> +++ b/arch/powerpc/kvm/e500_emulate.c
> @@ -339,6 +339,15 @@ int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn, ulong *spr_val)
>  			return EMULATE_FAIL;
>  		*spr_val = vcpu->arch.tlbps[1];
>  		break;
> +	case SPRN_EPTCFG:
> +		if (!has_mmu_v2(vcpu))
> +			return EMULATE_FAIL;
> +		/*
> +		 * Legacy Linux guests access EPTCFG register even if the E.PT
> +		 * category is disabled in the VM. Give them a chance to live.
> +		 */
> +		*spr_val = vcpu->arch.eptcfg;
> +		break;
>  	default:
>  		emulated = kvmppc_booke_emulate_mfspr(vcpu, sprn, spr_val);
>  	}
> diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
> index 9a1f7b7..199c11e 100644
> --- a/arch/powerpc/kvm/e500_mmu.c
> +++ b/arch/powerpc/kvm/e500_mmu.c
> @@ -799,6 +799,11 @@ int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500)
>  	if (has_mmu_v2(vcpu)) {
>  		vcpu->arch.tlbps[0] = mfspr(SPRN_TLB0PS);
>  		vcpu->arch.tlbps[1] = mfspr(SPRN_TLB1PS);
> +
> +		if (supports_page_tables(vcpu))
> +			vcpu->arch.eptcfg = mfspr(SPRN_EPTCFG);

Please don't introduce new mfspr()s here :). Just have user space set it.


Alex

> +		else
> +			vcpu->arch.eptcfg = 0;
>  	}
> 
>  	kvmppc_recalc_tlb1map_range(vcpu_e500);
> -- 
> 1.7.4.1

Re: [PATCH 5/5] KVM: PPC: e500mc: Enable e6500 cores

2013-01-31 Thread Alexander Graf


On 30.01.2013, at 14:29, Mihai Caraman wrote:

> Extend processor compatibility names to e6500 cores.
> 
> Signed-off-by: Mihai Caraman 

Looks good to me.

Reviewed-by: Alexander Graf 


Alex

> ---
> arch/powerpc/kvm/e500mc.c |2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/e500mc.c b/arch/powerpc/kvm/e500mc.c
> index 1f89d26..6c87299 100644
> --- a/arch/powerpc/kvm/e500mc.c
> +++ b/arch/powerpc/kvm/e500mc.c
> @@ -172,6 +172,8 @@ int kvmppc_core_check_processor_compat(void)
>  		r = 0;
>  	else if (strcmp(cur_cpu_spec->cpu_name, "e5500") == 0)
>  		r = 0;
> +	else if (strcmp(cur_cpu_spec->cpu_name, "e6500") == 0)
> +		r = 0;
>  	else
>  		r = -ENOTSUPP;
> 
> -- 
> 1.7.4.1

Re: [PATCH 2/5] KVM: PPC: e500: Emulate TLBnPS registers

2013-01-31 Thread Alexander Graf


On 31.01.2013, at 14:24, Alexander Graf wrote:

> 
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> 
>> Emulate TLBnPS registers which are available in MMU Architecture Version
>> (MAV) 2.0.
>> 
>> Signed-off-by: Mihai Caraman 
>> ---
>> arch/powerpc/include/asm/kvm_host.h |1 +
>> arch/powerpc/kvm/e500.h |5 +
>> arch/powerpc/kvm/e500_emulate.c |   10 ++
>> arch/powerpc/kvm/e500_mmu.c |5 +
>> 4 files changed, 21 insertions(+), 0 deletions(-)
>> 
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index 8a72d59..88fcfe6 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -501,6 +501,7 @@ struct kvm_vcpu_arch {
>>  spinlock_t wdt_lock;
>>  struct timer_list wdt_timer;
>>  u32 tlbcfg[4];
>> +u32 tlbps[4];
>>  u32 mmucfg;
>>  u32 epr;
>>  struct kvmppc_booke_debug_reg dbg_reg;
>> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
>> index 41cefd4..b9f76d8 100644
>> --- a/arch/powerpc/kvm/e500.h
>> +++ b/arch/powerpc/kvm/e500.h
>> @@ -303,4 +303,9 @@ static inline unsigned int get_tlbmiss_tid(struct kvm_vcpu *vcpu)
>> #define get_tlb_sts(gtlbe)  (MAS1_TS)
>> #endif /* !BOOKE_HV */
>> 
>> +static inline unsigned int has_mmu_v2(const struct kvm_vcpu *vcpu)
> 
> bool. Also rename it to "is_..." then.

In light of the comment I made on a later patch, this too could be converted to 
feature flags.


Alex


Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

2013-01-31 Thread Toshi Kani

On Thu, 2013-01-31 at 05:24 +, Greg KH wrote:
> On Wed, Jan 30, 2013 at 06:15:12PM -0700, Toshi Kani wrote:
> > > Please make it a "real" pointer, and not a void *, those shouldn't be
> > > used at all if possible.
> > 
> > How about changing the "void *handle" to acpi_dev_node below?   
> > 
> > > struct acpi_dev_node acpi_node;
> > 
> > Basically, it has the same challenge as struct device, which uses
> > acpi_dev_node as well.  We can add other FW node when needed (just like
> > device also has *of_node).
> 
> That sounds good to me.

Great!  Thanks Greg,
-Toshi
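
The change agreed on here, a typed firmware node instead of a bare void pointer, can be sketched as follows. The struct acpi_dev_node definition is a simplified stand-in modelled loosely on the kernel's (the real one lives in the kernel headers and depends on CONFIG_ACPI), and struct hotplug_device_sketch and its accessor are hypothetical names, not the ones used in the sys_hotplug.h patch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct acpi_dev_node. */
struct acpi_dev_node {
	void *handle;
};

/* Hypothetical hotplug device record using a typed firmware node. */
struct hotplug_device_sketch {
	const char *name;
	struct acpi_dev_node acpi_node;	/* was: void *handle */
};

/* Accessor keeps the ACPI-specific detail in one place. */
static void *hotplug_acpi_handle(const struct hotplug_device_sketch *dev)
{
	assert(dev != NULL);
	return dev->acpi_node.handle;
}
```

Naming the member by firmware type leaves room for a sibling member for another firmware interface later, in the same way struct device carries an of_node next to its ACPI node.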



RE: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier

2013-01-31 Thread Caraman Mihai Claudiu-B02008

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, January 31, 2013 3:21 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> initialization earlier
> 
> 
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> 
> > VCPU's MMUCFG register initialization should not depend on KVM_CAP_SW_TLB
> > ioctl call. Move it earlier into tlb initialization phase.
> 
> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
> already tells us that the code is broken. The TLB guest code should only
> depend on input from the SW_TLB configuration. It's completely orthogonal
> to the host capabilities.

Then we have the same issue for TLBnCFG registers which need to be configured
via SW_TLB ioctl. What is the purpose of guest tlb initialization in e500_mmu.c
if we rely on SW_TLB?

-Mike

Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier

2013-01-31 Thread Alexander Graf


On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:

>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Thursday, January 31, 2013 3:21 PM
>> To: Caraman Mihai Claudiu-B02008
>> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
>> d...@lists.ozlabs.org
>> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
>> initialization earlier
>> 
>> 
>> On 30.01.2013, at 14:29, Mihai Caraman wrote:
>> 
>>> VCPU's MMUCFG register initialization should not depend on
>> KVM_CAP_SW_TLB
>>> ioctl call. Move it earlier into tlb initalization phase.
>> 
>> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
>> already tells us that the code is broken. The TLB guest code should only
>> depend on input from the SW_TLB configuration. It's completely orthogonal
>> to the host capabilities.
> 
> Then we have the same issue for TLBnCFG registers which need to be configured
> via SW_TLB ioctl. What is the purpose of guest tlb initalization in e500_mmu.c
> if we rely on SW_TLB?

It's to provide a fallback to user space that doesn't implement SW_TLB 
configuration yet.


Alex

RE: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register

2013-01-31 Thread Caraman Mihai Claudiu-B02008

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, January 31, 2013 3:31 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH 4/5] KVM: PPC: e500: Emulate EPTCFG register
> 
> 
> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> 
> > EPTCFG register defined by E.PT is accessed unconditionally by Linux
> guests
> > in the presence of MAV 2.0. Emulate EPTCFG register now.
> >
> > Signed-off-by: Mihai Caraman 
> > ---
> > arch/powerpc/include/asm/kvm_host.h |1 +
> > arch/powerpc/kvm/e500.h |6 ++
> > arch/powerpc/kvm/e500_emulate.c |9 +
> > arch/powerpc/kvm/e500_mmu.c |5 +
> > 4 files changed, 21 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_host.h
> b/arch/powerpc/include/asm/kvm_host.h
> > index 88fcfe6..f480b20 100644
> > --- a/arch/powerpc/include/asm/kvm_host.h
> > +++ b/arch/powerpc/include/asm/kvm_host.h
> > @@ -503,6 +503,7 @@ struct kvm_vcpu_arch {
> > u32 tlbcfg[4];
> > u32 tlbps[4];
> > u32 mmucfg;
> > +   u32 eptcfg;
> 
> This too needs to be settable through SW_TLB.
> 
> > u32 epr;
> > struct kvmppc_booke_debug_reg dbg_reg;
> > #endif
> > diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> > index b9f76d8..983eb95 100644
> > --- a/arch/powerpc/kvm/e500.h
> > +++ b/arch/powerpc/kvm/e500.h
> > @@ -308,4 +308,10 @@ static inline unsigned int has_mmu_v2(const struct
> kvm_vcpu *vcpu)
> > return ((vcpu->arch.mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2);
> > }
> >
> > +static inline unsigned int supports_page_tables(const struct kvm_vcpu
> *vcpu)
> 
> bool again. Can we generalize this a bit more? How about a small
> framework that allows us to differentiate across e.XX features? 

I thought you would ask for it :)

-Mike

RE: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier

2013-01-31 Thread Caraman Mihai Claudiu-B02008

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, January 31, 2013 4:58 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> initialization earlier
> 
> 
> On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:
> 
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Thursday, January 31, 2013 3:21 PM
> >> To: Caraman Mihai Claudiu-B02008
> >> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> >> d...@lists.ozlabs.org
> >> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> >> initialization earlier
> >>
> >>
> >> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> >>
> >>> VCPU's MMUCFG register initialization should not depend on KVM_CAP_SW_TLB
> >>> ioctl call. Move it earlier into tlb initialization phase.
> >>
> >> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
> >> already tells us that the code is broken. The TLB guest code should only
> >> depend on input from the SW_TLB configuration. It's completely orthogonal
> >> to the host capabilities.
> >
> > Then we have the same issue for TLBnCFG registers which need to be configured
> > via SW_TLB ioctl. What is the purpose of guest tlb initialization in e500_mmu.c
> > if we rely on SW_TLB?
> 
> It's to provide a fallback to user space that doesn't implement SW_TLB
> configuration yet.

Do we have such a case now or is it just hypothetical? For the fallback we
need to initialize the MMUCFG register, which is what I intended to say in the
commit message.

> 
> 
> Alex
> 


Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register initialization earlier

2013-01-31 Thread Scott Wood

On 01/31/2013 09:26:20 AM, Caraman Mihai Claudiu-B02008 wrote:

> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, January 31, 2013 4:58 PM
> To: Caraman Mihai Claudiu-B02008
> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> initialization earlier
>
>
> On 31.01.2013, at 15:56, Caraman Mihai Claudiu-B02008 wrote:
>
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Thursday, January 31, 2013 3:21 PM
> >> To: Caraman Mihai Claudiu-B02008
> >> Cc: kvm-...@vger.kernel.org; k...@vger.kernel.org; linuxppc-
> >> d...@lists.ozlabs.org
> >> Subject: Re: [PATCH 1/5] KVM: PPC: e500: Move VCPU's MMUCFG register
> >> initialization earlier
> >>
> >>
> >> On 30.01.2013, at 14:29, Mihai Caraman wrote:
> >>
> >>> VCPU's MMUCFG register initialization should not depend on KVM_CAP_SW_TLB
> >>> ioctl call. Move it earlier into tlb initialization phase.
> >>
> >> Quite the contrary. The fact that there is an mfspr() in e500_mmu.c
> >> already tells us that the code is broken. The TLB guest code should only
> >> depend on input from the SW_TLB configuration. It's completely orthogonal
> >> to the host capabilities.
> >
> > Then we have the same issue for TLBnCFG registers which need to be configured
> > via SW_TLB ioctl. What is the purpose of guest tlb initialization in e500_mmu.c
> > if we rely on SW_TLB?
>
> It's to provide a fallback to user space that doesn't implement SW_TLB
> configuration yet.

Do we have such a case now or is it just hypothetical? For the fallback we
need to initialize the MMUCFG register which I intended to say in the commit
message.

I don't think we need to support a fallback for e6500, since there's  
nothing to be backwards compatible with.

As for use case, I don't see us ever supporting the guest being a  
different CPU than the host.  Page sizes probably aren't a problem, but  
there are other barriers.

The main reasons that TLBnCFG are settable through SW_TLB are:
1. The guest TLB can be enlarged as a performance hack (like in Topaz,  
though QEMU doesn't currently do this),
2. The legacy default in KVM is based on the e500v1 TLB0 size, which is  
half of what e500v2/e500mc have, and
3. QEMU needs to know the exact geometry of the TLB so that it can  
interpret the shared data properly.

#3 seems like a compelling reason here, to avoid silent weirdness if  
there's a slight mismatch between what QEMU thinks it's modelling and  
what we're actually running on.

-Scott

[PATCH 24/25] perf/POWER7: Make some POWER7 events available in sysfs

2013-01-31 Thread Arnaldo Carvalho de Melo

From: Sukadev Bhattiprolu 

Make some POWER7-specific perf events available in sysfs.

$ /bin/ls -1 /sys/bus/event_source/devices/cpu/events/
branch-instructions
branch-misses
cache-misses
cache-references
cpu-cycles
instructions
PM_BRU_FIN
PM_BRU_MPRED
PM_CMPLU_STALL
PM_CYC
PM_GCT_NOSLOT_CYC
PM_INST_CMPL
PM_LD_MISS_L1
PM_LD_REF_L1
stalled-cycles-backend
stalled-cycles-frontend

where the 'PM_*' events are POWER specific and the others are the
generic events.

This will enable users to specify these events with their symbolic
names rather than with their raw code.

perf stat -e 'cpu/PM_CYC/' ...

Signed-off-by: Sukadev Bhattiprolu 
Cc: Andi Kleen 
Cc: Anton Blanchard 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Robert Richter 
Cc: Stephane Eranian 
Cc: linuxppc-...@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062528.ge13...@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 arch/powerpc/include/asm/perf_event_server.h |  3 +++
 arch/powerpc/perf/power7-pmu.c   | 18 ++
 2 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index b9b6c55..b29fcc6 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -132,3 +132,6 @@ extern ssize_t power_events_sysfs_show(struct device *dev,
 
 #define GENERIC_EVENT_ATTR(_name, _id)  EVENT_ATTR(_name, _id, _g)
 #define GENERIC_EVENT_PTR(_id)          EVENT_PTR(_id, _g)
+
+#define POWER_EVENT_ATTR(_name, _id)    EVENT_ATTR(PM_##_name, _id, _p)
+#define POWER_EVENT_PTR(_id)            EVENT_PTR(_id, _p)
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 269bf24..b554879 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -384,6 +384,15 @@ GENERIC_EVENT_ATTR(cache-misses,   LD_MISS_L1);
 GENERIC_EVENT_ATTR(branch-instructions,BRU_FIN);
 GENERIC_EVENT_ATTR(branch-misses,  BRU_MPRED);
 
+POWER_EVENT_ATTR(CYC,  CYC);
+POWER_EVENT_ATTR(GCT_NOSLOT_CYC,   GCT_NOSLOT_CYC);
+POWER_EVENT_ATTR(CMPLU_STALL,  CMPLU_STALL);
+POWER_EVENT_ATTR(INST_CMPL,INST_CMPL);
+POWER_EVENT_ATTR(LD_REF_L1,LD_REF_L1);
+POWER_EVENT_ATTR(LD_MISS_L1,   LD_MISS_L1);
+POWER_EVENT_ATTR(BRU_FIN,  BRU_FIN);
+POWER_EVENT_ATTR(BRU_MPRED,BRU_MPRED);
+
 static struct attribute *power7_events_attr[] = {
    GENERIC_EVENT_PTR(CYC),
    GENERIC_EVENT_PTR(GCT_NOSLOT_CYC),
@@ -393,6 +402,15 @@ static struct attribute *power7_events_attr[] = {
    GENERIC_EVENT_PTR(LD_MISS_L1),
    GENERIC_EVENT_PTR(BRU_FIN),
    GENERIC_EVENT_PTR(BRU_MPRED),
+
+   POWER_EVENT_PTR(CYC),
+   POWER_EVENT_PTR(GCT_NOSLOT_CYC),
+   POWER_EVENT_PTR(CMPLU_STALL),
+   POWER_EVENT_PTR(INST_CMPL),
+   POWER_EVENT_PTR(LD_REF_L1),
+   POWER_EVENT_PTR(LD_MISS_L1),
+   POWER_EVENT_PTR(BRU_FIN),
+   POWER_EVENT_PTR(BRU_MPRED),
    NULL
 };
 
-- 
1.8.1.1.361.gec3ae6e
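
The suffix trick used by the macros in this patch can be tried outside the
kernel. The following is a simplified userspace sketch, not the kernel code:
struct event_attr is a hypothetical stand-in for struct perf_pmu_events_attr,
and EVENT_ATTR here fills a plain struct instead of calling PMU_EVENT_ATTR.
It shows how one event id (CYC) ends up behind two differently named
attributes.

```c
#include <stddef.h>

#define PME_PM_CYC 0x1e   /* event code taken from the POWER7 patch */

/* Simplified stand-in for struct perf_pmu_events_attr. */
struct event_attr {
	const char *name;            /* file name under .../events/ */
	unsigned long long id;       /* raw event code */
};

/* The variable name gets a suffix so one id can back two attributes. */
#define EVENT_VAR(_id, _suffix)  event_attr_##_id##_suffix
#define EVENT_PTR(_id, _suffix)  &EVENT_VAR(_id, _suffix)

#define EVENT_ATTR(_name, _id, _suffix)                         \
	static struct event_attr EVENT_VAR(_id, _suffix) = {    \
		.name = #_name,                                 \
		.id   = PME_PM_##_id,                           \
	}

#define GENERIC_EVENT_ATTR(_name, _id)  EVENT_ATTR(_name, _id, _g)
#define POWER_EVENT_ATTR(_name, _id)    EVENT_ATTR(PM_##_name, _id, _p)

GENERIC_EVENT_ATTR(cpu-cycles, CYC);   /* event_attr_CYC_g, "cpu-cycles" */
POWER_EVENT_ATTR(CYC, CYC);            /* event_attr_CYC_p, "PM_CYC"     */

/* NULL-terminated list, in the style of power7_events_attr[]. */
static struct event_attr *events[] = {
	EVENT_PTR(CYC, _g),
	EVENT_PTR(CYC, _p),
	NULL,
};
```

Both entries carry the same raw code; only the variable name and the visible
file name differ, which is what allows the generic and the PM_* alias to
coexist.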

[PATCH 25/25] perf: Document the ABI of perf sysfs entries

2013-01-31 Thread Arnaldo Carvalho de Melo

From: Sukadev Bhattiprolu 

This patchset adds two new sets of files to sysfs for POWER architecture.

- perf event config format in /sys/devices/cpu/format/event
- generic and POWER-specific perf events in /sys/devices/cpu/events/

The format of the first file is already documented in:

sysfs-bus-event_source-devices-format

Document the format of the second set of files '/sys/devices/cpu/events/*'
which would also become part of the ABI.

Changelog[v4]:
[Jiri Olsa]: Mention that multiple event= like terms can be specified
in the 'events' file.
[Jiri Olsa]: Remove the documentation for the 'config format' file
as it is already documented in 'Documentation/ABI/testing/'.
[Jiri Olsa]: Move ABI documentation from 'stable/' to 'testing/'

Changelog[v3]:
[Greg KH] Include ABI documentation.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Cc: Andi Kleen 
Cc: Anton Blanchard 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Robert Richter 
Cc: Stephane Eranian 
Cc: linuxppc-...@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062645.gg13...@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 Documentation/ABI/stable/sysfs-devices-cpu-events  |  0
 .../testing/sysfs-bus-event_source-devices-events  | 62 ++
 2 files changed, 62 insertions(+)
 delete mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-events

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events 
b/Documentation/ABI/stable/sysfs-devices-cpu-events
deleted file mode 100644
index e69de29..000
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-events 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
new file mode 100644
index 000..0adeb52
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-events
@@ -0,0 +1,62 @@
+What:  /sys/devices/cpu/events/
+   /sys/devices/cpu/events/branch-misses
+   /sys/devices/cpu/events/cache-references
+   /sys/devices/cpu/events/cache-misses
+   /sys/devices/cpu/events/stalled-cycles-frontend
+   /sys/devices/cpu/events/branch-instructions
+   /sys/devices/cpu/events/stalled-cycles-backend
+   /sys/devices/cpu/events/instructions
+   /sys/devices/cpu/events/cpu-cycles
+
+Date:  2013/01/08
+
+Contact:   Linux kernel mailing list 
+
+Description:   Generic performance monitoring events
+
+   A collection of performance monitoring events that may be
+   supported by many/most CPUs. These events can be monitored
+   using the 'perf(1)' tool.
+
+   The contents of each file would look like:
+
+   event=0xNNNN
+
+   where 'N' is a hex digit and the number '0xNNNN' shows the
+   "raw code" for the perf event identified by the file's
+   "basename".
+
+
+What:  /sys/devices/cpu/events/PM_LD_MISS_L1
+   /sys/devices/cpu/events/PM_LD_REF_L1
+   /sys/devices/cpu/events/PM_CYC
+   /sys/devices/cpu/events/PM_BRU_FIN
+   /sys/devices/cpu/events/PM_GCT_NOSLOT_CYC
+   /sys/devices/cpu/events/PM_BRU_MPRED
+   /sys/devices/cpu/events/PM_INST_CMPL
+   /sys/devices/cpu/events/PM_CMPLU_STALL
+
+Date:  2013/01/08
+
+Contact:   Linux kernel mailing list 
+   Linux Powerpc mailing list 
+
+Description:   POWER-systems specific performance monitoring events
+
+   A collection of performance monitoring events that may be
+   supported by the POWER CPU. These events can be monitored
+   using the 'perf(1)' tool.
+
+   These events may not be supported by other CPUs.
+
+   The contents of each file would look like:
+
+   event=0xNNNN
+
+   where 'N' is a hex digit and the number '0xNNNN' shows the
+   "raw code" for the perf event identified by the file's
+   "basename".
+
+   Further, multiple terms like 'event=0xNNNN' can be specified
+   and separated with comma. All available terms are defined in
+   the /sys/bus/event_source/devices/<dev>/format file.
-- 
1.8.1.1.361.gec3ae6e
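
As a rough illustration of the file format documented above, a tool could
pull the raw code out of an events file along these lines. This is only a
sketch: parse_event_term is a hypothetical helper, not part of perf, and it
handles just the 'event=' term in a comma-separated list.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find the "event=" term in a comma-separated
 * term list such as "event=0x400f0" and return its raw code.
 * Returns 0 on success, -1 if no usable "event=" term is present.
 */
static int parse_event_term(const char *terms, unsigned long long *code)
{
	const char *p = terms;

	while (p && *p) {
		if (strncmp(p, "event=", 6) == 0) {
			char *end;
			/* base 16 accepts an optional "0x" prefix */
			unsigned long long v = strtoull(p + 6, &end, 16);

			if (end == p + 6)
				return -1;      /* nothing parsed after "event=" */
			*code = v;
			return 0;
		}
		p = strchr(p, ',');         /* advance to the next term */
		if (p)
			p++;
	}
	return -1;
}
```

A caller would read the contents of, say, the cache-misses file into a buffer
and pass that buffer to the helper; the trailing newline is ignored by
strtoull.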

[GIT PULL 00/25] perf/core improvements and fixes

2013-01-31 Thread Arnaldo Carvalho de Melo

Hi Ingo,

Please consider pulling,

- Arnaldo

The following changes since commit 152fefa921535665f95840c08062844ab2f5593e:

  Merge tag 'perf-core-for-mingo' of 
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core 
(2013-01-31 10:20:14 +0100)

are available in the git repository at:


  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux 
tags/perf-core-for-mingo

for you to fetch changes up to 2ac3634a7e1c8eedc961030c87c5c36ebd5bbf8e:

  perf: Document the ABI of perf sysfs entries (2013-01-31 13:07:51 -0300)


perf/core improvements and fixes:

. Make some POWER7 events available in sysfs, equivalent to
  what was done on x86, from Sukadev Bhattiprolu.

. Add event group view, from Namhyung Kim:

  To use it, 'perf record' should group events when recording. And then perf
  report parses the saved group relation from file header and prints them
  together if --group option is provided.  You can use 'perf evlist' command to
  see event group information:

$ perf record -e '{ref-cycles,cycles}' noploop 1
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.385 MB perf.data (~16807 samples) ]

$ perf evlist --group
{ref-cycles,cycles}

  With this example, default perf report will show you each event
  separately like this:

$ perf report
...
# group: {ref-cycles,cycles}
# 
# Samples: 3K of event 'ref-cycles'
# Event count (approx.): 3153797218
#
# Overhead  Command  Shared Object  Symbol
#   ...  .  ..
99.84%  noploop  noploop[.] main
 0.07%  noploop  ld-2.15.so [.] strcmp
 0.03%  noploop  [kernel.kallsyms]  [k] timerqueue_del
 0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
 0.02%  noploop  [kernel.kallsyms]  [k] account_user_time
 0.01%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
 0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe

# Samples: 3K of event 'cycles'
# Event count (approx.): 3722310525
#
# Overhead  Command  Shared Object Symbol
#   ...  .  .
99.76%  noploop  noploop[.] main
 0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
 0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
 0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
 0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
 0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time
 0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe

  In this case the event group information is shown at the end of the
  header area.  Use the --group option to enable the event group view:

$ perf report --group
...
# group: {ref-cycles,cycles}
# 
# Samples: 7K of event 'anon group { ref-cycles, cycles }'
# Event count (approx.): 6876107743
#
# Overhead  Command  Shared Object  Symbol
#   ...  .  ..
99.84%  99.76%  noploop  noploop[.] main
 0.07%   0.00%  noploop  ld-2.15.so [.] strcmp
 0.03%   0.00%  noploop  [kernel.kallsyms]  [k] timerqueue_del
 0.03%   0.03%  noploop  [kernel.kallsyms]  [k] sched_clock_cpu
 0.02%   0.00%  noploop  [kernel.kallsyms]  [k] account_user_time
 0.01%   0.00%  noploop  [kernel.kallsyms]  [k] __alloc_pages_nodemask
 0.00%   0.00%  noploop  [kernel.kallsyms]  [k] native_write_msr_safe
 0.00%   0.11%  noploop  [kernel.kallsyms]  [k] _raw_spin_lock
 0.00%   0.06%  noploop  [kernel.kallsyms]  [k] find_get_page
 0.00%   0.02%  noploop  [kernel.kallsyms]  [k] rcu_check_callbacks
 0.00%   0.02%  noploop  [kernel.kallsyms]  [k] __current_kernel_time

  As you can see, the Overhead column now contains both ref-cycles and
  cycles, and the header line also shows the group information: 'anon
  group { ref-cycles, cycles }'.  The output is sorted by the period of
  the group leader first.

  If the perf.data file doesn't contain group information, the --group
  option does nothing.  If you want to enable the event group view by
  default, you can set it in the ~/.perfconfig file:

$ cat ~/.perfconfig
[report]
group = true

  It can be overridden with command line if you want:

$ perf report --no-group

Signed-off-by: Arnaldo Carvalho de Melo 


Arnaldo Carvalho de Melo (2):
  perf top: Stop using exit()
  perf top: Delete maps on exit

Namhyung Kim (18):
  perf tools: Keep group information
  perf tests: Add group test conditions
  perf header: Add HEADER_GROUP_DES

[PATCH 23/25] perf/POWER7: Make generic event translations available in sysfs

2013-01-31 Thread Arnaldo Carvalho de Melo

From: Sukadev Bhattiprolu 

Make the generic perf events in POWER7 available via sysfs.

$ ls /sys/bus/event_source/devices/cpu/events
branch-instructions
branch-misses
cache-misses
cache-references
cpu-cycles
instructions
stalled-cycles-backend
stalled-cycles-frontend

$ cat /sys/bus/event_source/devices/cpu/events/cache-misses
event=0x400f0

This patch is based on commits that implement this functionality on x86.
Eg:
commit a47473939db20e3961b200eb00acf5fcf084d755
Author: Jiri Olsa 
Date:   Wed Oct 10 14:53:11 2012 +0200

perf/x86: Make hardware event translations available in sysfs

Changelog:[v2]
[Jiri Olsa] Drop EVENT_ID() macro since it is only used once.

Signed-off-by: Sukadev Bhattiprolu 
Cc: Andi Kleen 
Cc: Anton Blanchard 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Robert Richter 
Cc: Stephane Eranian 
Cc: linuxppc-...@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062454.gd13...@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 Documentation/ABI/stable/sysfs-devices-cpu-events |  0
 arch/powerpc/include/asm/perf_event_server.h  | 23 +++
 arch/powerpc/perf/core-book3s.c   | 12 
 arch/powerpc/perf/power7-pmu.c| 34 +++
 4 files changed, 69 insertions(+)
 create mode 100644 Documentation/ABI/stable/sysfs-devices-cpu-events

diff --git a/Documentation/ABI/stable/sysfs-devices-cpu-events 
b/Documentation/ABI/stable/sysfs-devices-cpu-events
new file mode 100644
index 000..e69de29
diff --git a/arch/powerpc/include/asm/perf_event_server.h 
b/arch/powerpc/include/asm/perf_event_server.h
index 9710be3..b9b6c55 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -11,6 +11,7 @@
 
 #include 
 #include 
+#include 
 
 #define MAX_HWEVENTS   8
 #define MAX_EVENT_ALTERNATIVES 8
@@ -35,6 +36,7 @@ struct power_pmu {
    void            (*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
    int             (*limited_pmc_event)(u64 event_id);
    u32             flags;
+   const struct attribute_group    **attr_groups;
    int             n_generic;
    int             *generic_events;
    int             (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
@@ -109,3 +111,24 @@ extern unsigned long perf_instruction_pointer(struct 
pt_regs *regs);
  * If an event_id is not subject to the constraint expressed by a particular
  * field, then it will have 0 in both the mask and value for that field.
  */
+
+extern ssize_t power_events_sysfs_show(struct device *dev,
+   struct device_attribute *attr, char *page);
+
+/*
+ * EVENT_VAR() is same as PMU_EVENT_VAR with a suffix.
+ *
+ * Having a suffix allows us to have aliases in sysfs - eg: the generic
+ * event 'cpu-cycles' can have two entries in sysfs: 'cpu-cycles' and
+ * 'PM_CYC' where the latter is the name by which the event is known in
+ * POWER CPU specification.
+ */
+#define EVENT_VAR(_id, _suffix)         event_attr_##_id##_suffix
+#define EVENT_PTR(_id, _suffix)         &EVENT_VAR(_id, _suffix)
+
+#define EVENT_ATTR(_name, _id, _suffix)                                 \
+   PMU_EVENT_ATTR(_name, EVENT_VAR(_id, _suffix), PME_PM_##_id,        \
+                   power_events_sysfs_show)
+
+#define GENERIC_EVENT_ATTR(_name, _id)  EVENT_ATTR(_name, _id, _g)
+#define GENERIC_EVENT_PTR(_id)          EVENT_PTR(_id, _g)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index aa2465e..fa476d5 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -1305,6 +1305,16 @@ static int power_pmu_event_idx(struct perf_event *event)
return event->hw.idx;
 }
 
+ssize_t power_events_sysfs_show(struct device *dev,
+   struct device_attribute *attr, char *page)
+{
+   struct perf_pmu_events_attr *pmu_attr;
+
+   pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr);
+
+   return sprintf(page, "event=0x%02llx\n", pmu_attr->id);
+}
+
 struct pmu power_pmu = {
.pmu_enable = power_pmu_enable,
.pmu_disable= power_pmu_disable,
@@ -1537,6 +1547,8 @@ int __cpuinit register_power_pmu(struct power_pmu *pmu)
pr_info("%s performance monitor hardware support registered\n",
pmu->name);
 
+   power_pmu.attr_groups = ppmu->attr_groups;
+
 #ifdef MSR_HV
/*
 * Use FCHV to ignore kernel events if MSR.HV is set.
diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index eebb36d..269bf24 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -374,6 +374,39 @@ static int 
power7_cache_events[C(MAX)][C(OP_MAX)][C(RESULT_MAX)] = {
},
 };
 
+
+GENERIC_EVEN
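
The show routine added in core-book3s.c relies on container_of() to get from
the embedded device_attribute back to the wrapper that carries the event id.
A userspace sketch of that pattern follows. The structures are simplified
stand-ins, container_of is redefined locally, and events_show takes a buffer
length (an addition made for this sketch), so it should be read as an
illustration rather than the kernel implementation.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the kernel structures named in the patch. */
struct device_attribute {
	const char *name;
};

struct perf_pmu_events_attr {
	struct device_attribute attr;
	unsigned long long id;
};

/* Mirrors power_events_sysfs_show(): recover the wrapper, print its id. */
static int events_show(struct device_attribute *attr, char *page, size_t len)
{
	struct perf_pmu_events_attr *pmu_attr =
		container_of(attr, struct perf_pmu_events_attr, attr);

	return snprintf(page, len, "event=0x%02llx\n", pmu_attr->id);
}
```

Given an attribute whose id is 0x400f0, the buffer would hold
"event=0x400f0" followed by a newline, matching the cache-misses example in
the commit message.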

[PATCH 21/25] perf/Power7: Use macros to identify perf events

2013-01-31 Thread Arnaldo Carvalho de Melo

From: Sukadev Bhattiprolu 

Define and use macros to identify perf event codes. This makes the
code easier to read when these event codes need to be used in more
than one place.

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Cc: Andi Kleen 
Cc: Anton Blanchard 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Robert Richter 
Cc: Stephane Eranian 
Cc: linuxppc-...@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062353.gb13...@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 arch/powerpc/perf/power7-pmu.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/perf/power7-pmu.c b/arch/powerpc/perf/power7-pmu.c
index 2ee01e3..eebb36d 100644
--- a/arch/powerpc/perf/power7-pmu.c
+++ b/arch/powerpc/perf/power7-pmu.c
@@ -51,6 +51,18 @@
 #define MMCR1_PMCSEL_MSK   0xff
 
 /*
+ * Power7 event codes.
+ */
+#define PME_PM_CYC                      0x1e
+#define PME_PM_GCT_NOSLOT_CYC           0x100f8
+#define PME_PM_CMPLU_STALL              0x4000a
+#define PME_PM_INST_CMPL                0x2
+#define PME_PM_LD_REF_L1                0xc880
+#define PME_PM_LD_MISS_L1               0x400f0
+#define PME_PM_BRU_FIN                  0x10068
+#define PME_PM_BRU_MPRED                0x400f6
+
+/*
  * Layout of constraint bits:
  * 6666555555555544444444443333333333222222222211111111110000000000
  * 3210987654321098765432109876543210987654321098765432109876543210
@@ -307,14 +319,14 @@ static void power7_disable_pmc(unsigned int pmc, unsigned 
long mmcr[])
 }
 
 static int power7_generic_events[] = {
-   [PERF_COUNT_HW_CPU_CYCLES] = 0x1e,
-   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x100f8, /* GCT_NOSLOT_CYC */
-   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] = 0x4000a,  /* CMPLU_STALL */
-   [PERF_COUNT_HW_INSTRUCTIONS] = 2,
-   [PERF_COUNT_HW_CACHE_REFERENCES] = 0xc880,  /* LD_REF_L1_LSU*/
-   [PERF_COUNT_HW_CACHE_MISSES] = 0x400f0, /* LD_MISS_L1   */
-   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x10068,  /* BRU_FIN  */
-   [PERF_COUNT_HW_BRANCH_MISSES] = 0x400f6,/* BR_MPRED */
+   [PERF_COUNT_HW_CPU_CYCLES] =PME_PM_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =   PME_PM_GCT_NOSLOT_CYC,
+   [PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =PME_PM_CMPLU_STALL,
+   [PERF_COUNT_HW_INSTRUCTIONS] =  PME_PM_INST_CMPL,
+   [PERF_COUNT_HW_CACHE_REFERENCES] =  PME_PM_LD_REF_L1,
+   [PERF_COUNT_HW_CACHE_MISSES] =  PME_PM_LD_MISS_L1,
+   [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] =   PME_PM_BRU_FIN,
+   [PERF_COUNT_HW_BRANCH_MISSES] = PME_PM_BRU_MPRED,
 };
 
 #define C(x)   PERF_COUNT_HW_CACHE_##x
-- 
1.8.1.1.361.gec3ae6e

[PATCH 22/25] perf: Make EVENT_ATTR global

2013-01-31 Thread Arnaldo Carvalho de Melo

From: Sukadev Bhattiprolu 

Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is
available to all architectures.

Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass
in the variable name as a parameter.

Changelog[v2]
- [Jiri Olsa] No need to define PMU_EVENT_PTR()

Signed-off-by: Sukadev Bhattiprolu 
Acked-by: Jiri Olsa 
Cc: Andi Kleen 
Cc: Anton Blanchard 
Cc: Ingo Molnar 
Cc: Jiri Olsa 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Robert Richter 
Cc: Stephane Eranian 
Cc: linuxppc-...@ozlabs.org
Link: http://lkml.kernel.org/r/20130123062422.gc13...@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo 
---
 arch/x86/kernel/cpu/perf_event.c | 13 +++--
 include/linux/perf_event.h   | 11 +++
 2 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 6774c17..c0df5ed2 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1310,11 +1310,6 @@ static struct attribute_group x86_pmu_format_group = {
.attrs = NULL,
 };
 
-struct perf_pmu_events_attr {
-   struct device_attribute attr;
-   u64 id;
-};
-
 /*
  * Remove all undefined events (x86_pmu.event_map(id) == 0)
  * out of events_attr attributes.
@@ -1348,11 +1343,9 @@ static ssize_t events_sysfs_show(struct device *dev, 
struct device_attribute *at
 #define EVENT_VAR(_id)  event_attr_##_id
 #define EVENT_PTR(_id) &event_attr_##_id.attr.attr
 
-#define EVENT_ATTR(_name, _id) \
-static struct perf_pmu_events_attr EVENT_VAR(_id) = {  \
-   .attr = __ATTR(_name, 0444, events_sysfs_show, NULL),   \
-   .id   =  PERF_COUNT_HW_##_id,   \
-};
+#define EVENT_ATTR(_name, _id) \
+   PMU_EVENT_ATTR(_name, EVENT_VAR(_id), PERF_COUNT_HW_##_id,  \
+   events_sysfs_show)
 
 EVENT_ATTR(cpu-cycles, CPU_CYCLES  );
 EVENT_ATTR(instructions,   INSTRUCTIONS);
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 6bfb2faa..42adf01 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -817,6 +817,17 @@ do {   
\
 } while (0)
 
 
+struct perf_pmu_events_attr {
+   struct device_attribute attr;
+   u64 id;
+};
+
+#define PMU_EVENT_ATTR(_name, _var, _id, _show)                         \
+static struct perf_pmu_events_attr _var = {\
+   .attr = __ATTR(_name, 0444, _show, NULL),   \
+   .id   =  _id,   \
+};
+
 #define PMU_FORMAT_ATTR(_name, _format)
\
 static ssize_t \
 _name##_show(struct device *dev,   \
-- 
1.8.1.1.361.gec3ae6e

Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

2013-01-31 Thread Rafael J. Wysocki

On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > > +/*
> > > + * Hot-plug device information
> > > + */
> > 
> > Again, stop it with the "generic" hotplug term here, and everywhere
> > else.  You are doing a very _specific_ type of hotplug devices, so spell
> > it out.  We've worked hard to hotplug _everything_ in Linux, you are
> > going to confuse a lot of people with this type of terms.
> 
> Agreed.  I will clarify in all places.
> 
> > > +union shp_dev_info {
> > > + struct shp_cpu {
> > > + u32 cpu_id;
> > > + } cpu;
> > 
> > What is this?  Why not point to the system device for the cpu?
> 
> This info is used to on-line a new CPU and create its system/cpu device.
> In other word, a system/cpu device is created as a result of CPU
> hotplug.
> 
> > > + struct shp_memory {
> > > + int node;
> > > + u64 start_addr;
> > > + u64 length;
> > > + } mem;
> > 
> > Same here, why not point to the system device?
> 
> Same as above.
> 
> > > + struct shp_hostbridge {
> > > + } hb;
> > > +
> > > + struct shp_node {
> > > + } node;
> > 
> > What happened here with these?  Empty structures?  Huh?
> 
> They are place holders for now.  PCI bridge hot-plug and node hot-plug
> are still very much work in progress, so I have not integrated them into
> this framework yet.
> 
> > > +};
> > > +
> > > +struct shp_device {
> > > + struct list_head list;
> > > + struct device   *device;
> > 
> > No, make it a "real" device, embed the device into it.
> 
> This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> online/offline operation in order to maintain the current behavior.  CPU
> online/offline operation only changes the state of CPU, so its
> system/cpu device continues to be present before and after an operation.
> (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> device.)  So, this "*device" needs to be a pointer to reference an
> existing device that is to be on-lined/off-lined.
> 
> > But, again, I'm going to ask why you aren't using the existing cpu /
> > memory / bridge / node devices that we have in the kernel.  Please use
> > them, or give me a _really_ good reason why they will not work.
> 
> We cannot use the existing system devices or ACPI devices here.  During
> hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> device information in a platform-neutral way.  During hot-add, we first
> creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> but platform-neutral modules cannot use them as they are ACPI-specific.

But suppose we're smart and have ACPI scan handlers that will create
"physical" device nodes for those devices during the ACPI namespace scan.
Then, the platform-neutral nodes will be able to bind to those "physical"
nodes.  Moreover, it should be possible to get a hierarchy of device objects
this way that will reflect all of the dependencies we need to take into
account during hot-add and hot-remove operations.  That may not be what we
have today, but I don't see any *fundamental* obstacles preventing us from
using this approach.

This is already done for PCI host bridges and platform devices and I don't
see why we can't do that for the other types of devices too.

The only missing piece I see is a way to handle the "eject" problem, i.e.
when we try to eject a device at the top of a subtree and need to tear down
the entire subtree below it, but if that's going to lead to a system crash,
for example, we want to cancel the eject.  It seems to me that we'll need some
help from the driver core here.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu

On 2013/1/31 18:38, Simon Jeons wrote:

> Hi Tang,
> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>> Hi Simon,
>>
>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>> Hi Tang,
>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>
>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>> then what's the difference between press button and echo to /sys?
>>
>> No important difference, I think. Since I don't have the machine you are
>> saying, I cannot surely answer you. :)
>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>> is just another entrance. At last, they will run into the same code.
>>
>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>> why can't put kernel direct mapping memory into one memory device, and
>>> other memory into the other devices?
>>
>> We cannot do that because in that way, we will lose NUMA performance.
>>
>> If you know NUMA, you will understand the following example:
>>
>> node0:            node1:
>> cpu0~cpu15        cpu16~cpu31
>> memory0~memory511 memory512~memory1023
>>
>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
>> If we set direct mapping area in node0, and movable area in node1, then
>> the kernel code running on cpu16~cpu31 will have to access 
>> memory0~memory511.
>> This is a terrible performance down.
> 
> So if config NUMA, kernel memory will not be linear mapping anymore? For
> example, 
> 
> Node 0  Node 1 
> 
> 0 ~ 10G 11G~14G
> 
> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> 
> How big is kernel direct mapping memory in x86_64? Is there max limit?


Max kernel direct mapping memory in x86_64 is 64TB.

> It seems that only around 896MB on x86_32. 
> 
>>
>>> As you know x86_64 don't need
>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>> idea available? If is correct, x86_32 can't implement in the same way
>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>> hard to focus kernel memory on single memory device.
>>
>> Sorry, I'm not quite familiar with x86_32 box.
>>
>>> 3. In current implementation, if memory hotplug just need memory
>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>> Hope you can explain in details, thanks in advance. :)
>>
>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>> based memory migration mentioned by Liu Jiang.
> 
> Is there any material about firmware based memory migration?
> 
>>
>> So far, I only know this. :)
>>
>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>> memory, other things are fully implementation?
>>
>> I think the main job is done for now. And there are still bugs to fix.
>> And this functionality is not stable.
>>
>> Thanks. :)
> 
> 




Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

2013-01-31 Thread Toshi Kani

On Thu, 2013-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> > On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
 :
> > > > +};
> > > > +
> > > > +struct shp_device {
> > > > +   struct list_head list;
> > > > +   struct device   *device;
> > > 
> > > No, make it a "real" device, embed the device into it.
> > 
> > This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> > online/offline operation in order to maintain the current behavior.  CPU
> > online/offline operation only changes the state of CPU, so its
> > system/cpu device continues to be present before and after an operation.
> > (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> > device.)  So, this "*device" needs to be a pointer to reference an
> > existing device that is to be on-lined/off-lined.
> > 
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I misstated this in my previous email.  The system/cpu device is actually
created by the ACPI driver during the ACPI scan in the case of hot-add.  This
is done by acpi_processor_hotadd_init(), which I consider a hack, but it can
be done.  The system/memory device is created in add_memory() by the mm module.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.
> 
> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try to eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

There are three different approaches suggested for system device
hot-plug:
 A. Proceed within system device bus scan.
 B. Proceed within ACPI bus scan.
 C. Proceed with a sequence (as a mini-boot).

Option A uses system devices as tokens, option B uses acpi devices as
tokens, and option C uses resource tables as tokens, for their handlers.

Here is a summary of key questions & answers so far.  I hope this
clarifies why I am suggesting option C.

1. What are the system devices?
System devices provide system-wide core computing resources, which are
essential to compose a computer system.  System devices are not
connected to any particular standard buses.

2. Why are the system devices special?
The system devices are initialized during early boot-time, by multiple
subsystems, from the boot-up sequence, in pre-defined order.  They
provide low-level services to enable other subsystems to come up.

3. Why can't we initialize the system devices from the driver structure at
boot?
The driver structure is initialized at the end of the boot sequence and
requires the low-level services from the system devices initialized
beforehand.

4. Why do we need a new common framework?
Sysfs CPU and memory on-lining/off-lining are performed within the CPU
and memory modules.  They are common code and do not depend on ACPI.
Therefore, a new common framework is necessary to integrate both
on-lining/off-lining operation and hot-plugging operation of system
devices into a single framework.

5. Why can't we do everything with ACPI bus scan?
Software dependency among system devices may not be dictated by the ACPI
hierarchy.  For instance, memory should be initialized before CPUs (i.e.
a new cpu may need its local memory), but such ordering cannot be
guaranteed by the ACPI hierarchy.  Also, as described in 4,
online/offline operations are independent from ACPI.  
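
The ordering concern in point 5 can be sketched in plain C as follows.  This
is only an illustration under stated assumptions: `shp_order`, `shp_handler`,
`shp_trace` and `shp_run_hot_add` are hypothetical names chosen for this
sketch, not the interface proposed in the patch set.  Each handler carries an
explicit order value, so memory is brought up before CPUs no matter where the
devices sit in the ACPI hierarchy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical ordering values: a lower value runs first during hot-add. */
enum shp_order { SHP_ORDER_MEM = 10, SHP_ORDER_CPU = 20 };

struct shp_handler {
	enum shp_order order;
	int (*fn)(void *ctx);	/* returns 0 on success */
};

/* Records the order in which the example handlers ran. */
struct shp_trace {
	char seq[8];
	size_t len;
};

static int cmp_order(const void *a, const void *b)
{
	const struct shp_handler *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* Sort handlers by their order value, then run them; stop at the first error. */
int shp_run_hot_add(struct shp_handler *h, size_t n, void *ctx)
{
	size_t i;

	assert(h != NULL || n == 0);
	qsort(h, n, sizeof(*h), cmp_order);
	for (i = 0; i < n; i++) {
		int ret = h[i].fn(ctx);

		if (ret)
			return ret;
	}
	return 0;
}

/* Example handlers: they only append a marker to the trace. */
int add_memory_handler(void *ctx)
{
	struct shp_trace *t = ctx;

	t->seq[t->len++] = 'M';
	return 0;
}

int add_cpu_handler(void *ctx)
{
	struct shp_trace *t = ctx;

	t->seq[t->len++] = 'C';
	return 0;
}
```

Registering the CPU handler before the memory handler still results in the
memory handler running first, which is the property an ACPI namespace walk
alone would not guarantee.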

Thanks,
-Toshi


Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Tang Chen


On 02/01/2013 09:36 AM, Simon Jeons wrote:
> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>>>
>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>> example,
>>>
>>> Node 0  Node 1
>>>
>>> 0 ~ 10G 11G~14G

It has nothing to do with linear mapping, I think.

>>>
>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?

Please refer to find_zone_movable_pfns_for_nodes().
The kernel is not only on node0. It uses all the online nodes evenly. :)

>>>
>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>
>>
>> Max kernel direct mapping memory in x86_64 is 64TB.
>
> For example, I have 8G memory, all of them will be direct mapping for
> kernel? then userspace memory allocated from where?

I think you misunderstood what Wu tried to say. :)

The kernel mapped that large space, it doesn't mean it is using that
large space.
The mapping is to make kernel be able to access all the memory, not for
the kernel to use only. User space can also use the memory, but each
process has its own mapping.

For example:

                          64TB, what ever  xxxTB, what ever
logic address space:     |_____kernel_____|______user______|
                                      \  \  /  /
                                       \  /\  /
physical address space:  |______________\/__\/_____________|  4GB or 8GB, what ever
                                          *

The * part physical is mapped to user space in the process' own
pagetable.
It is also direct mapped in kernel's pagetable. So the kernel can also
access it. :)

>
>>
>>> It seems that only around 896MB on x86_32.
>>>
>>>>
>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>> based memory migration mentioned by Liu Jiang.
>>>
>>> Is there any material about firmware based memory migration?

No, I don't have any because this is a functionality of machine from HUAWEI.
I think you can ask Liu Jiang or Wu Jianguo to share some with you. :)

Thanks. :)

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu

On 2013/2/1 9:36, Simon Jeons wrote:

> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>> On 2013/1/31 18:38, Simon Jeons wrote:
>>
>>> Hi Tang,
>>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>>>> Hi Simon,
>>>>
>>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>>>> Hi Tang,
>>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>>>
>>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>>>> then what's the difference between press button and echo to /sys?
>>>>
>>>> No important difference, I think. Since I don't have the machine you are
>>>> saying, I cannot surely answer you. :)
>>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>>>> is just another entrance. At last, they will run into the same code.
>>>>
>>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>>>> why can't put kernel direct mapping memory into one memory device, and
>>>>> other memory into the other devices?
>>>>
>>>> We cannot do that because in that way, we will lose NUMA performance.
>>>>
>>>> If you know NUMA, you will understand the following example:
>>>>
>>>> node0:            node1:
>>>> cpu0~cpu15        cpu16~cpu31
>>>> memory0~memory511 memory512~memory1023
>>>>
>>>> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
>>>> If we set direct mapping area in node0, and movable area in node1, then
>>>> the kernel code running on cpu16~cpu31 will have to access
>>>> memory0~memory511.
>>>> This is a terrible performance down.
>>>
>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>> example, 
>>>
>>> Node 0  Node 1 
>>>
>>> 0 ~ 10G 11G~14G
>>>
>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
>>>
>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>
>>
>> Max kernel direct mapping memory in x86_64 is 64TB.
> 
> For example, I have 8G memory, all of them will be direct mapping for
> kernel? then userspace memory allocated from where?

Direct mapping memory means you can use __va() and __pa() on it, but that does
not mean it can only be used by the kernel. It can be used by user space too,
as long as it is free.
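
To make the __va()/__pa() point concrete, here is a small user-space sketch of
what a linear (direct) mapping means: both conversions are just a constant
offset.  The names `sketch_va` and `sketch_pa` and the `SKETCH_PAGE_OFFSET`
value are assumptions for illustration only; the real kernel macros and the
base of the direct mapping depend on architecture and configuration.  The
64TB size is the limit mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the direct mapping; the real value is arch/config specific. */
#define SKETCH_PAGE_OFFSET	UINT64_C(0xffff880000000000)
/* 64TB direct-mapping limit mentioned in the thread (2^46 bytes). */
#define SKETCH_DIRECT_MAP_SIZE	(UINT64_C(64) << 40)

/* Physical address -> virtual address inside the direct mapping. */
uint64_t sketch_va(uint64_t phys)
{
	assert(phys < SKETCH_DIRECT_MAP_SIZE);
	return phys + SKETCH_PAGE_OFFSET;
}

/* Virtual address inside the direct mapping -> physical address. */
uint64_t sketch_pa(uint64_t virt)
{
	assert(virt >= SKETCH_PAGE_OFFSET);
	assert(virt - SKETCH_PAGE_OFFSET < SKETCH_DIRECT_MAP_SIZE);
	return virt - SKETCH_PAGE_OFFSET;
}
```

The conversion is pure arithmetic and says nothing about who currently owns a
page: a page covered by the direct mapping can still be handed to user space
and mapped a second time in that process' own page table.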

> 
>>
>>> It seems that only around 896MB on x86_32. 
>>>

>>>>> As you know x86_64 don't need
>>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>>>> idea available? If is correct, x86_32 can't implement in the same way
>>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>>>> hard to focus kernel memory on single memory device.
>>>>
>>>> Sorry, I'm not quite familiar with x86_32 box.
>>>>
>>>>> 3. In current implementation, if memory hotplug just need memory
>>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>>>> Hope you can explain in details, thanks in advance. :)
>>>>
>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>> based memory migration mentioned by Liu Jiang.
>>>
>>> Is there any material about firmware based memory migration?
>>>
>>>>
>>>> So far, I only know this. :)
>>>>
>>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>>>> memory, other things are fully implementation?
>>>>
>>>> I think the main job is done for now. And there are still bugs to fix.
>>>> And this functionality is not stable.
>>>>
>>>> Thanks. :)
>>>
>>>
> 




Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Jianguo,
On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote:
> On 2013/2/1 9:36, Simon Jeons wrote:
> 
> > On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> >> On 2013/1/31 18:38, Simon Jeons wrote:
> >>
> >>> Hi Tang,
> >>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> >>>> Hi Simon,
> >>>>
> >>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> >>>>> Hi Tang,
> >>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >>>>>
> >>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
> >>>>> then what's the difference between press button and echo to /sys?
> >>>>
> >>>> No important difference, I think. Since I don't have the machine you are
> >>>> saying, I cannot surely answer you. :)
> >>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> >>>> is just another entrance. At last, they will run into the same code.
> >>>>
> >>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> >>>>> why can't put kernel direct mapping memory into one memory device, and
> >>>>> other memory into the other devices?
> >>>>
> >>>> We cannot do that because in that way, we will lose NUMA performance.
> >>>>
> >>>> If you know NUMA, you will understand the following example:
> >>>>
> >>>> node0:            node1:
> >>>> cpu0~cpu15        cpu16~cpu31
> >>>> memory0~memory511 memory512~memory1023
> >>>>
> >>>> cpu16~cpu31 access memory512~memory1023 much faster than
> >>>> memory0~memory511.
> >>>> If we set direct mapping area in node0, and movable area in node1, then
> >>>> the kernel code running on cpu16~cpu31 will have to access
> >>>> memory0~memory511.
> >>>> This is a terrible performance down.
> >>>
> >>> So if config NUMA, kernel memory will not be linear mapping anymore? For
> >>> example, 
> >>>
> >>> Node 0  Node 1 
> >>>
> >>> 0 ~ 10G 11G~14G
> >>>
> >>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> >>>
> >>> How big is kernel direct mapping memory in x86_64? Is there max limit?
> >>
> >>
> >> Max kernel direct mapping memory in x86_64 is 64TB.
> > 
> > For example, I have 8G memory, all of them will be direct mapping for
> > kernel? then userspace memory allocated from where?
> 
> Direct mapping memory means you can use __va() and pa(), but not means that 
> them
> can be only used by kernel, them can be used by user-space too, as long as 
> them are free.

IIUC, the benefit of __va() and __pa() is just quick conversion between
virtual and physical addresses, taking advantage of the linear mapping. But
the MMU still needs to walk pgd/pud/pmd/pte, correct?

> 
> > 
> >>
> >>> It seems that only around 896MB on x86_32. 
> >>>
> 
> >>>>> As you know x86_64 don't need
> >>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> >>>>> idea available? If is correct, x86_32 can't implement in the same way
> >>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> >>>>> hard to focus kernel memory on single memory device.
> >>>>
> >>>> Sorry, I'm not quite familiar with x86_32 box.
> >>>>
> >>>>> 3. In current implementation, if memory hotplug just need memory
> >>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
> >>>>> Hope you can explain in details, thanks in advance. :)
> >>>>
> >>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >>>> based memory migration mentioned by Liu Jiang.
> >>>
> >>> Is there any material about firmware based memory migration?
> >>>
> >>>>
> >>>> So far, I only know this. :)
> >>>>
> >>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
> >>>>> memory, other things are fully implementation?
> >>>>
> >>>> I think the main job is done for now. And there are still bugs to fix.
> >>>> And this functionality is not stable.
> >>>>
> >>>> Thanks. :)
> >>>
> >>>



Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Jianguo Wu

On 2013/2/1 10:06, Simon Jeons wrote:

> Hi Jianguo,
> On Fri, 2013-02-01 at 09:57 +0800, Jianguo Wu wrote:
>> On 2013/2/1 9:36, Simon Jeons wrote:
>>
>>> On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
>>>> On 2013/1/31 18:38, Simon Jeons wrote:
>>>>
>>>>> Hi Tang,
>>>>> On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
>>>>>> Hi Simon,
>>>>>>
>>>>>> On 01/31/2013 04:48 PM, Simon Jeons wrote:
>>>>>>> Hi Tang,
>>>>>>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
>>>>>>>
>>>>>>> 1. IIUC, there is a button on machine which supports hot-remove memory,
>>>>>>> then what's the difference between press button and echo to /sys?
>>>>>>
>>>>>> No important difference, I think. Since I don't have the machine you are
>>>>>> saying, I cannot surely answer you. :)
>>>>>> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
>>>>>> is just another entrance. At last, they will run into the same code.
>>>>>>
>>>>>>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
>>>>>>> why can't put kernel direct mapping memory into one memory device, and
>>>>>>> other memory into the other devices?
>>>>>>
>>>>>> We cannot do that because in that way, we will lose NUMA performance.
>>>>>>
>>>>>> If you know NUMA, you will understand the following example:
>>>>>>
>>>>>> node0:            node1:
>>>>>> cpu0~cpu15        cpu16~cpu31
>>>>>> memory0~memory511 memory512~memory1023
>>>>>>
>>>>>> cpu16~cpu31 access memory512~memory1023 much faster than
>>>>>> memory0~memory511.
>>>>>> If we set direct mapping area in node0, and movable area in node1, then
>>>>>> the kernel code running on cpu16~cpu31 will have to access
>>>>>> memory0~memory511.
>>>>>> This is a terrible performance down.
>>>>>
>>>>> So if config NUMA, kernel memory will not be linear mapping anymore? For
>>>>> example,
>>>>>
>>>>> Node 0  Node 1
>>>>>
>>>>> 0 ~ 10G 11G~14G
>>>>>
>>>>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
>>>>>
>>>>> How big is kernel direct mapping memory in x86_64? Is there max limit?
>>>>
>>>>
>>>> Max kernel direct mapping memory in x86_64 is 64TB.
>>>
>>> For example, I have 8G memory, all of them will be direct mapping for
>>> kernel? then userspace memory allocated from where?
>>
>> Direct mapping memory means you can use __va() and pa(), but not means that 
>> them
>> can be only used by kernel, them can be used by user-space too, as long as 
>> them are free.
> 
> IIUC, the benefit of va() and pa() is just for quick get
> virtual/physical address, it takes advantage of linear mapping. But mmu
> still need to go through pgd/pud/pmd/pte, correct?

Yes.

> 

>>
>>>

>>>>> It seems that only around 896MB on x86_32.
>>>>>
>>>>>>
>>>>>>> As you know x86_64 don't need
>>>>>>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
>>>>>>> idea available? If is correct, x86_32 can't implement in the same way
>>>>>>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
>>>>>>> hard to focus kernel memory on single memory device.
>>>>>>
>>>>>> Sorry, I'm not quite familiar with x86_32 box.
>>>>>>
>>>>>>> 3. In current implementation, if memory hotplug just need memory
>>>>>>> subsystem and ACPI codes support? Or also needs firmware take part in?
>>>>>>> Hope you can explain in details, thanks in advance. :)
>>>>>>
>>>>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
>>>>>> based memory migration mentioned by Liu Jiang.
>>>>>
>>>>> Is there any material about firmware based memory migration?
>>>>>
>>>>>>
>>>>>> So far, I only know this. :)
>>>>>>
>>>>>>> 4. What's the status of memory hotplug? Apart from can't remove kernel
>>>>>>> memory, other things are fully implementation?
>>>>>>
>>>>>> I think the main job is done for now. And there are still bugs to fix.
>>>>>> And this functionality is not stable.
>>>>>>
>>>>>> Thanks. :)
>>>>>
>>>>>




Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Tang Chen


Hi Simon,

On 02/01/2013 10:17 AM, Simon Jeons wrote:
>> For example:
>>
>>                           64TB, what ever  xxxTB, what ever
>> logic address space:     |_____kernel_____|______user______|
>>                                       \  \  /  /
>>                                        \  /\  /
>> physical address space:  |______________\/__\/_____________|  4GB or 8GB, what ever
>>                                           *
>
> How much address space user process can have on x86_64? Also 8GB?

Usually, we don't say that.

8GB is your physical memory, right ?
But kernel space and user space is the logic conception in OS. They are
in logic address space.

So both the kernel space and the user space can use all the physical memory.
But if the page is already in use by either of them, the other one
cannot use it.
For example, some pages are direct mapped to kernel, and is in use by
kernel, the user space cannot map it.

>
>>
>> The * part physical is mapped to user space in the process' own
>> pagetable.
>> It is also direct mapped in kernel's pagetable. So the kernel can also
>> access it. :)
>
> But how to protect user process not modify kernel memory?

This is the job of CPU. On intel cpus, user space code is running in
level 3, and kernel space code is running in level 0. So the code in
level 3 cannot access the data segment in level 0.
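
As a hedged aside on the mechanism: with paging enabled on x86, this
protection is applied per page through the User/Supervisor bit in the
page-table entries, so a direct-mapped kernel page is simply not marked as
user-accessible.  The sketch below models only that check.  The function
`sketch_may_access` and the `SK_PTE_*` macro names are made up for this
example; the bit positions (bit 0 Present, bit 1 Read/Write, bit 2
User/Supervisor) are the architectural ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_PTE_PRESENT	(UINT64_C(1) << 0)	/* page is mapped */
#define SK_PTE_RW	(UINT64_C(1) << 1)	/* writes allowed */
#define SK_PTE_USER	(UINT64_C(1) << 2)	/* level 3 may access */

/*
 * cpl is the current privilege level: 0 for kernel code, 3 for user code.
 * Simplified model: it looks at a single entry only and ignores NX,
 * SMEP/SMAP and the CR0.WP=0 case where supervisor writes bypass the RW bit.
 */
bool sketch_may_access(uint64_t pte, int cpl, bool write)
{
	assert(cpl == 0 || cpl == 3);

	if (!(pte & SK_PTE_PRESENT))
		return false;
	if (cpl == 3 && !(pte & SK_PTE_USER))
		return false;	/* user code touching a supervisor page */
	if (write && !(pte & SK_PTE_RW))
		return false;
	return true;
}
```

In this model a page can be present in the kernel's direct mapping without the
user bit, and present in a process' page table with it, which matches the
"mapped twice" picture in the diagram above.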

Thanks. :)

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Tang,
On Fri, 2013-02-01 at 09:57 +0800, Tang Chen wrote:
> On 02/01/2013 09:36 AM, Simon Jeons wrote:
> > On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> >>>
> >>> So if config NUMA, kernel memory will not be linear mapping anymore? For
> >>> example,
> >>>
> >>> Node 0  Node 1
> >>>
> >>> 0 ~ 10G 11G~14G
> 
> It has nothing to do with linear mapping, I think.
> 
> >>>
> >>> kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> 
> Please refer to find_zone_movable_pfns_for_nodes().

I see, thanks. :)

> The kernel is not only on node0. It uses all the online nodes evenly. :)
> 
> >>>
> >>> How big is kernel direct mapping memory in x86_64? Is there max limit?
> >>
> >>
> >> Max kernel direct mapping memory in x86_64 is 64TB.
> >
> > For example, I have 8G memory, all of them will be direct mapping for
> > kernel? then userspace memory allocated from where?
> 
> I think you misunderstood what Wu tried to say. :)
> 
> The kernel mapped that large space, it doesn't mean it is using that 
> large space.
> The mapping is to make kernel be able to access all the memory, not for 
> the kernel
> to use only. User space can also use the memory, but each process has 
> its own mapping.
> 
> For example:
> 
>                           64TB, what ever  xxxTB, what ever
> logic address space:     |_____kernel_____|______user______|
>                                       \  \  /  /
>                                        \  /\  /
> physical address space:  |______________\/__\/_____________|  4GB or 8GB, what ever
>                                           *

How much address space can a user process have on x86_64? Also 8GB?

> 
> The * part physical is mapped to user space in the process' own 
> pagetable.
> It is also direct mapped in kernel's pagetable. So the kernel can also 
> access it. :)

But how is a user process prevented from modifying kernel memory?

> 
> >
> >>
> >>> It seems that only around 896MB on x86_32.
> >>>
> >>>>
> >>>> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >>>> based memory migration mentioned by Liu Jiang.
> >>>
> >>> Is there any material about firmware based memory migration?
> 
> No, I don't have any because this is a functionality of machine from HUAWEI.
> I think you can ask Liu Jiang or Wu Jianguo to share some with you. :)
> 
> Thanks. :)



Re: [PATCH] powerpc: kernel/kgdb.c: fix memory leakage

2013-01-31 Thread Jason Wessel

On 01/14/2013 11:26 AM, Cong Ding wrote:
> the variable backup_current_thread_info isn't freed before exiting the
> function.
> 
> Signed-off-by: Cong Ding 
> ---
>  arch/powerpc/kernel/kgdb.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 8747447..5ca82cd 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -154,12 +154,12 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
>  static int kgdb_singlestep(struct pt_regs *regs)
>  {
>   struct thread_info *thread_info, *exception_thread_info;
> - struct thread_info *backup_current_thread_info = \
> - (struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
> + struct thread_info *backup_current_thread_info;



Woh...  This is definitely wrong.  You have found a problem for sure,
but this is not the right way to fix it.

It is not a good idea to kmalloc while single stepping because you can
hang the kernel if you single step any operation in kmalloc().

I am in the process of going through all the kgdb mails from the last
few months while I had been away from the project, so I didn't catch
this one and I see it has upstream commit (fefd9e6f8).  I'll submit
another patch to fix this the right way and use a static variable.
It is ok to use a static variable here because this is not something
we can recursively call at a single CPU level.

If Ben prefers we not burn the memory unless kgdb is active we can
kmalloc / kfree the space we need at the time that kgdb is
initialized.  Else we can go with this patch you see below.  We'll see
what Ben desires.

-
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index a7bc752..bb12c8b 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -151,15 +151,16 @@ static int kgdb_handle_breakpoint(struct pt_regs *regs)
return 1;
 }
 
+static struct thread_info kgdb_backup_thread_info[NR_CPUS];
+
 static int kgdb_singlestep(struct pt_regs *regs)
 {
struct thread_info *thread_info, *exception_thread_info;
-   struct thread_info *backup_current_thread_info;
+   int cpu = raw_smp_processor_id();
 
if (user_mode(regs))
return 0;
 
-   backup_current_thread_info = (struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
/*
 * On Book E and perhaps other processors, singlestep is handled on
 * the critical exception stack.  This causes current_thread_info()
@@ -175,7 +176,7 @@ static int kgdb_singlestep(struct pt_regs *regs)
 
if (thread_info != exception_thread_info) {
/* Save the original current_thread_info. */
-   memcpy(backup_current_thread_info, exception_thread_info, sizeof *thread_info);
+   memcpy(&kgdb_backup_thread_info[cpu], exception_thread_info, sizeof *thread_info);
memcpy(exception_thread_info, thread_info, sizeof *thread_info);
}
 
@@ -183,9 +184,8 @@ static int kgdb_singlestep(struct pt_regs *regs)
 
if (thread_info != exception_thread_info)
/* Restore current_thread_info lastly. */
-   memcpy(exception_thread_info, backup_current_thread_info, sizeof *thread_info);
+   memcpy(exception_thread_info, &kgdb_backup_thread_info[cpu], sizeof *thread_info);
 
-   kfree(backup_current_thread_info);
return 1;
 }
 

-


Thanks,
Jason.


>  
>  	if (user_mode(regs))
>  		return 0;
>  
> +	backup_current_thread_info = (struct thread_info *)kmalloc(sizeof(struct thread_info), GFP_KERNEL);
>  	/*
>  	 * On Book E and perhaps other processors, singlestep is handled on
>  	 * the critical exception stack.  This causes current_thread_info()
> @@ -185,6 +185,7 @@ static int kgdb_singlestep(struct pt_regs *regs)
>  		/* Restore current_thread_info lastly. */
>  		memcpy(exception_thread_info, backup_current_thread_info, sizeof *thread_info);
>  
> +	kfree(backup_current_thread_info);
>  	return 1;
>  }
>  
> 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

Hi Tang,
On Fri, 2013-02-01 at 10:42 +0800, Tang Chen wrote:

I'm confused!

> Hi Simon,
> 
> On 02/01/2013 10:17 AM, Simon Jeons wrote:
> >> For example:
> >>
> >>                           64TB, what ever  xxxTB, what ever
> >> logic address space:     |____kernel____|_____user_____|
> >>                                \  \  /  /
> >>                                 \  /\  /
> >> physical address space:  |_______\/__\/______|  4GB or 8GB, what ever
> >>                                    *
> >
> > How much address space user process can have on x86_64? Also 8GB?
> 
> Usually, we don't say that.
> 
> 8GB is your physical memory, right ?
> But kernel space and user space is the logic conception in OS. They are
> in logic address space.
> 
> So both the kernel space and the user space can use all the physical memory.
> But if the page is already in use by either of them, the other one cannot
> use it. For example, some pages are direct mapped to kernel, and is in use
> by kernel, the user space cannot map it.

How can I distinguish between "mapped" and "in use"? I mean, how can I
confirm that memory is used by the kernel rather than just mapped?

> 
> >
> >>
> >> The * part physical is mapped to user space in the process' own
> >> pagetable.
> >> It is also direct mapped in kernel's pagetable. So the kernel can also
> >> access it. :)
> >
> > But how to protect user process not modify kernel memory?
> 
> This is the job of CPU. On intel cpus, user space code is running in
> level 3, and kernel space code is running in level 0. So the code in
> level 3 cannot access the data segment in level 0.

1) If a user process and the kernel map the same physical memory, the user
process will get SIGSEGV during #PF if it accesses this memory. But will a
user process ever map the same memory that the kernel maps? Why? It can't
access it.
2) If two user processes map the same physical memory, what will happen
if one process accesses the memory?

> 
> Thanks. :)



Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Simon Jeons

On Fri, 2013-02-01 at 09:32 +0800, Jianguo Wu wrote:
> On 2013/1/31 18:38, Simon Jeons wrote:
> 
> > Hi Tang,
> > On Thu, 2013-01-31 at 17:44 +0800, Tang Chen wrote:
> >> Hi Simon,
> >>
> >> On 01/31/2013 04:48 PM, Simon Jeons wrote:
> >>> Hi Tang,
> >>> On Thu, 2013-01-31 at 15:10 +0800, Tang Chen wrote:
> >>>
> >>> 1. IIUC, there is a button on machine which supports hot-remove memory,
> >>> then what's the difference between press button and echo to /sys?
> >>
> >> No important difference, I think. Since I don't have the machine you are
> >> saying, I cannot surely answer you. :)
> >> AFAIK, pressing the button means trigger the hotplug from hardware, sysfs
> >> is just another entrance. At last, they will run into the same code.
> >>
> >>> 2. Since kernel memory is linear mapping(I mean direct mapping part),
> >>> why can't put kernel direct mapping memory into one memory device, and
> >>> other memory into the other devices?
> >>
> >> We cannot do that because in that way, we will lose NUMA performance.
> >>
> >> If you know NUMA, you will understand the following example:
> >>
> >> node0:                  node1:
> >> cpu0~cpu15              cpu16~cpu31
> >> memory0~memory511       memory512~memory1023
> >>
> >> cpu16~cpu31 access memory512~memory1023 much faster than memory0~memory511.
> >> If we set direct mapping area in node0, and movable area in node1, then
> >> the kernel code running on cpu16~cpu31 will have to access 
> >> memory0~memory511.
> >> This is a terrible performance down.
> > 
> > So if config NUMA, kernel memory will not be linear mapping anymore? For
> > example, 
> > 
> > Node 0  Node 1 
> > 
> > 0 ~ 10G 11G~14G
> > 
> > kernel memory only at Node 0? Can part of kernel memory also at Node 1?
> > 
> > How big is kernel direct mapping memory in x86_64? Is there max limit?
> 
> 
> Max kernel direct mapping memory in x86_64 is 64TB.

For example, if I have 8G of memory, will all of it be direct mapped for
the kernel? Then where is userspace memory allocated from?
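
For reference, the direct mapping quoted above is only a fixed offset between
a physical address and a kernel virtual address. Below is a minimal user-space
sketch, assuming the x86_64 layout documented for kernels of this period
(direct map starting at 0xffff880000000000 and covering 64TB). The names
SKETCH_DIRECT_MAP_BASE, sketch_phys_to_virt() and so on are made up for
illustration and are not the kernel's own macros.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed base of the x86_64 direct mapping (illustrative constant). */
#define SKETCH_DIRECT_MAP_BASE 0xffff880000000000ULL
/* 64TB: the maximum size of the direct mapping quoted in this thread. */
#define SKETCH_DIRECT_MAP_SIZE (64ULL << 40)

/* Does this physical address fall inside the direct-mapped range? */
static int sketch_in_direct_map(uint64_t phys)
{
	return phys < SKETCH_DIRECT_MAP_SIZE;
}

/* Physical address -> kernel virtual address in the direct map. */
static uint64_t sketch_phys_to_virt(uint64_t phys)
{
	assert(sketch_in_direct_map(phys));
	return phys + SKETCH_DIRECT_MAP_BASE;
}

/* Kernel virtual address in the direct map -> physical address. */
static uint64_t sketch_virt_to_phys(uint64_t virt)
{
	assert(virt >= SKETCH_DIRECT_MAP_BASE);
	return virt - SKETCH_DIRECT_MAP_BASE;
}
```

With 8G of RAM, every physical address below 8G passes
sketch_in_direct_map(), so all of it is reachable through this window. The
sketch says nothing about who owns a page; that is a separate question from
whether it is mapped.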

> 
> > It seems that only around 896MB on x86_32. 
> > 
> >>
> >>> As you know x86_64 don't need
> >>> highmem, IIUC, all kernel memory will linear mapping in this case. Is my
> >>> idea available? If is correct, x86_32 can't implement in the same way
> >>> since highmem(kmap/kmap_atomic/vmalloc) can map any address, so it's
> >>> hard to focus kernel memory on single memory device.
> >>
> >> Sorry, I'm not quite familiar with x86_32 box.
> >>
> >>> 3. In current implementation, if memory hotplug just need memory
> >>> subsystem and ACPI codes support? Or also needs firmware take part in?
> >>> Hope you can explain in details, thanks in advance. :)
> >>
> >> We need firmware take part in, such as SRAT in ACPI BIOS, or the firmware
> >> based memory migration mentioned by Liu Jiang.
> > 
> > Is there any material about firmware based memory migration?
> > 
> >>
> >> So far, I only know this. :)
> >>
> >>> 4. What's the status of memory hotplug? Apart from can't remove kernel
> >>> memory, other things are fully implementation?
> >>
> >> I think the main job is done for now. And there are still bugs to fix.
> >> And this functionality is not stable.
> >>
> >> Thanks. :)
> > 
> > 
> 
> 
> 



Re: [PATCH v6 00/15] memory-hotplug: hot-remove physical memory

2013-01-31 Thread Tang Chen


Hi Simon,

On 02/01/2013 11:06 AM, Simon Jeons wrote:


> How can I distinguish between "mapped" and "in use"? I mean, how can I
> confirm that memory is used by the kernel rather than just mapped?


If the page is free, for example if it is in the buddy system, it is not
in use. Even if it is direct mapped by the kernel, the kernel should not
access it, because nobody has allocated it. This is the kernel's logic.
Of course, the hardware and the user will not know this.

If you want to access some memory, you first need a logical address, right?

So how do you get a logical address? You call an alloc API.

For example, when you are coding, of course you write:

    p = alloc_xxx();   /* allocate memory; it is now in use, and
                          alloc_xxx() lets the kernel know */
    *p = ...;          /* use the memory */

You won't write:

    p = 0x8745;        /* the kernel doesn't know this memory is in use */
    *p = ...;          /* wrong... */

right ?

Just because the kernel has mapped a page, it doesn't mean it is using the
page. You have to allocate it first.

That is just the kernel's allocation logic.

Well, I think this is the only answer I can give you for now. If you want
something deeper, I think you need to read how the kernel manages physical
pages. :)
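
The difference between "mapped" and "in use" described above can be sketched
with a toy allocator. This is only an illustration under simplifying
assumptions: struct toy_page, toy_alloc_page() and the other names are
hypothetical, and the real kernel tracks this state in struct page and the
buddy system, not in a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Hypothetical per-page state for this sketch only. */
struct toy_page {
	bool mapped;     /* reachable through the (toy) direct mapping */
	bool allocated;  /* handed out by the allocator, i.e. "in use" */
};

static struct toy_page toy_mem[TOY_NR_PAGES];

/* At boot every page is mapped, but none is in use yet. */
static void toy_init(void)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		toy_mem[i].mapped = true;
		toy_mem[i].allocated = false;
	}
}

/* Hand out the first free page; returns its index, or -1 if none is free. */
static int toy_alloc_page(void)
{
	for (size_t i = 0; i < TOY_NR_PAGES; i++) {
		if (!toy_mem[i].allocated) {
			toy_mem[i].allocated = true;
			return (int)i;
		}
	}
	return -1;
}

/* Give a page back; it stays mapped but is no longer in use. */
static void toy_free_page(int idx)
{
	assert(idx >= 0 && idx < TOY_NR_PAGES);
	toy_mem[idx].allocated = false;
}

/* Count pages that are mapped but not in use. */
static int toy_count_free(void)
{
	int n = 0;

	for (size_t i = 0; i < TOY_NR_PAGES; i++)
		if (toy_mem[i].mapped && !toy_mem[i].allocated)
			n++;
	return n;
}
```

After toy_init() all eight pages are mapped and free. Only toy_alloc_page()
changes who may use a page; the mapped flag never changes.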



> 1) If a user process and the kernel map the same physical memory, the user
> process will get SIGSEGV during #PF if it accesses this memory. But will a
> user process ever map the same memory that the kernel maps? Why? It can't
> access it.


When you call malloc() to allocate memory in user space, the OS logic
ensures that you won't map a page that is already in use by the kernel.

If a page is mapped by the kernel but not used by it (not allocated, as
above), malloc() could allocate it and map it into user space. This is the
situation you are talking about, right ?

Now it is mapped by both the kernel and the user, but it is allocated only
by the user. So the kernel will not use it. When the kernel wants some
memory, it will allocate some other memory. This is just the kernel logic.
This is what the memory management subsystem does.


I don't think I can say more, because I'm also still learning memory
management.

This is just my understanding. I hope it is helpful. :)


> 2) If two user processes map the same physical memory, what will happen
> if one process accesses the memory?


Obviously you don't need to worry about this situation. We can swap out the
page used by process 1, and process 2 can then use the same page. When
process 1 wants to access it again, we swap it back in. This only happens
when there is not enough physical memory. :)

Also, if you are using shared memory in user space, with shmget(), shmat()
and so on, then it is shared memory and both processes can use it at the
same time.
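
The shared memory case can be tried from user space. Below is a minimal
sketch using System V shared memory, assuming a Linux system with SysV IPC
enabled. The helper name shm_demo() and the value 42 are made up for the
example, and error handling is kept short.

```c
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Create a shared segment, let a child process write to it, and return
 * what the parent reads afterwards. Returns -1 on any setup error.
 */
static int shm_demo(void)
{
	int id = shmget(IPC_PRIVATE, sizeof(int), IPC_CREAT | 0600);
	if (id < 0)
		return -1;

	int *shared = shmat(id, NULL, 0);
	if (shared == (void *)-1) {
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	*shared = 0;

	pid_t pid = fork();
	if (pid < 0) {
		shmdt(shared);
		shmctl(id, IPC_RMID, NULL);
		return -1;
	}
	if (pid == 0) {
		/* Child: the attached segment is inherited across fork(). */
		*shared = 42;
		shmdt(shared);
		_exit(0);
	}

	/* Parent: wait for the child, then read through its own mapping. */
	waitpid(pid, NULL, 0);
	int seen = *shared;

	shmdt(shared);
	shmctl(id, IPC_RMID, NULL);  /* remove the segment */
	return seen;
}
```

Both processes have the same physical page in their own page tables, so the
parent sees the value the child stored.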

Thanks. :)

Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

2013-01-31 Thread Greg KH

On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > creates an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I would _much_ rather see that be the solution here as I think it is the
proper one.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.

I agree.

> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try do eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

I say do what we always have done here, if the user asked us to tear
something down, let it happen as they are the ones that know best :)

Seriously, I guess this gets back to the "fail disconnect" idea that the
ACPI developers keep harping on.  I thought we already resolved this
properly by having them implement it in their bus code, no reason the
same thing couldn't happen here, right?  I don't think the core needs to
do anything special, but if so, I'll be glad to review it.
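
To make the "fail disconnect" idea concrete, here is a rough user-space
sketch of an eject that is cancelled when any device in the subtree refuses
to go offline. It is not driver core or ACPI code; struct toy_dev, the
can_offline flag and toy_eject() are hypothetical names used only to show the
two-phase shape (ask everyone first, tear down only if nobody vetoes).

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_CHILDREN 4

/* Hypothetical device node for this sketch only. */
struct toy_dev {
	const char *name;
	bool can_offline;  /* would the bus/driver allow removal right now? */
	bool online;
	struct toy_dev *children[TOY_MAX_CHILDREN];
	size_t nr_children;
};

/* Phase 1: every device in the subtree must agree to go offline. */
static bool toy_subtree_can_offline(const struct toy_dev *dev)
{
	if (!dev->can_offline)
		return false;
	for (size_t i = 0; i < dev->nr_children; i++)
		if (!toy_subtree_can_offline(dev->children[i]))
			return false;
	return true;
}

/* Phase 2: tear down children before their parent. */
static void toy_subtree_offline(struct toy_dev *dev)
{
	for (size_t i = 0; i < dev->nr_children; i++)
		toy_subtree_offline(dev->children[i]);
	dev->online = false;
}

/* Eject the subtree rooted at top, or change nothing if anyone vetoes. */
static bool toy_eject(struct toy_dev *top)
{
	if (!toy_subtree_can_offline(top))
		return false;  /* cancel the eject, nothing was touched */
	toy_subtree_offline(top);
	return true;
}
```

The check and the teardown are separate passes here so that a veto leaves the
whole subtree untouched. A real implementation would also have to deal with
state changing between the two passes, which this sketch ignores.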

thanks,

greg k-h

Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework

2013-01-31 Thread Greg KH

On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > This is already done for PCI host bridges and platform devices and I don't
> > see why we can't do that for the other types of devices too.
> > 
> > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > when we try do eject a device at the top of a subtree and need to tear down
> > the entire subtree below it, but if that's going to lead to a system crash,
> > for example, we want to cancel the eject.  It seems to me that we'll need
> > some help from the driver core here.
> 
> There are three different approaches suggested for system device
> hot-plug:
>  A. Proceed within system device bus scan.
>  B. Proceed within ACPI bus scan.
>  C. Proceed with a sequence (as a mini-boot).
> 
> Option A uses system devices as tokens, option B uses acpi devices as
> tokens, and option C uses resource tables as tokens, for their handlers.
> 
> Here is summary of key questions & answers so far.  I hope this
> clarifies why I am suggesting option C.
> 
> 1. What are the system devices?
> System devices provide system-wide core computing resources, which are
> essential to compose a computer system.  System devices are not
> connected to any particular standard buses.

Not a problem, lots of devices are not connected to any "particular
standard busses".  All this means is that system devices are connected
to the "system" bus, nothing more.

> 2. Why are the system devices special?
> The system devices are initialized during early boot-time, by multiple
> subsystems, from the boot-up sequence, in pre-defined order.  They
> provide low-level services to enable other subsystems to come up.

Sorry, no, that doesn't mean they are special.  From the point of view of
the driver model, nothing here is different from any other device or bus.

> 3. Why can't initialize the system devices from the driver structure at
> boot?
> The driver structure is initialized at the end of the boot sequence and
> requires the low-level services from the system devices initialized
> beforehand.

Wait, what "driver structure"?  If you need to initialize the driver
core earlier, then do so.  Or, even better, just wait until enough of
the system has come up and then go initialize all of the devices you
have found so far as part of your boot process.

None of the above things you have stated seem to have anything to do
with your proposed patch, so I don't understand why you have mentioned
them...

> 4. Why do we need a new common framework?
> Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> and memory modules.  They are common code and do not depend on ACPI.
> Therefore, a new common framework is necessary to integrate both
> on-lining/off-lining operation and hot-plugging operation of system
> devices into a single framework.

{sigh}

Removing and adding devices and handling hotplug operations is what the
driver core was written for, almost 10 years ago.  To somehow think that
your devices are "special" just because they don't use ACPI is odd,
because the driver core itself has nothing to do with ACPI.  Don't get
the current mix of x86 system code tied into ACPI confused with any
driver core issues here, please.

> 5. Why can't do everything with ACPI bus scan?
> Software dependency among system devices may not be dictated by the ACPI
> hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> a new cpu may need its local memory), but such ordering cannot be
> guaranteed by the ACPI hierarchy.  Also, as described in 4,
> online/offline operations are independent from ACPI.  

That's fine, the driver core is independent from ACPI.  I don't care how
you do the scanning of your devices, but I do care about you creating new
driver core pieces that duplicate the existing functionality of what we
have today.

In short, I like Rafael's proposal better, and I fail to see how
anything you have stated here would matter in how this is implemented. :)

thanks,

greg k-h