Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical

2008-08-31 Thread Benjamin Herrenschmidt
On Sat, 2008-08-30 at 11:24 -0500, Scott Wood wrote:
> On Fri, Aug 29, 2008 at 08:42:01AM +1000, Benjamin Herrenschmidt wrote:
> > For the non-SMP case, I think it should be possible to optimize it. The
> > only thing that can happen at interrupt time is hashing of kernel or
> > vmalloc/ioremap pages, which shouldn't compete with set_pte on those
> > pages, so there would be no access races there, but I may be missing
> > something as it's the morning and I about just woke up :-)
> 
> Is that still true with preemptible kernels?

Those shouldn't be an issue as long as we can't preempt while holding a
spinlock and we do hold the pte lock on any modification... Of course,
-rt is a different matter.

Ben.


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev


Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-31 Thread Benjamin Herrenschmidt
> > > It would be useful of somebody interested in getting things things
> > > into glibc did the necessary FSF copyright assignment stuff and worked
> > > toward integrating them.
> >
> > Ben makes a very good point!
> 
> Sounds reasonable... but I am still wondering about what you mean 
> with "things"?

Typo. I meant "these things", that is, variants of various libc
functions optimized for a given processor type.

> AFAICS there is almost nothing there (besides the memcpy() routine from
> Gunnar von Boehn, which is apparently still far from optimal). And I was
> asking for someone to correct me here ;-)

No idea, as we said, it's mostly up to users of the processors (or to a
certain extent, manufacturers, hint hint hint) to do that work.

> > There is also a framework for adding and maintaining optimizations of
> > this type:
> > 
> > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> 
> I had already stumbled across this one, but it seems to focus on G3 or newer 
> processors (power4). There is no optimal memcpy() for G2/PPC603/e300.

It focuses on what the people doing it have access to, are paid to work
on, or other material constraints. It's up to others from the community
to fill the gaps.

> >[...]
> > So it does no good to complain here. If you have core you want to
> > contribute, Get your FSF CR assignment and join #glibc on freenode IRC.
> 
> I am not complaining. I was only wondering if it is just me or there really
> is very little that has been done (for either uClibc, glibc, or whatever for
> powerpc) to improve performance of (linux-) applications on "lower"-power
> platforms (G2 core), AFAICS there is a LOT that can be gained by simple
> tweaks.

Well, possibly, then you are welcome to work on those tweaks and if they
indeed improve things, submit patches to glibc :-) I'm sure Steve and
Ryan will be happy to help with the submission process.

> > And we will help you.
> 
> Thanks, now that I know which is the "correct" way to contribute, I only need 
> to come up with a good set of optimization, worthy of inclusion in glibc.

You don't have to do it all at once. A simple tweak of one function
such as memcpy, if it measurably improves performance without
notable regressions, could be a first step, and then tweak after tweak...

It's a common mistake to try to do too much "out of tree" and then
struggle and give up when it's time to merge that stuff because there
are too many areas that won't necessarily be acceptable "as is".

One little bit at a time is generally a better approach.

> OTOH, maybe it is easier and simpler to start with a collection of functions
> in a shared-library, that may be suited for preloading via LD_PRELOAD
> or /etc/ld_preload...
> 
> Maybe once this collection is more stable (in terms of that heavy tweaking
> has stopped) one could try the pilgrimage towards glibc inclusion

I believe that's the wrong approach as it leads to never-merged
out-of-tree code.

> The problem is: I have very little experience with powerpc assembly and only 
> very limited time to dedicate to this and I am looking for others who have 

Cheers,
Ben.




Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?

2008-08-31 Thread Zhou Rui
Hi, all:
My problem seems basically solved.
We used to call vmalloc() in the memory management part of our
source, but it seems to be the key unreliable point resulting in the
problem. vmalloc() always assigns some virtual addresses whose
corresponding physical addresses are beyond the memory size (there is
only 32MB of DRAM on our 405 board). Once instructions try to access
these illegal physical addresses, a machine check happens.
Afterwards, we call kmalloc() instead and it basically works as we
want. But problems with the memory management still exist, because
there is sometimes a program check exception and always a bad page:

-bash-3.2# PROGRAM: reason: 0x800, nip: 0xc028bf20
Oops: Exception in kernel mode, sig: 4 [#1]
NIP: C028BF20 LR: C028BF20 CTR: C31C6078
REGS: c028be80 TRAP: 0700   Not tainted  (2.6.19.2-eldk-xm.1.0)
MSR: 00029030   CR:   XER: 
TASK = c0228a30[0] 'swapper' THREAD: c028a000
GPR00:  C028BF30 C0228A30 C034B7B0 C028BF20  0001
 
GPR08: 0003 C31D 2282 00029030 2BDD9FE1 C03B3164 066F
2B1F1DC8 
GPR16: C03B3050 0FFEA478 1001 C31D C028BEF0 C31CA2E4 00021030
C028A000 
GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030
C03B3050 
NIP [C028BF20] init_thread_union+0x1f20/0x2000
LR [C028BF20] init_thread_union+0x1f20/0x2000
Call Trace:
[C028BF30] [0FFEA478] 0xffea478 (unreliable)
Instruction dump:
        
        
Kernel panic - not syncing: Attempted to kill the idle task!
 <0>Rebooting in 180 seconds..

And there is bad page:
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Backtrace:
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Bad page state in process 'loader.xm'
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: Trying to fix it up, but a reboot is needed
Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
405 kernel: page:c02f0e60 flags:0x0400 mapping: mapcount:0
count:1

I will do some tracing to fix those problems.

Could anyone explain the difference between vmalloc() and
kmalloc()? Based on our work, there seems to be a great difference.

Thank you very much!

Best Wishes

Zhou Rui
2008-08-31

On Mon, 2008-08-25 at 21:16 +0200, Zhou Rui wrote:
> Hi,
> I think maybe you have known this project named XtratuM
> (http://www.xtratum.org). I'm porting it from x86 to PPC405. The
> implementation on PPC440 has been basically finished
> (ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
> and I know there was discussion about it in this mail list before. XtratuM
> is an ADEOS based nano kernel. It aims for realtime and is designed to
> provide virtual timer, virtual interrupt and memory space separation for
> domains. Each domain is loaded by a userspace program (instead of the root
> domain as a kernel module) and the loader will load the domain's (ELF
> statically executable) PT_LOAD section into memory, and then raise a proper
> system call (passing the structured loaded data as arguments) to load the
> domain via load_domain_sys() of XtratuM, and at the last step of loading the
> domain, XtratuM will jump to the entry code of the new domain (asm-wrapped
> start() routine) and then everything should be fine. 0x10a0 is the entry
> point of the test domain, and that is why I need to start execution from it.
> 
> I think I can say something of my analysis so far for the cause of my
> problem. Thanks for the mention of memory size. Once the kernel module
> of XtratuM is loaded, the symbols of it are placed to virtual addresses
> like 0xc3xx. Because in normal state, address translation is enabled
> (MSR[IR, DR] = [1, 1]), these addresses are okay. However, when loading
> the domain, because the entry point 0x10a0 is not in TLB and it
> should be reloaded, Data TLB Miss Exception arises and DTLBMiss is
> called. The exception clears MSR[IR, DR], so address translation is
> disabled and physical address should be used at this moment. If we want
> something at the virtual address of 0xc3xx, we must access the
> physical addresses like 0x03xx. Nevertheless, the limitation of 32MB
> memory makes the valid physical address range from 0x0 to 0x1ff.
> Therefore, during the exception handling, the addresses out of range
> should not be accessed, but the instructions cannot know the memory
> limitation in advance and tries to do something in addresses such as
> 0x03072da0 based on the address translation mechanism, which leads to
> machine check.
> I have tried appending "mem=32M" to the kernel command line, but it did not help. I
> think it is bec

[PATCH] prevent powerpc from invoking irq handlers on offline CPUs

2008-08-31 Thread Paul E. McKenney
Make powerpc refrain from clearing a given to-be-offlined CPU's bit in the
cpu_online_mask until it has processed pending irqs.  This change
prevents other CPUs from being blindsided by an apparently offline CPU
nevertheless changing globally visible state.

Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
---

 smp.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5337ca7..1fedd7d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -250,11 +250,11 @@ int generic_cpu_disable(void)
 	if (cpu == boot_cpuid)
 		return -EBUSY;
 
-	cpu_clear(cpu, cpu_online_map);
 #ifdef CONFIG_PPC64
 	vdso_data->processorCount--;
 	fixup_irqs(cpu_online_map);
 #endif
+	cpu_clear(cpu, cpu_online_map);
 	return 0;
 }
 


Re: [PATCH] ibm_newemac: MAL[12]_IER_EVENTS definition: 2x *_OTE -> *_DE

2008-08-31 Thread Benjamin Herrenschmidt
On Sat, 2008-08-30 at 22:48 +0200, roel kluin wrote:
> MAL[12]_IER_EVENTS definitions have MAL_IER_OTE twice
> but lack MAL_IER_DE
> 
> Signed-off-by: Roel Kluin <[EMAIL PROTECTED]>

Thanks.

> ---
>  drivers/net/ibm_newemac/mal.h |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ibm_newemac/mal.h b/drivers/net/ibm_newemac/mal.h
> index eaa7262..717dc38 100644
> --- a/drivers/net/ibm_newemac/mal.h
> +++ b/drivers/net/ibm_newemac/mal.h
> @@ -102,7 +102,7 @@
>  /* MAL V1 IER bits */
>  #define   MAL1_IER_NWE   0x0008
>  #define   MAL1_IER_SOC_EVENTSMAL1_IER_NWE
> -#define   MAL1_IER_EVENTS(MAL1_IER_SOC_EVENTS | MAL_IER_OTE | \
> +#define   MAL1_IER_EVENTS(MAL1_IER_SOC_EVENTS | MAL_IER_DE | \
>MAL_IER_OTE | MAL_IER_OE | MAL_IER_PE)
>  
>  /* MAL V2 IER bits */
> @@ -110,7 +110,7 @@
>  #define   MAL2_IER_PRE   0x0040
>  #define   MAL2_IER_PWE   0x0020
>  #define   MAL2_IER_SOC_EVENTS(MAL2_IER_PT | MAL2_IER_PRE | 
> MAL2_IER_PWE)
> -#define   MAL2_IER_EVENTS(MAL2_IER_SOC_EVENTS | MAL_IER_OTE | \
> +#define   MAL2_IER_EVENTS(MAL2_IER_SOC_EVENTS | MAL_IER_DE | \
>MAL_IER_OTE | MAL_IER_OE | MAL_IER_PE)
>  
> 



Re: [PATCH] prevent powerpc from invoking irq handlers on offline CPUs

2008-08-31 Thread Benjamin Herrenschmidt
On Sun, 2008-08-31 at 10:31 -0700, Paul E. McKenney wrote:
> Make powerpc refrain from clearing a given to-be-offlined CPU's bit in the
> cpu_online_mask until it has processed pending irqs.  This change
> prevents other CPUs from being blindsided by an apparently offline CPU
> nevertheless changing globally visible state.
> 
> Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
> ---

Sounds reasonable... the only possible worry here is if somebody tries
an IPI... The IPI code will AND the target CPU mask with the online
map, so it may try to send to the to-be-offlined CPU and time out, no?

Cheers,
Ben.

>  smp.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 5337ca7..1fedd7d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -250,11 +250,11 @@ int generic_cpu_disable(void)
>   if (cpu == boot_cpuid)
>   return -EBUSY;
>  
> - cpu_clear(cpu, cpu_online_map);
>  #ifdef CONFIG_PPC64
>   vdso_data->processorCount--;
>   fixup_irqs(cpu_online_map);
>  #endif
> + cpu_clear(cpu, cpu_online_map);
>   return 0;
>  }
>  



Re: [PATCH] ppc4xx_pci: necessary fixes for 4GB RAM size

2008-08-31 Thread Benjamin Herrenschmidt
On Thu, 2008-08-28 at 09:28 -0400, Josh Boyer wrote:
> On Fri, 22 Aug 2008 11:43:35 +0400
> Ilya Yanok <[EMAIL PROTECTED]> wrote:
> 
> > 1. total_memory should be phys_addr_t not unsigned long
> > 2. is_power_of_2() works with u32 so I just inlined (size & (size-1)) != 0
> > instead.
> > Also this patch fixes default initialization: res->end should be 0x7fff
> > not 0x8000.
> > 
> > Signed-off-by: Ilya Yanok <[EMAIL PROTECTED]>
> 
> Ben, any comments here?  Looks right to me.

Just one minor comment... The patch should do what I failed to do
before, which is to move total_memory declaration to a header :-)

Cheers,
Ben.

> josh
> 
> > ---
> >  arch/powerpc/sysdev/ppc4xx_pci.c |   11 ++-
> >  1 files changed, 6 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/powerpc/sysdev/ppc4xx_pci.c 
> > b/arch/powerpc/sysdev/ppc4xx_pci.c
> > index e1c7df9..645b2c9 100644
> > --- a/arch/powerpc/sysdev/ppc4xx_pci.c
> > +++ b/arch/powerpc/sysdev/ppc4xx_pci.c
> > @@ -36,7 +36,7 @@
> >  static int dma_offset_set;
> > 
> >  /* Move that to a useable header */
> > -extern unsigned long total_memory;
> > +extern phys_addr_t total_memory;
> > 
> >  #define U64_TO_U32_LOW(val)((u32)((val) & 0xULL))
> >  #define U64_TO_U32_HIGH(val)   ((u32)((val) >> 32))
> > @@ -105,7 +105,8 @@ static int __init ppc4xx_parse_dma_ranges(struct 
> > pci_controller *hose,
> > 
> > /* Default */
> > res->start = 0;
> > -   res->end = size = 0x8000;
> > +   size = 0x8000;
> > +   res->end = size - 1;
> > res->flags = IORESOURCE_MEM | IORESOURCE_PREFETCH;
> > 
> > /* Get dma-ranges property */
> > @@ -167,13 +168,13 @@ static int __init ppc4xx_parse_dma_ranges(struct 
> > pci_controller *hose,
> >  */
> > if (size < total_memory) {
> > printk(KERN_ERR "%s: dma-ranges too small "
> > -  "(size=%llx total_memory=%lx)\n",
> > -  hose->dn->full_name, size, total_memory);
> > +  "(size=%llx total_memory=%llx)\n",
> > +  hose->dn->full_name, size, (u64)total_memory);
> > return -ENXIO;
> > }
> > 
> > /* Check we are a power of 2 size and that base is a multiple of size*/
> > -   if (!is_power_of_2(size) ||
> > +   if ((size & (size - 1)) != 0  ||
> > (res->start & (size - 1)) != 0) {
> > printk(KERN_ERR "%s: dma-ranges unaligned\n",
> >hose->dn->full_name);
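
The two points in the patch description above (a power-of-two test that must see the whole 64-bit size, and an inclusive end address of size - 1) can be sketched in user space as follows. This is only an illustration, not the kernel code: the helper names are hypothetical, the 32-bit helper merely models a test that sees only the low 32 bits of its argument, and the zero check is an addition that the open-coded test in the patch does not have.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a helper that only sees the low 32 bits. */
static bool is_pow2_u32(uint32_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Same bit trick as the patch, applied to the full 64-bit value. */
static bool is_pow2_u64(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Inclusive end address of a region of 'size' bytes starting at 'start'. */
static uint64_t region_end(uint64_t start, uint64_t size)
{
	return start + size - 1;
}
```

With an illustrative 4 GiB size (0x100000000), the value truncates to 0 when squeezed into 32 bits, so the 32-bit helper rejects it although it is a power of two. A 6 GiB size (0x180000000) truncates to 0x80000000 and is wrongly accepted.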



[PATCH] powerpc: use sys_pause for 32bit pause entry point

2008-08-31 Thread Christoph Hellwig
sys32_pause is a useless copy of the generic sys_pause.


Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Index: linux-2.6/arch/powerpc/include/asm/systbl.h
===
--- linux-2.6.orig/arch/powerpc/include/asm/systbl.h	2008-08-22 12:47:08.0 -0300
+++ linux-2.6/arch/powerpc/include/asm/systbl.h	2008-08-22 12:47:35.0 -0300
@@ -32,7 +32,7 @@ COMPAT_SYS_SPU(stime)
 COMPAT_SYS(ptrace)
 SYSCALL_SPU(alarm)
 OLDSYS(fstat)
-COMPAT_SYS(pause)
+SYSCALL(pause)
 COMPAT_SYS(utime)
 SYSCALL(ni_syscall)
 SYSCALL(ni_syscall)
Index: linux-2.6/arch/powerpc/kernel/sys_ppc32.c
===
--- linux-2.6.orig/arch/powerpc/kernel/sys_ppc32.c	2008-08-22 12:47:41.0 -0300
+++ linux-2.6/arch/powerpc/kernel/sys_ppc32.c	2008-08-22 12:47:51.0 -0300
@@ -107,14 +107,6 @@ asmlinkage long compat_sys_sysfs(u32 opt
return sys_sysfs((int)option, arg1, arg2);
 }
 
-asmlinkage long compat_sys_pause(void)
-{
-   current->state = TASK_INTERRUPTIBLE;
-   schedule();
-   
-   return -ERESTARTNOHAND;
-}
-
 static inline long get_ts32(struct timespec *o, struct compat_timeval __user *i)
 {
long usec;


Re: [PATCH] prevent powerpc from invoking irq handlers on offline CPUs

2008-08-31 Thread Paul E. McKenney
On Mon, Sep 01, 2008 at 10:34:44AM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2008-08-31 at 10:31 -0700, Paul E. McKenney wrote:
> > Make powerpc refrain from clearing a given to-be-offlined CPU's bit in the
> > cpu_online_mask until it has processed pending irqs.  This change
> > prevents other CPUs from being blindsided by an apparently offline CPU
> > nevertheless changing globally visible state.
> > 
> > Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
> > ---
> 
> Sounds reasonable... the only possible worry here is if somebody tries
> an IPI ... The IPI code will and the target CPU mask with the online
> map, so it may try to send to the to-be-offlined CPU and timeout, no ?

OK.  Do we need separate IPI and online masks?

Thanx, Paul

> Cheers,
> Ben.
> 
> >  smp.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 5337ca7..1fedd7d 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -250,11 +250,11 @@ int generic_cpu_disable(void)
> > if (cpu == boot_cpuid)
> > return -EBUSY;
> >  
> > -   cpu_clear(cpu, cpu_online_map);
> >  #ifdef CONFIG_PPC64
> > vdso_data->processorCount--;
> > fixup_irqs(cpu_online_map);
> >  #endif
> > +   cpu_clear(cpu, cpu_online_map);
> > return 0;
> >  }
> >  
> 


Re: [PATCH] powerpc: use sys_pause for 32bit pause entry point

2008-08-31 Thread Stephen Rothwell
On Mon, 1 Sep 2008 03:23:30 +0200 Christoph Hellwig <[EMAIL PROTECTED]> wrote:
>
> sys32_pause is a useless copy of the generic sys_pause.
> 
> 
> Signed-off-by: Christoph Hellwig <[EMAIL PROTECTED]>

Acked-by: Stephen Rothwell <[EMAIL PROTECTED]>

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/



Re: [PATCH] prevent powerpc from invoking irq handlers on offline CPUs

2008-08-31 Thread Benjamin Herrenschmidt
On Sun, 2008-08-31 at 19:06 -0700, Paul E. McKenney wrote:
> On Mon, Sep 01, 2008 at 10:34:44AM +1000, Benjamin Herrenschmidt wrote:
> > On Sun, 2008-08-31 at 10:31 -0700, Paul E. McKenney wrote:
> > > Make powerpc refrain from clearing a given to-be-offlined CPU's bit in the
> > > cpu_online_mask until it has processed pending irqs.  This change
> > > prevents other CPUs from being blindsided by an apparently offline CPU
> > > nevertheless changing globally visible state.
> > > 
> > > Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
> > > ---
> > 
> > Sounds reasonable... the only possible worry here is if somebody tries
> > an IPI ... The IPI code will and the target CPU mask with the online
> > map, so it may try to send to the to-be-offlined CPU and timeout, no ?
> 
> OK.  Do we need separate IPI and online masks?

Shouldn't we already have routed all interrupts to other CPUs anyway?

I.e., the affinity of all interrupts should have been updated. So the
only things we're going to get here are possibly IPIs and the decrementer.
I don't see it being a big deal to make sure we test that we are online
when receiving one.

Ben.
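
The two ideas in this exchange (the sender ANDs the requested targets with the online map, and the receiver checks that it is still online) can be modelled in user space roughly as below. This is a sketch under assumptions, not the powerpc IPI code: the 32-bit mask type and all function names are hypothetical, and CPU numbers are assumed to be below 32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one bit per CPU, CPU numbers must be < 32. */
typedef uint32_t cpumask_model_t;

static cpumask_model_t online_map;

static void set_online(unsigned int cpu)   { online_map |= 1u << cpu; }
static void clear_online(unsigned int cpu) { online_map &= ~(1u << cpu); }
static bool is_online(unsigned int cpu)    { return (online_map >> cpu) & 1u; }

/* Sender side: only CPUs still present in the online map are targeted. */
static cpumask_model_t ipi_targets(cpumask_model_t requested)
{
	return requested & online_map;
}

/* Receiver side: ignore the IPI if this CPU is no longer online. */
static bool handle_ipi(unsigned int cpu)
{
	if (!is_online(cpu))
		return false;	/* dropped, nothing to do */
	return true;		/* the real work would happen here */
}
```

In this model, a CPU that stays in the online map while it drains pending interrupts is still a valid target for the sender, which is the window the discussion is concerned with.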




interrupting gpios on mpc5200

2008-08-31 Thread Jon Smirl
How do I use an interrupting gpio with this new gpiolib support for
the mpc5200?  gpio_to_irq() doesn't appear to be implemented.  If you
can outline for me how it is supposed to be hooked up I can work on
it.

-- 
Jon Smirl
[EMAIL PROTECTED]


Re: [PATCH] prevent powerpc from invoking irq handlers on offline CPUs

2008-08-31 Thread Paul E. McKenney
On Mon, Sep 01, 2008 at 01:14:40PM +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2008-08-31 at 19:06 -0700, Paul E. McKenney wrote:
> > On Mon, Sep 01, 2008 at 10:34:44AM +1000, Benjamin Herrenschmidt wrote:
> > > On Sun, 2008-08-31 at 10:31 -0700, Paul E. McKenney wrote:
> > > > Make powerpc refrain from clearing a given to-be-offlined CPU's bit in 
> > > > the
> > > > cpu_online_mask until it has processed pending irqs.  This change
> > > > prevents other CPUs from being blindsided by an apparently offline CPU
> > > > nevertheless changing globally visible state.
> > > > 
> > > > Signed-off-by: Paul E. McKenney <[EMAIL PROTECTED]>
> > > > ---
> > > 
> > > Sounds reasonable... the only possible worry here is if somebody tries
> > > an IPI ... The IPI code will and the target CPU mask with the online
> > > map, so it may try to send to the to-be-offlined CPU and timeout, no ?
> > 
> > OK.  Do we need separate IPI and online masks?
> 
> Shouldn't we already have routed all interrupts to other CPUs anyway ?
> 
> IE. The affinity of all interrupts should have been updated. So the
> only thing we're going to get here are possibly IPIs and decrementer, 
> I don't see it being a big deal making sure we test we are online when
> receiving it.

It did look to me that the CPU removed itself from the interrupt queue
before re-enabling interrupts, so makes sense to me...

Thanx, Paul


Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical

2008-08-31 Thread Benjamin Herrenschmidt

> +#ifdef CONFIG_PTE_64BIT
> +#define PTE_FLAGS_OFFSET 4   /* offset of PTE flags, in bytes */
> +#define LNX_PTE_SIZE 8   /* size of a linux PTE, in bytes */
> +#else
> +#define PTE_FLAGS_OFFSET 0
> +#define LNX_PTE_SIZE 4
> +#endif

s/LNX_PTE_SIZE/PTE_BYTES or PTE_SIZE, no need for that horrible LNX
prefix. In fact, if it's only used by the asm, then ditch it and
have asm-offsets.c create something based on sizeof(pte_t).
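
The idea of deriving the constant from sizeof(pte_t) rather than hard-coding it per configuration can be illustrated outside the kernel as below. This is a user-space sketch only: pte_model_t is a hypothetical 64-bit stand-in for pte_t, and emit_define() is a made-up helper that imitates what a generated offsets header would contain. It is not the kernel's asm-offsets machinery.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a 64-bit pte_t (the CONFIG_PTE_64BIT case). */
typedef struct { uint64_t pte; } pte_model_t;

/* The size comes from the type itself, so it cannot drift out of sync. */
static size_t pte_size(void)
{
	return sizeof(pte_model_t);
}

/* Format one "#define NAME value" line into buf; returns snprintf's result. */
static int emit_define(char *buf, size_t len, const char *name, size_t val)
{
	return snprintf(buf, len, "#define %s %zu", name, val);
}
```

Calling emit_define(buf, sizeof(buf), "PTE_SIZE", pte_size()) yields a line that assembly code could include, and changing the type changes the constant with no second place to update.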

>  #define pte_none(pte)((pte_val(pte) & ~_PTE_NONE_MASK) == 0)
>  #define pte_present(pte) (pte_val(pte) & _PAGE_PRESENT)
> -#define pte_clear(mm,addr,ptep)  do { pte_update(ptep, ~0, 0); } while 
> (0)
> +#define pte_clear(mm, addr, ptep) \
> + do { pte_update(ptep, ~_PAGE_HASHPTE, 0); } while (0)

Where does this previous definition of pte_clear come from? It's bogus
(and it's not like that upstream). Your "updated" one looks OK though.

But whatever tree has the "previous" one would break hash-based ppc32
if merged. Or are there some other related changes in Kumar's tree that
make the above safe?
 
>  #define pmd_none(pmd)(!pmd_val(pmd))
>  #define  pmd_bad(pmd)(pmd_val(pmd) & _PMD_BAD)
> @@ -664,8 +670,30 @@ static inline void set_pte_at(struct mm_struct *mm, 
> unsigned long addr,
> pte_t *ptep, pte_t pte)
>  {
>  #if _PAGE_HASHPTE != 0
> +#ifndef CONFIG_PTE_64BIT
>   pte_update(ptep, ~_PAGE_HASHPTE, pte_val(pte) & ~_PAGE_HASHPTE);
>  #else
> + /*
> +  * We have to do the write of the 64b pte as 2 stores.  This
> +  * code assumes that the entry we're storing to is currently
> +  * not valid and that all callers have the page table lock.
> +  * Having the entry be not valid protects readers who might read
> +  * between the first and second stores.
> +  */
> + unsigned int tmp;

Do you know for sure the entry isn't valid? On ppc64, we explicitly
test for that and if it was valid, we clear and flush it. The generic
code has been going on and off about calling set_pte_at() on an already
valid PTE, I wouldn't rely too much on it guaranteeing it will not
happen. The 32b PTE code is safe because it preserves _PAGE_HASHPTE.

Note also that once you have (which you don't now) the guarantee
that your previous PTE is invalid, then you can safely do two normal
stores instead of a lwarx/stwcx. loop. In any case, having the stw in
the middle of the loop doesn't sound very useful.

> + __asm__ __volatile__("\
> +1:   lwarx   %0,0,%4\n\
> + rlwimi  %L2,%0,0,30,30\n\
> + stw %2,0(%3)\n\
> + eieio\n\
> + stwcx.  %L2,0,%4\n\
> + bne-1b"
> + : "=&r" (tmp), "=m" (*ptep)
> + : "r" (pte), "r" (ptep), "r" ((unsigned long)(ptep) + 4), "m" (*ptep)
> + : "cc");
> +#endif   /* CONFIG_PTE_64BIT */
> +#else /* _PAGE_HASHPTE == 0 */
>  #if defined(CONFIG_PTE_64BIT) && defined(CONFIG_SMP)
>   __asm__ __volatile__("\
>   stw%U0%X0 %2,%0\n\
> diff --git a/arch/powerpc/mm/hash_low_32.S b/arch/powerpc/mm/hash_low_32.S
> index b9ba7d9..d63e20a 100644
> --- a/arch/powerpc/mm/hash_low_32.S
> +++ b/arch/powerpc/mm/hash_low_32.S
> @@ -75,7 +75,7 @@ _GLOBAL(hash_page_sync)
>   * Returns to the caller if the access is illegal or there is no
>   * mapping for the address.  Otherwise it places an appropriate PTE
>   * in the hash table and returns from the exception.
> - * Uses r0, r3 - r8, ctr, lr.
> + * Uses r0, r3 - r8, r10, ctr, lr.
>   */
>   .text
>  _GLOBAL(hash_page)
> @@ -106,9 +106,15 @@ _GLOBAL(hash_page)
>   addir5,r5,[EMAIL PROTECTED] /* kernel page table */
>   rlwimi  r3,r9,32-12,29,29   /* MSR_PR -> _PAGE_USER */
>  112: add r5,r5,r7/* convert to phys addr */
> +#ifndef CONFIG_PTE_64BIT
>   rlwimi  r5,r4,12,20,29  /* insert top 10 bits of address */
>   lwz r8,0(r5)/* get pmd entry */
>   rlwinm. r8,r8,0,0,19/* extract address of pte page */
> +#else
> + rlwinm  r8,r4,13,19,29  /* Compute pgdir/pmd offset */
> + lwzxr8,r8,r5/* Get L1 entry */
> + rlwinm. r8,r8,0,0,20/* extract pt base address */
> +#endif

Any reason you wrote the above using a different technique ? If you
believe that rlwinm/lwzx is going to be more efficient than rlwimi/lwz,
maybe we should fix the old one ... or am I missing something totally
obvious ? :-)

Cheers,
Ben.




Re: [PATCH v2] POWERPC: Allow 32-bit pgtable code to support 36-bit physical

2008-08-31 Thread Benjamin Herrenschmidt

> Could the stw to the same reservation granule as the stwcx cancel the 
> reservation on some implementations?

It might I suppose ... In any case, see my replies to Becky.

>   Plus, if you're assuming that the 
> entry is currently invalid and all callers have the page table lock, do 
> we need the lwarx/stwcx at all?  At the least, it should check 
> PTE_ATOMIC_UPDATES.

It shouldn't need atomic operations -if- the current entry is invalid
-and- _PAGE_HASHPTE is clear. (Ie, the current entry is invalid -and- 
the hash table has been updated).

In any case, the stw can just move out of the loop.

It might be worth just doing something along the lines of

	if (pte_val(*ptep) & _PAGE_PRESENT)
		pte_clear(mm, addr, ptep);
	if (pte_val(*ptep) & _PAGE_HASHPTE)
		flush_hash_entry(mm, ptep, addr);
	asm v. { "stw ; eieio; stw" };

Cheers,
Ben.
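
The "stw ; eieio ; stw" ordering suggested above can be modelled in portable C11 as below. This is a user-space sketch under assumptions, not the kernel implementation: the struct layout, the flag value and the function names are hypothetical, the entry is assumed to be invalid before the update, and a release store merely stands in for the eieio barrier.

```c
#include <stdatomic.h>
#include <stdint.h>

#define MODEL_PAGE_PRESENT 0x1u	/* hypothetical "valid" bit in the flags word */

/* Hypothetical 64-bit PTE split into two 32-bit words, flags at offset 4. */
struct pte_model {
	_Atomic uint32_t hi;	/* offset 0: upper word of the entry */
	_Atomic uint32_t flags;	/* offset 4: lower word, holds the present bit */
};

/* Writer: assumes the entry is currently invalid (present bit clear). */
static void set_pte_model(struct pte_model *p, uint64_t val)
{
	/* First store: the word that does not carry the present bit. */
	atomic_store_explicit(&p->hi, (uint32_t)(val >> 32),
			      memory_order_relaxed);
	/* Second store, ordered after the first: publishes the entry. */
	atomic_store_explicit(&p->flags, (uint32_t)val,
			      memory_order_release);
}

/* Reader: returns 1 and fills *out only if the entry is marked present. */
static int read_pte_model(struct pte_model *p, uint64_t *out)
{
	uint32_t lo = atomic_load_explicit(&p->flags, memory_order_acquire);

	if (!(lo & MODEL_PAGE_PRESENT))
		return 0;
	*out = ((uint64_t)atomic_load_explicit(&p->hi, memory_order_relaxed)
		<< 32) | lo;
	return 1;
}
```

Because the present bit lives in the word written last, a reader that sees the entry as present also sees the matching upper word. That property only holds if the entry really was invalid beforehand, which is why the message proposes clearing and flushing a valid entry first.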




Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?

2008-08-31 Thread Benjamin Herrenschmidt
On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> Hi, all:
> My problem seems basically solved.
> We used to call vmalloc() in the memory management part of our
> source, but it seems to be the key unreliable point resulting in the
> problem. vmalloc() always assigns some virtual addresses whose
> corresponding physical addresses are beyond the memory size (there is
> only 32MB of DRAM on our 405 board). Once instructions try to access
> these illegal physical addresses, a machine check happens.

That should -never- happen.

Have you verified, as I asked you a while ago, that you are actually
passing the right amount of memory to your kernel from the device-tree
or the bootloader ?

Ben.

> Afterwards, we call kmalloc() instead and it basically works as we
> want. But problems with the memory management still exist, because
> there is sometimes a program check exception and always a bad page:
> 
> -bash-3.2# PROGRAM: reason: 0x800, nip: 0xc028bf20
> Oops: Exception in kernel mode, sig: 4 [#1]
> NIP: C028BF20 LR: C028BF20 CTR: C31C6078
> REGS: c028be80 TRAP: 0700   Not tainted  (2.6.19.2-eldk-xm.1.0)
> MSR: 00029030   CR:   XER: 
> TASK = c0228a30[0] 'swapper' THREAD: c028a000
> GPR00:  C028BF30 C0228A30 C034B7B0 C028BF20  0001
>  
> GPR08: 0003 C31D 2282 00029030 2BDD9FE1 C03B3164 066F
> 2B1F1DC8 
> GPR16: C03B3050 0FFEA478 1001 C31D C028BEF0 C31CA2E4 00021030
> C028A000 
> GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030
> C03B3050 
> NIP [C028BF20] init_thread_union+0x1f20/0x2000
> LR [C028BF20] init_thread_union+0x1f20/0x2000
> Call Trace:
> [C028BF30] [0FFEA478] 0xffea478 (unreliable)
> Instruction dump:
>         
>         
> Kernel panic - not syncing: Attempted to kill the idle task!
>  <0>Rebooting in 180 seconds..
> 
> And there is bad page:
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: Backtrace:
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: Bad page state in process 'loader.xm'
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: Trying to fix it up, but a reboot is needed
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: Bad page state in process 'loader.xm'
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: Trying to fix it up, but a reboot is needed
> Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> 405 kernel: page:c02f0e60 flags:0x0400 mapping: mapcount:0
> count:1
> 
> I will do some tracing to fix those problems.
> 
> Could anyone explain the difference between vmalloc() and
> kmalloc()? Based on our work, there seems to be a great difference.
> 
> Thank you very much!
> 
> Best Wishes
> 
> Zhou Rui
> 2008-08-31
> 
> On Mon, 2008-08-25 at 21:16 +0200, Zhou Rui wrote:
> > Hi,
> > I think maybe you have known this project named XtratuM
> > (http://www.xtratum.org). I'm porting it from x86 to PPC405. The
> > implementation on PPC440 has been basically finished
> > (ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
> > and I know there was discussion about it in this mail list before. XtratuM
> > is an ADEOS based nano kernel. It aims for realtime and is designed to
> > provide virtual timer, virtual interrupt and memory space separation for
> > domains. Each domain is loaded by a userspace program (instead of the root
> > domain as a kernel module) and the loader will load the domain's (ELF
> > statically executable) PT_LOAD section into memory, and then raise a proper
> > system call (passing the structured loaded data as arguments) to load the
> > domain via load_domain_sys() of XtratuM, and at the last step of loading
> > the domain, XtratuM will jump to the entry code of the new domain
> > (asm-wrapped start() routine) and then everything should be fine. 0x10a0
> > is the entry point of the test domain, and that is why I need to start
> > execution from it.
> > 
> > Here is my analysis so far of the cause of my problem. Thanks for
> > mentioning the memory size. Once the XtratuM kernel module is loaded,
> > its symbols are placed at virtual addresses like 0xc3xx. Because
> > address translation is enabled in the normal state
> > (MSR[IR, DR] = [1, 1]), these addresses are fine. However, when loading
> > the domain, the entry point 0x10a0 is not in the TLB and has to be
> > reloaded, so a Data TLB Miss exception is raised and DTLBMiss is
> > called. The exception clears MSR[IR, DR], so address translation is
> > disabled and physical addresses must be used at that point. If we want
> > something at a virtual address like 0xc3xx, we must access a
> > physical address like 0x03xx. Nevertheless, the limitation of 32MB
> > memory makes the valid

Re: Efficient memcpy()/memmove() for G2/G3 cores...

2008-08-31 Thread David Jander
On Sunday 31 August 2008 10:28:43 Benjamin Herrenschmidt wrote:
> > > > It would be useful if somebody interested in getting things things
> > > > into glibc did the necessary FSF copyright assignment stuff and
> > > > worked toward integrating them.
> > >
> > > Ben makes a very good point!
> >
> > Sounds reasonable... but I am still wondering about what you mean
> > with "things"?
>
> Typo. I meant "these things", that is, variants of various libc
> functions optimized for a given processor type.

Ok, we'd have to _make_ those "things" first then ;-)

> > AFAICS there is almost nothing there (besides the memcpy() routine from
> > Gunnar von Boehn, which is apparently still far from optimal). And I was
> > asking for someone to correct me here ;-)
>
> No idea, as we said, it's mostly up to users of the processors (or to a
> certain extent, manufacturers, hint hint hint) to do that work.

Ok, I get the point.

> > > There is also a framework for adding and maintaining optimizations of
> > > this type:
> > >
> > > http://penguinppc.org/dev/glibc/glibc-powerpc-cpu-addon.html
> >
> > I had already stumbled across this one, but it seems to focus on G3 or
> > newer processors (power4). There is no optimal memcpy() for
> > G2/PPC603/e300.
>
> It focuses on what the people doing it have access to, are paid to work
> on, or other material constraints. It's up to others from the community
> to fill the gaps.

That's all I need to know ;-)

> > >[...]
> > > So it does no good to complain here. If you have code you want to
> > > contribute, get your FSF CR assignment and join #glibc on freenode IRC.
> >
> > I am not complaining. I was only wondering if it is just me or there
> > really is very little that has been done (for either uClibc, glibc, or
> > whatever for powerpc) to improve performance of (linux-) applications on
> > "lower"-power platforms (G2 core), AFAICS there is a LOT that can be
> > gained by simple tweaks.
>
> Well, possibly, then you are welcome to work on those tweaks and if they
> indeed improve things, submit patches to glibc :-) I'm sure Steve and
> Ryan will be happy to help with the submission process.

Sounds encouraging. I'll try my best (in the limited amount of time I have).

>[...]
> You don't have to do it all at once. A simple tweak of one function
> such as memcpy, if it measurably improves performance without
> notable regressions, could be a first step, and then tweak after tweak...
>
> It's a common mistake to try to do too much "out of tree" and then
> struggle and give up when it's time to merge that stuff because there
> are too many areas that won't necessarily be acceptable "as is".
>
> One little bit at a time is generally a better approach.

Ok, I take your advice.

> > OTOH, maybe it is easier and simpler to start with a collection of
> > functions in a shared-library, that may be suited for preloading via
> > LD_PRELOAD or /etc/ld_preload...
> >
> > Maybe once this collection is more stable (in terms of that heavy
> > tweaking has stopped) one could try the pilgrimage towards glibc
> > inclusion
>
> I believe that's the wrong approach as it leads to never-merged
> out-of-tree code.

Hmm... you mean, it'll be easier to keep patching (improving) things once they 
are already in glibc? Interesting.
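
A minimal sketch of what one function in such a preloadable collection
might look like. The name g2_memcpy is hypothetical, and the code is a
plain aligned word copy, not something tuned or measured for G2/603e/e300;
it only shows the shape of a candidate that could be benchmarked against
the libc memcpy() before anything is submitted anywhere.

```c
/* Hypothetical example: g2_memcpy is an illustrative name, not an existing
 * glibc or uClibc symbol, and nothing here is tuned for a particular core. */
#include <stddef.h>
#include <stdint.h>

/* may_alias (a GCC/Clang extension) allows reading and writing arbitrary
 * buffers through 32-bit words without violating strict-aliasing rules. */
typedef uint32_t __attribute__((may_alias)) word_t;

void *g2_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Copy single bytes until the destination is word aligned. */
    while (n > 0 && ((uintptr_t)d % sizeof(word_t)) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Copy whole words, but only if the source ended up aligned as well. */
    if (((uintptr_t)s % sizeof(word_t)) == 0) {
        while (n >= sizeof(word_t)) {
            *(word_t *)d = *(const word_t *)s;
            d += sizeof(word_t);
            s += sizeof(word_t);
            n -= sizeof(word_t);
        }
    }

    /* Copy what is left: the tail, or everything if src was unaligned. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

To try it via LD_PRELOAD the function would have to be exported under the
name memcpy from a shared object (built with something like
"gcc -O2 -fPIC -shared"). One caveat, as far as I know: GCC may turn such
copy loops back into memcpy() calls unless that is disabled (for example
with -fno-tree-loop-distribute-patterns), which would make a real
replacement call itself.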

Thanks a lot for your comments.

Best regards,

-- 
David Jander


Re: [Evolves!] Why does one "stw" fail with address translation disabled in PPC405EP?

2008-08-31 Thread Zhou Rui

On Mon, 2008-09-01 at 15:42 +1000, Benjamin Herrenschmidt wrote:
> On Sun, 2008-08-31 at 13:50 +0200, Zhou Rui wrote:
> > Hi, all:
> > My problem seems basically solved.
> > We used to call vmalloc() in the memory management part of our
> > source, but that seems to be the key unreliable point behind the
> > problem. vmalloc() always returns virtual addresses whose
> > corresponding physical addresses are beyond the memory size (there is
> > only 32MB of DRAM on our 405 board). Once instructions try to access
> > these illegal physical addresses, a machine check happens.
> 
> That should -never- happen.
> 
> Have you verified, as I asked you a while ago, that you are actually
> passing the right amount of memory to your kernel from the device-tree
> or the bootloader ?
> 
> Ben.

I added "mem=32M" to the Linux command line in the bootloader, and got the
same machine check.
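
A minimal sketch of the arithmetic behind this, not a statement of what
the kernel actually did on this board. It assumes the common 32-bit
PowerPC layout in which lowmem is mapped linearly at 0xC0000000 (an
assumption; PAGE_OFFSET is never stated in this thread) and uses two
addresses from the oops quoted below: NIP C028BF20 (kernel image) and
CTR C31C6078 (in the 0xc3... range). kmalloc() memory lies in the linear
mapping, so "virtual minus offset" is a real physical address inside RAM.
vmalloc() memory is mapped page by page through the page tables, so the
same subtraction does not give its physical address; applied anyway, as
code running with MSR[IR, DR] cleared might do, it points past the end of
32MB. All names in the code are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: linear kernel mapping starts here (usual ppc32 default). */
#define ASSUMED_PAGE_OFFSET 0xC0000000u
/* From the thread: the 405 board has 32MB of DRAM. */
#define DRAM_SIZE (32u * 1024u * 1024u)

/* Only meaningful for linearly mapped (lowmem/kmalloc) addresses. */
static uint32_t linear_virt_to_phys(uint32_t vaddr)
{
    return vaddr - ASSUMED_PAGE_OFFSET;
}

/* True if the physical address is backed by the board's DRAM. */
static bool phys_in_dram(uint32_t paddr)
{
    return paddr < DRAM_SIZE;
}

/* Would a real-mode access to (vaddr - offset) hit actual memory? */
static bool real_mode_access_ok(uint32_t vaddr)
{
    return phys_in_dram(linear_virt_to_phys(vaddr));
}
```

With these assumptions, real_mode_access_ok(0xC028BF20) holds, since
0x0028BF20 is below 0x02000000, while real_mode_access_ok(0xC31C6078)
does not, since 0x031C6078 lies beyond the 32MB boundary.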

Best Wishes

Zhou Rui
2008-09-01

> 
> > Afterwards, we call kmalloc() instead and it works basically as we
> > want. But problems with the memory management still exist, because
> > there is sometimes a program check exception and always a bad page:
> > 
> > -bash-3.2# PROGRAM: reason: 0x800, nip: 0xc028bf20
> > Oops: Exception in kernel mode, sig: 4 [#1]
> > NIP: C028BF20 LR: C028BF20 CTR: C31C6078
> > REGS: c028be80 TRAP: 0700   Not tainted  (2.6.19.2-eldk-xm.1.0)
> > MSR: 00029030   CR:   XER: 
> > TASK = c0228a30[0] 'swapper' THREAD: c028a000
> > GPR00:  C028BF30 C0228A30 C034B7B0 C028BF20  0001
> >  
> > GPR08: 0003 C31D 2282 00029030 2BDD9FE1 C03B3164 066F
> > 2B1F1DC8 
> > GPR16: C03B3050 0FFEA478 1001 C31D C028BEF0 C31CA2E4 00021030
> > C028A000 
> > GPR24: C028BEF0 C0228B44 C0228468 C03B3050 C028BF10 C31C60C4 00029030
> > C03B3050 
> > NIP [C028BF20] init_thread_union+0x1f20/0x2000
> > LR [C028BF20] init_thread_union+0x1f20/0x2000
> > Call Trace:
> > [C028BF30] [0FFEA478] 0xffea478 (unreliable)
> > Instruction dump:
> >         
> >         
> > Kernel panic - not syncing: Attempted to kill the idle task!
> >  <0>Rebooting in 180 seconds..
> > 
> > And there is bad page:
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: Backtrace:
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: Bad page state in process 'loader.xm'
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: Trying to fix it up, but a reboot is needed
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: Bad page state in process 'loader.xm'
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: Trying to fix it up, but a reboot is needed
> > Message from syslogd@ at Thu Jan  1 01:32:00 1970 ...
> > 405 kernel: page:c02f0e60 flags:0x0400 mapping: mapcount:0
> > count:1
> > 
> > I will do some traces for fixing those problems.
> > 
> > Could anyone explain the difference between vmalloc() and
> > kmalloc()? Based on our work, there seems to be a great difference.
> > 
> > Thank you very much!
> > 
> > Best Wishes
> > 
> > Zhou Rui
> > 2008-08-31
> > 
> > On Mon, 2008-08-25 at 21:16 +0200, Zhou Rui wrote:
> > > Hi,
> > > You may already know the project named XtratuM
> > > (http://www.xtratum.org). I'm porting it from x86 to PPC405. The
> > > implementation on PPC440 is basically finished
> > > (ftp://dslab.lzu.edu.cn/pub/xtratum/xtratum-ppc/snapshots/xtratum-ppc-20071205.tar.bz2)
> > > and I know it has been discussed on this mailing list before.
> > > XtratuM is an ADEOS-based nanokernel. It targets real-time use and is
> > > designed to provide virtual timers, virtual interrupts and memory
> > > space separation for domains. Each domain is loaded by a userspace
> > > program (unlike the root domain, which is a kernel module). The
> > > loader loads the PT_LOAD sections of the domain (a statically linked
> > > ELF executable) into memory, then makes the appropriate system call
> > > (passing the loaded data as a structured argument) to load the domain
> > > via load_domain_sys() of XtratuM. As the last step of loading the
> > > domain, XtratuM jumps to the entry code of the new domain (an
> > > asm-wrapped start() routine), and then everything should be fine.
> > > 0x10a0 is the entry point of the test domain, and that is why I need
> > > to start execution from it.
> > > 
> > > Here is my analysis so far of the cause of my problem. Thanks for
> > > mentioning the memory size. Once the XtratuM kernel module is loaded,
> > > its symbols are placed at virtual addresses like 0xc3xx. Because
> > > address translation is enabled in the normal state
> > > (MSR[IR, DR] = [1, 1]), these addresses are fine. However, when loading
> > > the domain, because the entry point 0x10a0 is not in TLB and it
> > > should be