RE: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD

2012-05-21 Thread Chen Peter-B29397
 
> 
> The USB controller may access the wrong address for the dTD (endpoint
> transfer descriptor) and then hang. This happens frequently when testing
> the g_ether module with iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
> 
> This hardware bug is explained in detail in erratum 2858 for the i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
> 
> All (?) SoCs with ChipIdea IP suffer from this problem.
> mv_udc_core fixed this bug in commit daec765. Other drivers may still
> be unfixed.
> 
> Signed-off-by: Christoph Fritz 
> Signed-off-by: Christian Hemp 
> ---
>  drivers/usb/gadget/fsl_udc_core.c |   15 ++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe()  */
>  static struct fsl_udc *udc_controller = NULL;
> 
> +static struct ep_td_struct *last_free_td;
> +
>  static const struct usb_endpoint_descriptor
>  fsl_ep0_desc = {
>   .bLength =  USB_DT_ENDPOINT_SIZE,
> @@ -180,8 +182,13 @@ static void done(struct fsl_ep *ep, struct fsl_req *req, int status)
>  		curr_td = next_td;
>  		if (j != req->dtd_count - 1) {
>  			next_td = curr_td->next_td_virt;
> +			dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
> +		} else {
> +			if (last_free_td != NULL)
> +				dma_pool_free(udc->td_pool, last_free_td,
> +						last_free_td->td_dma);
> +			last_free_td = curr_td;
>  		}
> -		dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
>  	}
> 
>   if (req->mapped) {
> @@ -2579,6 +2586,8 @@ static int __init fsl_udc_probe(struct platform_device *pdev)
>  		goto err_unregister;
>  	}
> 
> +	last_free_td = NULL;
> +
>  	ret = usb_add_gadget_udc(&pdev->dev, &udc_controller->gadget);
>  	if (ret)
>  		goto err_del_udc;
> @@ -2633,6 +2642,10 @@ static int __exit fsl_udc_remove(struct platform_device *pdev)
>  	kfree(udc_controller->status_req);
>  	kfree(udc_controller->eps);
> 
> +	if (last_free_td != NULL)
> +		dma_pool_free(udc_controller->td_pool, last_free_td,
> +				last_free_td->td_dma);
> +
>  	dma_pool_destroy(udc_controller->td_pool);
>  	free_irq(udc_controller->irq, udc_controller);
>  	iounmap(dr_regs);

Reviewed-by: Peter Chen 
> --
> 1.7.2.5
> 
> 


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
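A minimal userspace sketch of the deferred-free idea in the patch above: every dTD of a completed request is freed except the last one, which is held back until the next completion (or teardown), because the controller may still read it. This is a model, not the driver code: malloc()/free() stand in for dma_pool_alloc()/dma_pool_free(), and struct td, deferred_td, complete_chain(), flush_deferred() and make_chain() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct ep_td_struct: a singly linked dTD chain. */
struct td {
	struct td *next;
};

/* Tail descriptor whose release is postponed (mirrors last_free_td). */
static struct td *deferred_td;

/* Number of descriptors actually freed, to make the behaviour observable. */
static int freed_count;

static void release(struct td *td)
{
	free(td);
	freed_count++;
}

/*
 * Free a completed chain of @count descriptors, except the last one,
 * which is kept until the next call because the controller may still
 * read it. The tail deferred by the previous call is freed instead.
 */
static void complete_chain(struct td *head, int count)
{
	struct td *curr, *next = head;
	int j;

	for (j = 0; j < count; j++) {
		curr = next;
		if (j != count - 1) {
			next = curr->next;
			release(curr);
		} else {
			if (deferred_td != NULL)
				release(deferred_td);
			deferred_td = curr;
		}
	}
}

/* Teardown path (mirrors the remove() hunk): drop the deferred tail. */
static void flush_deferred(void)
{
	if (deferred_td != NULL) {
		release(deferred_td);
		deferred_td = NULL;
	}
}

/* Hypothetical helper: build a chain of @count descriptors. */
static struct td *make_chain(int count)
{
	struct td *head = NULL;
	int i;

	for (i = 0; i < count; i++) {
		struct td *td = malloc(sizeof(*td));

		if (td == NULL)
			abort();
		td->next = head;
		head = td;
	}
	return head;
}
```

As in the patch, at most one descriptor is ever held back, so the memory cost of the workaround is a single dTD.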


Re: [patch] hvc_xen: NULL dereference on allocation failure

2012-05-21 Thread Konrad Rzeszutek Wilk
On Tue, May 15, 2012 at 11:20:23AM +0100, Stefano Stabellini wrote:
> On Tue, 15 May 2012, Dan Carpenter wrote:
> > If kzalloc() returns a NULL here, we pass a NULL to
> > xencons_disconnect_backend() which will cause an Oops.
> > 
> > Also I removed the __GFP_ZERO while I was at it since kzalloc() implies
> > __GFP_ZERO.
> > 
> > Signed-off-by: Dan Carpenter 
> 
> Acked-by: Stefano Stabellini 

applied.
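The bug Dan describes follows a common pattern: an unchecked allocation result is passed to a helper that dereferences it. A minimal userspace sketch of the corrected shape follows. calloc() stands in for kzalloc() (both return zeroed memory, which is also why adding __GFP_ZERO to a kzalloc() call is redundant), and struct console_info, console_init_one(), console_disconnect() and the simulate_* switches are hypothetical, not the hvc_xen code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical per-console state. */
struct console_info {
	int evtchn;
	int connected;
};

/* Hypothetical teardown helper: like the real one, it assumes non-NULL. */
static void console_disconnect(struct console_info *info)
{
	info->connected = 0;
	info->evtchn = 0;
}

/*
 * Allocate and set up one console. On allocation failure, return
 * -ENOMEM immediately: there is nothing to tear down yet, so the
 * teardown helper must not be called with a NULL pointer.
 */
static int console_init_one(struct console_info **out, int simulate_oom,
			    int simulate_connect_error)
{
	struct console_info *info;

	/* calloc() zeroes the memory, as kzalloc() does. */
	info = simulate_oom ? NULL : calloc(1, sizeof(*info));
	if (info == NULL)
		return -ENOMEM;

	info->evtchn = 1;
	info->connected = 1;
	if (simulate_connect_error) {
		console_disconnect(info);	/* safe: info is non-NULL here */
		free(info);
		return -EIO;
	}
	*out = info;
	return 0;
}
```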


Re: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD

2012-05-21 Thread Felipe Balbi
Hi,

On Mon, May 21, 2012 at 08:57:22AM +0200, Christoph Fritz wrote:
> The USB controller may access the wrong address for the dTD (endpoint
> transfer descriptor) and then hang. This happens frequently when testing
> the g_ether module with iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
> 
> This hardware bug is explained in detail in erratum 2858 for the i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
> 
> All (?) SoCs with ChipIdea IP suffer from this problem.
> mv_udc_core fixed this bug in commit daec765. Other drivers may still
> be unfixed.
> 
> Signed-off-by: Christoph Fritz 
> Signed-off-by: Christian Hemp 
> ---
>  drivers/usb/gadget/fsl_udc_core.c |   15 ++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/usb/gadget/fsl_udc_core.c b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe()  */
>  static struct fsl_udc *udc_controller = NULL;
>  
> +static struct ep_td_struct *last_free_td;

I don't want to see global variables anymore. In fact, please convert
this to the new udc_start()/udc_stop() calls and use the generic
map/unmap routines.

That'll help you get rid of a bunch of useless code in the driver. After
that you should remove all  header includes and drop the ARCH
dependency.

You can also drop the big-/little-endian helpers, since you can use the
generic writel()/readl() routines.

Please make sure this series comes in with enough time to reach the v3.6
merge window in about 3 months.

You can include this fix in that series once you drop the global.

-- 
balbi


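Balbi's objection is to the file-scope last_free_td. One way to address it, sketched here under the assumption that the deferred pointer simply moves into per-controller state, is to keep it next to the rest of the controller data so each instance tracks its own held-back descriptor. struct udc_state, struct td, udc_defer_free() and udc_teardown() are hypothetical names, and malloc()/free() stand in for the DMA pool calls.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a dTD. */
struct td {
	struct td *next;
};

/* Hypothetical per-controller state replacing the file-scope global. */
struct udc_state {
	struct td *last_free_td;	/* tail dTD whose free is postponed */
	int freed;			/* descriptors released so far */
};

/* Hold back @tail and release the tail held back by the previous call. */
static void udc_defer_free(struct udc_state *udc, struct td *tail)
{
	if (udc->last_free_td != NULL) {
		free(udc->last_free_td);
		udc->freed++;
	}
	udc->last_free_td = tail;
}

/* Teardown for one controller instance: release its deferred tail. */
static void udc_teardown(struct udc_state *udc)
{
	if (udc->last_free_td != NULL) {
		free(udc->last_free_td);
		udc->freed++;
		udc->last_free_td = NULL;
	}
}
```

With the pointer in the controller structure, the probe()-time NULL initialisation and the remove()-time free from the patch become per-instance operations, and the state would no longer be shared if more than one controller instance existed.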

Re: Build regressions/improvements in v3.4

2012-05-21 Thread Geert Uytterhoeven
On Mon, May 21, 2012 at 9:51 PM, Geert Uytterhoeven wrote:
> JFYI, when comparing v3.4 to v3.4-rc7[3], the summaries are:
>  - build errors: +5/-14

  + error: via-pmu-event.c: undefined reference to `.input_allocate_device':  => .init.text+0x8aa4)
  + error: via-pmu-event.c: undefined reference to `.input_free_device':  => .init.text+0x8b68)
  + error: via-pmu-event.c: undefined reference to `.input_register_device':  => .init.text+0x8b54)

powerpc-randconfig

  + kernel/fork.c: error: implicit declaration of function 'alloc_task_struct_node' [-Werror=implicit-function-declaration]:  => 266:2
  + kernel/fork.c: error: implicit declaration of function 'free_task_struct' [-Werror=implicit-function-declaration]:  => 174:2

frv-defconfig (ouch)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Stephen Rothwell
Hi all,

Last night's boot tests on various PowerPC systems failed like this:

calling  .numa_group_init+0x0/0x3c @ 1
initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
calling  .numa_init+0x0/0x1dc @ 1
Unable to handle kernel paging request for data at address 0x1688
Faulting instruction address: 0xc016e154
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c016e154 LR: c01b9140 CTR: 
REGS: c003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
MSR: 80009032   CR: 24044022  XER: 0003
SOFTE: 1
CFAR: 562c
DAR: 1688, DSISR: 4000
TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
GPR00:  c003fc8c7950 c0d05b30 12d0 
GPR04:  1680  c003fe032f60 
GPR08: 000400540001  c980 c0d24fe0 
GPR12: 24044024 cf33b000 01a3fa78 009bac00 
GPR16: 00e1f338 02d513f0 1680  
GPR20: 0001 c003fc8c7c00  0001 
GPR24: 0001 c0d1b490  1680 
GPR28:   c0c7ce58 c003fe009200 
NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
LR [c01b9140] .new_slab+0xd0/0x3c0
Call Trace:
[c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
[c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
[c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
[c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
[c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
[c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
[c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
[c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
Instruction dump:
5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014 
7860efe2 787c6fe2 78000fa4 7f9c0378  83f9 2fa0 7fff1838 
---[ end trace 31fd0ba7d8756001 ]---

swapper/0 (1) used greatest stack depth: 10864 bytes left
Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b

I may be completely wrong, but I guess the obvious target would be the
sched/numa branch that came in via the tip tree.

Config file attached.  I haven't had a chance to try to bisect this yet.

Anyone have any ideas?
-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au


dotconfig.bz2
Description: Binary data



powerpc -next rebase WARNING

2012-05-21 Thread Benjamin Herrenschmidt
Folks, bad news ... my fault.

I accidentally forgot a --signoff on a git am command last week, meaning
that a pair of patches are in -next and not signed off by me.

For various (legal) reasons, that cannot go into Linus' tree as-is, so I
have to rebase the tree to fix it.

Sorry about that ...

Cheers,
Ben.




Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread David Rientjes
On Tue, 22 May 2012, Stephen Rothwell wrote:

> Unable to handle kernel paging request for data at address 0x1688
> Faulting instruction address: 0xc016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c016e154 LR: c01b9140 CTR: 
> REGS: c003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 80009032   CR: 24044022  XER: 0003
> SOFTE: 1
> CFAR: 562c
> DAR: 1688, DSISR: 4000
> TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
> GPR00:  c003fc8c7950 c0d05b30 12d0 
> GPR04:  1680  c003fe032f60 
> GPR08: 000400540001  c980 c0d24fe0 
> GPR12: 24044024 cf33b000 01a3fa78 009bac00 
> GPR16: 00e1f338 02d513f0 1680  
> GPR20: 0001 c003fc8c7c00  0001 
> GPR24: 0001 c0d1b490  1680 
> GPR28:   c0c7ce58 c003fe009200 
> NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c01b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
> [c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
> [c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
> [c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
> [c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
> [c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378  83f9 2fa0 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

Yeah, it's sched/numa since that's what introduced numa_init().  It does 
for_each_node() for each node and does a kmalloc_node() even though that 
node may not be online.  Slub ends up passing this node to the page 
allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
caught this and your config confirms it's not enabled.

sched/numa either needs a memory hotplug notifier or it needs to pass 
NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
following should fix it.


sched, numa: Allocate node_queue on any node for offline nodes

struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
not (yet) online, otherwise the page allocator has a bad zonelist.

Signed-off-by: David Rientjes 
---
diff --git a/kernel/sched/numa.c b/kernel/sched/numa.c
--- a/kernel/sched/numa.c
+++ b/kernel/sched/numa.c
@@ -885,7 +885,8 @@ static __init int numa_init(void)
 
 	for_each_node(node) {
 		struct node_queue *nq = kmalloc_node(sizeof(*nq),
-				GFP_KERNEL | __GFP_ZERO, node);
+				GFP_KERNEL | __GFP_ZERO,
+				node_online(node) ? node : NUMA_NO_NODE);
 		BUG_ON(!nq);
 
 		spin_lock_init(&nq->lock);
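The node choice made by the fix can be modelled in a few lines: request the node itself only when it is online, otherwise pass NUMA_NO_NODE so the allocator may use any node. In this sketch MAX_NODES, the node_is_online[] map, pick_alloc_node() and plan_node_queues() are hypothetical; NUMA_NO_NODE uses the kernel's value of -1.

```c
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)	/* same value as the kernel's definition */
#define MAX_NODES	4	/* hypothetical node count for this model */

/* Hypothetical online map: nodes 0 and 2 are online, 1 and 3 are not. */
static const bool node_is_online[MAX_NODES] = { true, false, true, false };

/*
 * Mirror of the patched argument:
 *     node_online(node) ? node : NUMA_NO_NODE
 * Offline nodes fall back to "any node" instead of an exact-node request.
 */
static int pick_alloc_node(int node)
{
	return node_is_online[node] ? node : NUMA_NO_NODE;
}

/*
 * Model of the for_each_node() loop in numa_init(): record which node
 * each node_queue allocation would be requested from.
 */
static void plan_node_queues(int out[MAX_NODES])
{
	int node;

	for (node = 0; node < MAX_NODES; node++)
		out[node] = pick_alloc_node(node);
}
```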


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Michael Neuling
> Hi all,
> 
> Last night's boot tests on various PowerPC systems failed like this:
> 
> calling  .numa_group_init+0x0/0x3c @ 1
> initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
> calling  .numa_init+0x0/0x1dc @ 1
> Unable to handle kernel paging request for data at address 0x1688
> Faulting instruction address: 0xc016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c016e154 LR: c01b9140 CTR: 
> REGS: c003fc8c76d0 TRAP: 0300   Not tainted  (3.4.0-autokern1)
> MSR: 80009032   CR: 24044022  XER: 0003
> SOFTE: 1
> CFAR: 562c
> DAR: 1688, DSISR: 4000
> TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
> GPR00:  c003fc8c7950 c0d05b30 12d0 
> GPR04:  1680  c003fe032f60 
> GPR08: 000400540001  c980 c0d24fe0 
> GPR12: 24044024 cf33b000 01a3fa78 009bac00 
> GPR16: 00e1f338 02d513f0 1680  
> GPR20: 0001 c003fc8c7c00  0001 
> GPR24: 0001 c0d1b490  1680 
> GPR28:   c0c7ce58 c003fe009200 
> NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c01b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
> [c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
> [c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
> [c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
> [c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
> [c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014 
> 7860efe2 787c6fe2 78000fa4 7f9c0378  83f9 2fa0 7fff1838 
> ---[ end trace 31fd0ba7d8756001 ]---
> 
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
> 
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
> 
> Config file attached.  I haven't had a chance to try to bisect this yet.
> 
> Anyone have any ideas?

I'm getting something similar here:


console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x1388
Faulting instruction address: 0xc014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c014a070 LR: c01978cc CTR: c00b6870
REGS: c0007e5836b0 TRAP: 0300   Tainted: GW (3.4.0-rc6-mikey)
MSR: 90009032   CR: 28004022  XER: 0200
SOFTE: 1
CFAR: 50fc
DAR: 1388, DSISR: 4000
TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
GPR00:  c0007e583930 c0c034d8 12d0 
GPR04:  1380  0001 
GPR08: c0007e0dff60  c0ca05a0  
GPR12: 28004024 cff2   
GPR16:   0001 1380 
GPR20: 0001 c0e14900 c0e148f0 0001 
GPR24: c0c6f378  1380 02aa 
GPR28:   c0b576b0 c0007e021200 
NIP [c014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c01978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c0007e583930] [c0007e5839c0] 0xc0007e5839c0 (unreliable)
[c0007e583ac0] [c01978cc] .new_slab+0xcc/0x3d0
[c0007e583b70] [c072ae98] .__slab_alloc+0x38c/0x4f8
[c0007e583cb0] [c0198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c0007e583d60] [c0a5a404] .numa_init+0x9c/0x188
[c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
[c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
[c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
Instruction dump:
0b00 eb1e8000 3b80 801800a8 2f80 409e001c 7860efe3 3800 
41820008 3802 787c6fe2 7f9c0

Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread David Rientjes
On Tue, 22 May 2012, Michael Neuling wrote:

> console [tty0] enabled
> console [hvc0] enabled
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> Unable to handle kernel paging request for data at address 0x1388
> Faulting instruction address: 0xc014a070
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in:
> NIP: c014a070 LR: c01978cc CTR: c00b6870
> REGS: c0007e5836b0 TRAP: 0300   Tainted: GW (3.4.0-rc6-mikey)
> MSR: 90009032   CR: 28004022  XER: 0200
> SOFTE: 1
> CFAR: 50fc
> DAR: 1388, DSISR: 4000
> TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
> GPR00:  c0007e583930 c0c034d8 12d0 
> GPR04:  1380  0001 
> GPR08: c0007e0dff60  c0ca05a0  
> GPR12: 28004024 cff2   
> GPR16:   0001 1380 
> GPR20: 0001 c0e14900 c0e148f0 0001 
> GPR24: c0c6f378  1380 02aa 
> GPR28:   c0b576b0 c0007e021200 
> NIP [c014a070] .__alloc_pages_nodemask+0xd0/0x910
> LR [c01978cc] .new_slab+0xcc/0x3d0
> Call Trace:
> [c0007e583930] [c0007e5839c0] 0xc0007e5839c0 (unreliable)
> [c0007e583ac0] [c01978cc] .new_slab+0xcc/0x3d0
> [c0007e583b70] [c072ae98] .__slab_alloc+0x38c/0x4f8
> [c0007e583cb0] [c0198190] .kmem_cache_alloc_node_trace+0x90/0x260
> [c0007e583d60] [c0a5a404] .numa_init+0x9c/0x188
> [c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
> [c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
> [c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
> Instruction dump:
> 0b00 eb1e8000 3b80 801800a8 2f80 409e001c 7860efe3 3800 
> 41820008 3802 787c6fe2 7f9c0378  801800a4 3b60 2fa9 
> ---[ end trace 31fd0ba7d8756002 ]---
> 
> Which seems to be this code in __alloc_pages_nodemask
> ---
> /*
>  * Check the zones suitable for the gfp_mask contain at least one
>  * valid zone. It's possible to have an empty zonelist as a result
>  * of GFP_THISNODE and a memoryless node
>  */
> if (unlikely(!zonelist->_zonerefs->zone))
> c014a070:   e9 3a 00 08 ld  r9,8(r26)
> ---
> 
> r26 is coming from r5 which is the struct zonelist *zonelist parameter
> to __alloc_pages_nodemask.  Having 1380 in there is clearly
> a bogus pointer.
> 
> Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
>   b4cdf91 sched/numa: Implement numa balancer
> 
> Trying David's patch just posted doesn't fix it.
> 

Hmm, what does CONFIG_DEBUG_VM say?
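One possible reading of the 1380 value in Michael's analysis, offered as an assumption rather than something confirmed in this thread: if the per-node data pointer for a node that never came online is NULL, then taking the address of its zonelist member yields just that member's offset, a small number like 0x1380, and the faulting "ld r9,8(r26)" then reads 8 bytes further on, which matches DAR 1388. Stephen's trace shows the same pattern with 1680 in several registers and DAR 1688. The structures below are hypothetical; the padding is chosen only to reproduce the 0x1380 offset, and offsetof() is used so the model never actually dereferences NULL.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical zonelist: the faulting load reads the second member. */
struct zonelist_model {
	void *first;
	void *zone;	/* at +8 on 64-bit, like "ld r9,8(r26)" */
};

/* Hypothetical per-node data, padded so the zonelists start at 0x1380. */
struct node_data_model {
	char other_fields[0x1380];
	struct zonelist_model node_zonelists[2];
};

/*
 * Address that &pgdat->node_zonelists[0] would produce if pgdat were
 * NULL: just the member offset. Computed without dereferencing NULL.
 */
static uintptr_t zonelist_addr_from_null_pgdat(void)
{
	return (uintptr_t)offsetof(struct node_data_model, node_zonelists);
}

/* Address touched by a load of ->zone through that bogus pointer. */
static uintptr_t faulting_addr_from_null_pgdat(void)
{
	return zonelist_addr_from_null_pgdat() +
	       offsetof(struct zonelist_model, zone);
}
```

On a 64-bit build the two values come out as 0x1380 and 0x1388, the same pair seen in r26 and DAR above.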


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Michael Neuling
> > Trying David's patch just posted doesn't fix it.
> > 
> 
> Hmm, what does CONFIG_DEBUG_VM say?

No set.

Mikey


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Michael Neuling
Michael Neuling  wrote:

> > > Trying David's patch just posted doesn't fix it.
> > > 
> > 
> > Hmm, what does CONFIG_DEBUG_VM say?
> 
> No set.

Sorry, should have read "Not set"

mikey


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread David Rientjes
On Tue, 22 May 2012, Michael Neuling wrote:

> > > > Trying David's patch just posted doesn't fix it.
> > > > 
> > > 
> > > Hmm, what does CONFIG_DEBUG_VM say?
> > 
> > No set.
> 
> Sorry, should have read "Not set"
> 

I mean if it's set, what does it emit to the kernel log with my patch 
applied?

I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
was thinking it would have caught this if either you or Stephen had enabled it.


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Michael Neuling
David Rientjes  wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > > > > Trying David's patch just posted doesn't fix it.
> > > > > 
> > > > 
> > > > Hmm, what does CONFIG_DEBUG_VM say?
> > > 
> > > No set.
> > 
> > Sorry, should have read "Not set"
> > 
> 
> I mean if it's set, what does it emit to the kernel log with my patch 
> applied?
> 
> I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I 
> was thinking it would have caught this if either you or Stephen had enabled it.

Sorry, got it... CONFIG_DEBUG_VM enabled below...

pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
[ cut here ]
kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c0199164 LR: c01993e0 CTR: c00b6b70
REGS: c0007e583830 TRAP: 0700   Tainted: GW (3.4.0-rc6-mikey)
MSR: 90029032   CR: 28004028  XER: 0200
SOFTE: 1
CFAR: c01993c4
TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
GPR00: 0001 c0007e583ab0 c0c035a0 12d0 
GPR04:  0001 c0e14900 000505550001 
GPR08: 0001 12d0 c0c6f398 0001 
GPR12: 28004022 cff2   
GPR16:   1380  
GPR20: 0001 c0e14900 c0e148f0 00210d00 
GPR24: 0001 00d0 02aa  
GPR28: 00d0 0001 c0b58fc8 c0007e021200 
NIP [c0199164] .new_slab+0xb4/0x440
LR [c01993e0] .new_slab+0x330/0x440
Call Trace:
[c0007e583ab0] [c01993e0] .new_slab+0x330/0x440 (unreliable)
[c0007e583b60] [c072ce84] .__slab_alloc+0x3bc/0x52c
[c0007e583ca0] [c0199b08] .kmem_cache_alloc_node_trace+0x98/0x280
[c0007e583d60] [c0a5a440] .numa_init+0x9c/0x188
[c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
[c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
[c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
Instruction dump:
7b5b8402 7f6407b4 7c1ce378 7d29e038 7b990020 61291200 79230020 419202b8 
2b9d00ff 78840020 3801 409d0240 <0b00> e95e8140 792977e2 7bab1f24 
---[ end trace 31fd0ba7d8756002 ]---




Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread David Rientjes
On Tue, 22 May 2012, Michael Neuling wrote:

> Sorry, got it... CONFIG_DEBUG_VM enabled below...
> 
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> [ cut here ]
> kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!

Yeah, this is what I was expecting: it's tripping on

VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

and slub won't pass nid < 0.  You're sure my patch is applied? :)
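Why passing NUMA_NO_NODE avoids the assertion can be sketched as a dispatch, assuming SLUB's slab-page allocation of this period calls alloc_pages() for NUMA_NO_NODE and alloc_pages_exact_node() (the function carrying the VM_BUG_ON quoted above) for any specific node. The online map, the MAX_NUMNODES value, enum alloc_path and the function names are hypothetical, and the VM_BUG_ON is reduced to a boolean predicate.

```c
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)	/* same value as the kernel's definition */
#define MAX_NUMNODES	4	/* hypothetical, for the model only */

/* Hypothetical online map: only nodes 0 and 2 are online. */
static const bool online[MAX_NUMNODES] = { true, false, true, false };

/* Predicate form of the quoted VM_BUG_ON condition: true means "BUG". */
static bool exact_node_check_trips(int nid)
{
	return nid < 0 || nid >= MAX_NUMNODES || !online[nid];
}

enum alloc_path { PATH_ANY_NODE, PATH_EXACT_NODE, PATH_BUG };

/*
 * Model of the slab-page dispatch: NUMA_NO_NODE takes the "any node"
 * path and never reaches the check; any other node must pass it.
 */
static enum alloc_path slab_page_path(int node)
{
	if (node == NUMA_NO_NODE)
		return PATH_ANY_NODE;
	return exact_node_check_trips(node) ? PATH_BUG : PATH_EXACT_NODE;
}
```

This matches David's remark that slub won't pass nid < 0: the negative value is consumed by the dispatch before the exact-node check is reached.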


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Stephen Rothwell
Hi David,

On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes wrote:
>
> Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> for_each_node() for each node and does a kmalloc_node() even though that 
> node may not be online.  Slub ends up passing this node to the page 
> allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> caught this and your config confirms it's not enabled.
> 
> sched/numa either needs a memory hotplug notifier or it needs to pass 
> NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> following should fix it.
> 
> 
> sched, numa: Allocate node_queue on any node for offline nodes
> 
> struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> not (yet) online, otherwise the page allocator has a bad zonelist.
> 
> Signed-off-by: David Rientjes 

Thanks, that fixes it.

Tested-by: Stephen Rothwell 

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au



Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Michael Neuling
David Rientjes  wrote:

> On Tue, 22 May 2012, Michael Neuling wrote:
> 
> > Sorry, got it... CONFIG_DEBUG_VM enabled below...
> > 
> > pid_max: default: 32768 minimum: 301
> > Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> > Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> > Mount-cache hash table entries: 4096
> > Initializing cgroup subsys cpuacct
> > Initializing cgroup subsys devices
> > Initializing cgroup subsys freezer
> > POWER7 performance monitor hardware support registered
> > [ cut here ]
> > kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
> 
> Yeah, this is what I was expecting: it's tripping on
> 
>   VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
> 
> and slub won't pass nid < 0.  You're sure my patch is applied? :)

I did have your patch applied, but on top of "b4cdf91 sched/numa: Implement
numa balancer" (where git bisect spotted the failure).

If I apply your patch on the full next-20120521 it does fix the problem.

Sorry for the confusion.

Thanks!
Mikey


Re: linux-next: PowerPC boot failures in next-20120521

2012-05-21 Thread Stephen Rothwell
On Tue, 22 May 2012 13:03:54 +1000 Stephen Rothwell wrote:
>
> On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes wrote:
> >
> > Yeah, it's sched/numa since that's what introduced numa_init().  It does 
> > for_each_node() for each node and does a kmalloc_node() even though that 
> > node may not be online.  Slub ends up passing this node to the page 
> > allocator through alloc_pages_exact_node().  CONFIG_DEBUG_VM would have 
> > caught this and your config confirms it's not enabled.
> > 
> > sched/numa either needs a memory hotplug notifier or it needs to pass 
> > NUMA_NO_NODE for nodes that aren't online.  Until we get the former, the 
> > following should fix it.
> > 
> > 
> > sched, numa: Allocate node_queue on any node for offline nodes
> > 
> > struct node_queue must be allocated with NUMA_NO_NODE for nodes that are 
> > not (yet) online, otherwise the page allocator has a bad zonelist.
> > 
> > Signed-off-by: David Rientjes 
> 
> Thanks, that fixes it.
> 
> Tested-by: Stephen Rothwell 

And I will put that patch in linux-next until it (or something better)
appears.

-- 
Cheers,
Stephen Rothwell    s...@canb.auug.org.au



Re: powerpc -next rebase WARNING

2012-05-21 Thread Benjamin Herrenschmidt
On Tue, 2012-05-22 at 11:51 +1000, Benjamin Herrenschmidt wrote:
> Folks, bad news ... my fault.
> 
> I accidentally forgot a --signoff on a git am command last week, meaning
> that a pair of patches are in -next and not signed off by me.
> 
> For various (legal) reasons, that cannot go into Linus' tree as-is, so I
> have to rebase the tree to fix it.
> 
> Sorry about that ...

Note that the rebase only affects the top 3 commits, so if your
tree is based on something older, you're fine (Kumar, you seem to
be OK; I haven't checked Josh's).

Cheers,
Ben.

