RE: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD
>
> USB controller may access a wrong address for the dTD (endpoint transfer
> descriptor) and then hang. This happens a lot when doing tests with
> g_ether module and iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
>
> This hardware bug is explained in detail by errata number 2858 for i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
>
> All (?) SOCs with an IP from chipidea suffer from this problem.
> mv_udc_core fixes this bug by commit daec765. There still may be
> unfixed drivers.
>
> Signed-off-by: Christoph Fritz
> Signed-off-by: Christian Hemp
> ---
>  drivers/usb/gadget/fsl_udc_core.c | 15 ++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/usb/gadget/fsl_udc_core.c
> b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe() */
>  static struct fsl_udc *udc_controller = NULL;
>
> +static struct ep_td_struct *last_free_td;
> +
>  static const struct usb_endpoint_descriptor
>  fsl_ep0_desc = {
>  	.bLength = USB_DT_ENDPOINT_SIZE,
> @@ -180,8 +182,13 @@ static void done(struct fsl_ep *ep, struct fsl_req
> *req, int status)
>  		curr_td = next_td;
>  		if (j != req->dtd_count - 1) {
>  			next_td = curr_td->next_td_virt;
> +			dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
> +		} else {
> +			if (last_free_td != NULL)
> +				dma_pool_free(udc->td_pool, last_free_td,
> +						last_free_td->td_dma);
> +			last_free_td = curr_td;
>  		}
> -		dma_pool_free(udc->td_pool, curr_td, curr_td->td_dma);
>  	}
>
>  	if (req->mapped) {
> @@ -2579,6 +2586,8 @@ static int __init fsl_udc_probe(struct
> platform_device *pdev)
>  		goto err_unregister;
>  	}
>
> +	last_free_td = NULL;
> +
>  	ret = usb_add_gadget_udc(&pdev->dev, &udc_controller->gadget);
>  	if (ret)
>  		goto err_del_udc;
> @@ -2633,6 +2642,10 @@ static int __exit fsl_udc_remove(struct
> platform_device *pdev)
>  	kfree(udc_controller->status_req);
>  	kfree(udc_controller->eps);
>
> +	if (last_free_td != NULL)
> +		dma_pool_free(udc_controller->td_pool, last_free_td,
> +				last_free_td->td_dma);
> +
>  	dma_pool_destroy(udc_controller->td_pool);
>  	free_irq(udc_controller->irq, udc_controller);
>  	iounmap(dr_regs);

Reviewed-by: Peter Chen

> --
> 1.7.2.5
>

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [patch] hvc_xen: NULL dereference on allocation failure
On Tue, May 15, 2012 at 11:20:23AM +0100, Stefano Stabellini wrote:
> On Tue, 15 May 2012, Dan Carpenter wrote:
> > If kzalloc() returns a NULL here, we pass a NULL to
> > xencons_disconnect_backend() which will cause an Oops.
> >
> > Also I removed the __GFP_ZERO while I was at it since kzalloc() implies
> > __GFP_ZERO.
> >
> > Signed-off-by: Dan Carpenter
>
> Acked-by: Stefano Stabellini

applied.
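The failure mode Dan describes can be sketched in plain user-space C. This is only an illustration and not the hvc_xen code: `struct console_info`, `disconnect_backend()` and `console_setup()` are hypothetical stand-ins, and `calloc()` plays the role of `kzalloc()`. The point is simply that the allocation result is checked before any cleanup helper that dereferences it can run.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's per-console state. */
struct console_info {
	int irq;
	void *ring;
};

/* Hypothetical cleanup helper: like the real one, it dereferences info. */
static void disconnect_backend(struct console_info *info)
{
	assert(info != NULL);	/* a NULL here would be the Oops */
	info->irq = 0;
	free(info->ring);
	info->ring = NULL;
}

/*
 * Returns 0 on success, -ENOMEM on allocation failure.
 * alloc_fails simulates calloc() (kzalloc() in the kernel) returning NULL.
 */
static int console_setup(int alloc_fails, struct console_info **out)
{
	struct console_info *info;

	info = alloc_fails ? NULL : calloc(1, sizeof(*info));
	if (!info)
		return -ENOMEM;	/* bail out before any cleanup touches info */

	info->ring = malloc(64);
	if (!info->ring) {
		disconnect_backend(info);	/* safe: info is valid here */
		free(info);
		return -ENOMEM;
	}

	*out = info;
	return 0;
}
```

Since `calloc()` already zeroes the memory, there is no separate zeroing step, which mirrors the remark that kzalloc() implies __GFP_ZERO.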
Re: [PATCH v2] usb: fsl_udc: errata - postpone freeing current dTD
Hi,

On Mon, May 21, 2012 at 08:57:22AM +0200, Christoph Fritz wrote:
> USB controller may access a wrong address for the dTD (endpoint transfer
> descriptor) and then hang. This happens a lot when doing tests with
> g_ether module and iperf, a tool for measuring maximum TCP and UDP
> bandwidth.
>
> This hardware bug is explained in detail by errata number 2858 for i.MX23:
> http://cache.freescale.com/files/dsp/doc/errata/IMX23CE.pdf
>
> All (?) SOCs with an IP from chipidea suffer from this problem.
> mv_udc_core fixes this bug by commit daec765. There still may be
> unfixed drivers.
>
> Signed-off-by: Christoph Fritz
> Signed-off-by: Christian Hemp
> ---
>  drivers/usb/gadget/fsl_udc_core.c | 15 ++-
>  1 files changed, 14 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/usb/gadget/fsl_udc_core.c
> b/drivers/usb/gadget/fsl_udc_core.c
> index 55abfb6..72f2139 100644
> --- a/drivers/usb/gadget/fsl_udc_core.c
> +++ b/drivers/usb/gadget/fsl_udc_core.c
> @@ -65,6 +65,8 @@ static struct usb_sys_interface *usb_sys_regs;
>  /* it is initialized in probe() */
>  static struct fsl_udc *udc_controller = NULL;
>
> +static struct ep_td_struct *last_free_td;

I don't want to see global variables anymore. In fact, please convert
this to the new udc_start()/udc_stop() calls and use the generic
map/unmap routines.

That'll help you get rid of a bunch of useless code in the driver.

After that you should remove all header includes and drop the ARCH
dependency. You can also drop the big-/little-endian helpers as you can
make use of the generic writel()/readl() routines.

Please make sure this series comes in with enough time to reach the
v3.6 merge window in about 3 months.

You can put this fix together with that series after you drop the global.

--
balbi
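One way to read the request to drop the global is to keep the postponed descriptor in the per-controller structure. The following user-space sketch only illustrates that idea: `struct udc`, `struct td`, `udc_td_done()`, `udc_td_flush()` and `td_chain()` are hypothetical names, `free()` stands in for `dma_pool_free()`, and the chain is walked by its next pointer rather than by a descriptor count. It is not taken from any posted revision of the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical transfer descriptor, reduced to its link pointer. */
struct td {
	struct td *next;
};

/* Hypothetical per-controller state: the deferred dTD lives here, not in a global. */
struct udc {
	struct td *last_free_td;
	int freed;		/* counts releases, for the demo only */
};

static void udc_td_release(struct udc *udc, struct td *td)
{
	free(td);		/* stands in for dma_pool_free() */
	udc->freed++;
}

/*
 * Complete a request made of a chain of descriptors. Every descriptor but
 * the last is released immediately; the last one is parked until the next
 * completion, because the controller may still fetch it.
 */
static void udc_td_done(struct udc *udc, struct td *head)
{
	struct td *curr = head;

	while (curr) {
		struct td *next = curr->next;

		if (next) {
			udc_td_release(udc, curr);
		} else {
			if (udc->last_free_td)
				udc_td_release(udc, udc->last_free_td);
			udc->last_free_td = curr;
		}
		curr = next;
	}
}

/* Called on removal: release whatever is still parked. */
static void udc_td_flush(struct udc *udc)
{
	if (udc->last_free_td) {
		udc_td_release(udc, udc->last_free_td);
		udc->last_free_td = NULL;
	}
}

/* Demo helper: build a chain of n descriptors. */
static struct td *td_chain(int n)
{
	struct td *head = NULL;

	while (n-- > 0) {
		struct td *td = calloc(1, sizeof(*td));

		assert(td != NULL);
		td->next = head;
		head = td;
	}
	return head;
}
```

With the pointer stored per controller, a second controller instance would get its own parked descriptor, and the probe-time reset to NULL falls out of zero-initialising the structure.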
Re: Build regressions/improvements in v3.4
On Mon, May 21, 2012 at 9:51 PM, Geert Uytterhoeven wrote:
> JFYI, when comparing v3.4 to v3.4-rc7[3], the summaries are:
>  - build errors: +5/-14

  + error: via-pmu-event.c: undefined reference to `.input_allocate_device': => .init.text+0x8aa4)
  + error: via-pmu-event.c: undefined reference to `.input_free_device': => .init.text+0x8b68)
  + error: via-pmu-event.c: undefined reference to `.input_register_device': => .init.text+0x8b54)

powerpc-randconfig

  + kernel/fork.c: error: implicit declaration of function 'alloc_task_struct_node' [-Werror=implicit-function-declaration]: => 266:2
  + kernel/fork.c: error: implicit declaration of function 'free_task_struct' [-Werror=implicit-function-declaration]: => 174:2

frv-defconfig (ouch)

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
linux-next: PowerPC boot failures in next-20120521
Hi all,

Last night's boot tests on various PowerPC systems failed like this:

calling .numa_group_init+0x0/0x3c @ 1
initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
calling .numa_init+0x0/0x1dc @ 1
Unable to handle kernel paging request for data at address 0x1688
Faulting instruction address: 0xc016e154
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA pSeries
Modules linked in:
NIP: c016e154 LR: c01b9140 CTR:
REGS: c003fc8c76d0 TRAP: 0300 Not tainted (3.4.0-autokern1)
MSR: 80009032 CR: 24044022 XER: 0003
SOFTE: 1
CFAR: 562c
DAR: 1688, DSISR: 4000
TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
GPR00: c003fc8c7950 c0d05b30 12d0
GPR04: 1680 c003fe032f60
GPR08: 000400540001 c980 c0d24fe0
GPR12: 24044024 cf33b000 01a3fa78 009bac00
GPR16: 00e1f338 02d513f0 1680
GPR20: 0001 c003fc8c7c00 0001
GPR24: 0001 c0d1b490 1680
GPR28: c0c7ce58 c003fe009200
NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
LR [c01b9140] .new_slab+0xd0/0x3c0
Call Trace:
[c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
[c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
[c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
[c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
[c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
[c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
[c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
[c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
Instruction dump:
5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014
7860efe2 787c6fe2 78000fa4 7f9c0378 83f9 2fa0 7fff1838
---[ end trace 31fd0ba7d8756001 ]---

swapper/0 (1) used greatest stack depth: 10864 bytes left
Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b

I may be completely wrong, but I guess the obvious target would be the
sched/numa branch that came in via the tip tree.

Config file attached. I haven't had a chance to try to bisect this yet.

Anyone have any ideas?
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
powerpc -next rebase WARNING
Folks, bad news ... my fault.

I accidentally forgot a --signoff on a git am command last week, meaning
that a pair of patches are in -next and not signed off by me.

For various (legal) reasons that cannot go into Linus' tree as-is, so I
have to rebase the tree to fix it.

Sorry about that ...

Cheers,
Ben.
Re: linux-next: PowerPC boot failures in next-20120521
On Tue, 22 May 2012, Stephen Rothwell wrote:

> Unable to handle kernel paging request for data at address 0x1688
> Faulting instruction address: 0xc016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c016e154 LR: c01b9140 CTR:
> REGS: c003fc8c76d0 TRAP: 0300 Not tainted (3.4.0-autokern1)
> MSR: 80009032 CR: 24044022 XER: 0003
> SOFTE: 1
> CFAR: 562c
> DAR: 1688, DSISR: 4000
> TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
> GPR00: c003fc8c7950 c0d05b30 12d0
> GPR04: 1680 c003fe032f60
> GPR08: 000400540001 c980 c0d24fe0
> GPR12: 24044024 cf33b000 01a3fa78 009bac00
> GPR16: 00e1f338 02d513f0 1680
> GPR20: 0001 c003fc8c7c00 0001
> GPR24: 0001 c0d1b490 1680
> GPR28: c0c7ce58 c003fe009200
> NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c01b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
> [c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
> [c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
> [c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
> [c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
> [c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014
> 7860efe2 787c6fe2 78000fa4 7f9c0378 83f9 2fa0 7fff1838
> ---[ end trace 31fd0ba7d8756001 ]---
>
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
>
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
>
> Config file attached. I haven't had a chance to try to bisect this yet.
>
> Anyone have any ideas?

Yeah, it's sched/numa since that's what introduced numa_init(). It does
for_each_node() for each node and does a kmalloc_node() even though that
node may not be online. Slub ends up passing this node to the page
allocator through alloc_pages_exact_node(). CONFIG_DEBUG_VM would have
caught this and your config confirms it's not enabled.

sched/numa either needs a memory hotplug notifier or it needs to pass
NUMA_NO_NODE for nodes that aren't online. Until we get the former, the
following should fix it.


sched, numa: Allocate node_queue on any node for offline nodes

struct node_queue must be allocated with NUMA_NO_NODE for nodes that are
not (yet) online, otherwise the page allocator has a bad zonelist.

Signed-off-by: David Rientjes
---
diff --git a/kernel/sched/numa.c b/kernel/sched/numa.c
--- a/kernel/sched/numa.c
+++ b/kernel/sched/numa.c
@@ -885,7 +885,8 @@ static __init int numa_init(void)

 	for_each_node(node) {
 		struct node_queue *nq = kmalloc_node(sizeof(*nq),
-				GFP_KERNEL | __GFP_ZERO, node);
+				GFP_KERNEL | __GFP_ZERO,
+				node_online(node) ? node : NUMA_NO_NODE);
 		BUG_ON(!nq);

 		spin_lock_init(&nq->lock);
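The node selection in the patch above can be tried outside the kernel. This is a minimal user-space sketch, assuming a made-up topology: `MAX_NUMNODES` is shrunk to 8, `node_is_online[]` is a hypothetical online map, `node_online()` is reimplemented locally and `alloc_node_hint()` is a hypothetical helper. Only the `NUMA_NO_NODE` value of -1 matches the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 8		/* assumption for the demo */
#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/* Hypothetical online map: only nodes 0 and 1 are online. */
static const bool node_is_online[MAX_NUMNODES] = { true, true };

/* Local stand-in for the kernel's node_online(). */
static bool node_online(int node)
{
	return node >= 0 && node < MAX_NUMNODES && node_is_online[node];
}

/*
 * Node hint to hand to the allocator for a per-node structure: the node
 * itself when it is online, otherwise "no preference".
 */
static int alloc_node_hint(int node)
{
	return node_online(node) ? node : NUMA_NO_NODE;
}
```

A loop over every possible node would then pass `alloc_node_hint(node)` instead of `node`, so offline nodes never reach the page allocator as an explicit target.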
Re: linux-next: PowerPC boot failures in next-20120521
> Hi all,
>
> Last night's boot tests on various PowerPC systems failed like this:
>
> calling .numa_group_init+0x0/0x3c @ 1
> initcall .numa_group_init+0x0/0x3c returned 0 after 0 usecs
> calling .numa_init+0x0/0x1dc @ 1
> Unable to handle kernel paging request for data at address 0x1688
> Faulting instruction address: 0xc016e154
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=32 NUMA pSeries
> Modules linked in:
> NIP: c016e154 LR: c01b9140 CTR:
> REGS: c003fc8c76d0 TRAP: 0300 Not tainted (3.4.0-autokern1)
> MSR: 80009032 CR: 24044022 XER: 0003
> SOFTE: 1
> CFAR: 562c
> DAR: 1688, DSISR: 4000
> TASK = c003fc8c8000[1] 'swapper/0' THREAD: c003fc8c4000 CPU: 0
> GPR00: c003fc8c7950 c0d05b30 12d0
> GPR04: 1680 c003fe032f60
> GPR08: 000400540001 c980 c0d24fe0
> GPR12: 24044024 cf33b000 01a3fa78 009bac00
> GPR16: 00e1f338 02d513f0 1680
> GPR20: 0001 c003fc8c7c00 0001
> GPR24: 0001 c0d1b490 1680
> GPR28: c0c7ce58 c003fe009200
> NIP [c016e154] .__alloc_pages_nodemask+0xc4/0x8f0
> LR [c01b9140] .new_slab+0xd0/0x3c0
> Call Trace:
> [c003fc8c7950] [2e6e756d615f696e] 0x2e6e756d615f696e (unreliable)
> [c003fc8c7ae0] [c01b9140] .new_slab+0xd0/0x3c0
> [c003fc8c7b90] [c01b9844] .__slab_alloc+0x254/0x5b0
> [c003fc8c7cd0] [c01bb7a4] .kmem_cache_alloc_node_trace+0x94/0x260
> [c003fc8c7d80] [c0ba36d0] .numa_init+0x98/0x1dc
> [c003fc8c7e10] [c000ace4] .do_one_initcall+0x1a4/0x1e0
> [c003fc8c7ed0] [c0b7b354] .kernel_init+0x124/0x2e0
> [c003fc8c7f90] [c00211c8] .kernel_thread+0x54/0x70
> Instruction dump:
> 5400d97e 7b170020 0b00 eb3e8000 3b80 80190088 2f80 40de0014
> 7860efe2 787c6fe2 78000fa4 7f9c0378 83f9 2fa0 7fff1838
> ---[ end trace 31fd0ba7d8756001 ]---
>
> swapper/0 (1) used greatest stack depth: 10864 bytes left
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
>
> I may be completely wrong, but I guess the obvious target would be the
> sched/numa branch that came in via the tip tree.
>
> Config file attached. I haven't had a chance to try to bisect this yet.
>
> Anyone have any ideas?

I'm getting similar here:

console [tty0] enabled
console [hvc0] enabled
pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
Unable to handle kernel paging request for data at address 0x1388
Faulting instruction address: 0xc014a070
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c014a070 LR: c01978cc CTR: c00b6870
REGS: c0007e5836b0 TRAP: 0300 Tainted: GW (3.4.0-rc6-mikey)
MSR: 90009032 CR: 28004022 XER: 0200
SOFTE: 1
CFAR: 50fc
DAR: 1388, DSISR: 4000
TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
GPR00: c0007e583930 c0c034d8 12d0
GPR04: 1380 0001
GPR08: c0007e0dff60 c0ca05a0
GPR12: 28004024 cff2
GPR16: 0001 1380
GPR20: 0001 c0e14900 c0e148f0 0001
GPR24: c0c6f378 1380 02aa
GPR28: c0b576b0 c0007e021200
NIP [c014a070] .__alloc_pages_nodemask+0xd0/0x910
LR [c01978cc] .new_slab+0xcc/0x3d0
Call Trace:
[c0007e583930] [c0007e5839c0] 0xc0007e5839c0 (unreliable)
[c0007e583ac0] [c01978cc] .new_slab+0xcc/0x3d0
[c0007e583b70] [c072ae98] .__slab_alloc+0x38c/0x4f8
[c0007e583cb0] [c0198190] .kmem_cache_alloc_node_trace+0x90/0x260
[c0007e583d60] [c0a5a404] .numa_init+0x9c/0x188
[c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
[c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
[c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
Instruction dump:
0b00 eb1e8000 3b80 801800a8 2f80 409e001c 7860efe3 3800
41820008 3802 787c6fe2 7f9c0
Re: linux-next: PowerPC boot failures in next-20120521
On Tue, 22 May 2012, Michael Neuling wrote:

> console [tty0] enabled
> console [hvc0] enabled
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> Unable to handle kernel paging request for data at address 0x1388
> Faulting instruction address: 0xc014a070
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 NUMA pSeries
> Modules linked in:
> NIP: c014a070 LR: c01978cc CTR: c00b6870
> REGS: c0007e5836b0 TRAP: 0300 Tainted: GW (3.4.0-rc6-mikey)
> MSR: 90009032 CR: 28004022 XER: 0200
> SOFTE: 1
> CFAR: 50fc
> DAR: 1388, DSISR: 4000
> TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
> GPR00: c0007e583930 c0c034d8 12d0
> GPR04: 1380 0001
> GPR08: c0007e0dff60 c0ca05a0
> GPR12: 28004024 cff2
> GPR16: 0001 1380
> GPR20: 0001 c0e14900 c0e148f0 0001
> GPR24: c0c6f378 1380 02aa
> GPR28: c0b576b0 c0007e021200
> NIP [c014a070] .__alloc_pages_nodemask+0xd0/0x910
> LR [c01978cc] .new_slab+0xcc/0x3d0
> Call Trace:
> [c0007e583930] [c0007e5839c0] 0xc0007e5839c0 (unreliable)
> [c0007e583ac0] [c01978cc] .new_slab+0xcc/0x3d0
> [c0007e583b70] [c072ae98] .__slab_alloc+0x38c/0x4f8
> [c0007e583cb0] [c0198190] .kmem_cache_alloc_node_trace+0x90/0x260
> [c0007e583d60] [c0a5a404] .numa_init+0x9c/0x188
> [c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
> [c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
> [c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
> Instruction dump:
> 0b00 eb1e8000 3b80 801800a8 2f80 409e001c 7860efe3 3800
> 41820008 3802 787c6fe2 7f9c0378 801800a4 3b60 2fa9
> ---[ end trace 31fd0ba7d8756002 ]---
>
> Which seems to be this code in __alloc_pages_nodemask
> ---
> 	/*
> 	 * Check the zones suitable for the gfp_mask contain at least one
> 	 * valid zone. It's possible to have an empty zonelist as a result
> 	 * of GFP_THISNODE and a memoryless node
> 	 */
> 	if (unlikely(!zonelist->_zonerefs->zone))
> c014a070: e9 3a 00 08 ld r9,8(r26)
> ---
>
> r26 is coming from r5 which is the struct zonelist *zonelist parameter
> to __alloc_pages_nodemask. Having 1380 in there is clearly
> a bogus pointer.
>
> Bisecting it points to b4cdf91668c27a5a6a5a3ed4234756c042dd8288
> b4cdf91 sched/numa: Implement numa balancer
>
> Trying David's patch just posted doesn't fix it.
>

Hmm, what does CONFIG_DEBUG_VM say?
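The link between the bogus pointer and the reported fault address is plain base-plus-displacement arithmetic. Here is a minimal sketch, assuming a mock structure whose first zone pointer sits 8 bytes in, as the `ld r9,8(r26)` displacement suggests; `struct mock_zonelist`, its member names and `faulting_address()` are hypothetical and not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Mock of the start of struct zonelist on a 64-bit kernel. The layout
 * (one 8-byte member ahead of the first zone pointer) is an assumption
 * inferred from the "ld r9,8(r26)" in the quoted disassembly.
 */
struct mock_zonelist {
	uint64_t leading_member;	/* hypothetical name, 8 bytes */
	uint64_t first_zone;		/* stands in for _zonerefs[0].zone */
};

/* Address the load touches for a given zonelist pointer value. */
static uint64_t faulting_address(uint64_t zonelist_ptr)
{
	return zonelist_ptr + offsetof(struct mock_zonelist, first_zone);
}
```

With the register value 1380 from Michael's trace this gives 0x1388, the address in his "Unable to handle kernel paging request" line; the 1680 in Stephen's trace gives 0x1688 in the same way.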
Re: linux-next: PowerPC boot failures in next-20120521
> > Trying David's patch just posted doesn't fix it.
> >
>
> Hmm, what does CONFIG_DEBUG_VM say?

No set.

Mikey
Re: linux-next: PowerPC boot failures in next-20120521
Michael Neuling wrote:
> > > Trying David's patch just posted doesn't fix it.
> > >
> >
> > Hmm, what does CONFIG_DEBUG_VM say?
>
> No set.

Sorry, should have read "Not set"

mikey
Re: linux-next: PowerPC boot failures in next-20120521
On Tue, 22 May 2012, Michael Neuling wrote:

> > > > Trying David's patch just posted doesn't fix it.
> > > >
> > >
> > > Hmm, what does CONFIG_DEBUG_VM say?
> >
> > No set.
>
> Sorry, should have read "Not set"
>

I mean if it's set, what does it emit to the kernel log with my patch
applied?

I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I
was thinking it would have caught this if either you or Stephen enabled it.
Re: linux-next: PowerPC boot failures in next-20120521
David Rientjes wrote:
> On Tue, 22 May 2012, Michael Neuling wrote:
>
> > > > > Trying David's patch just posted doesn't fix it.
> > > > >
> > > >
> > > > Hmm, what does CONFIG_DEBUG_VM say?
> > >
> > > No set.
> >
> > Sorry, should have read "Not set"
> >
>
> I mean if it's set, what does it emit to the kernel log with my patch
> applied?
>
> I made CONFIG_DEBUG_VM catch !node_online(node) about six months ago, so I
> was thinking it would have caught this if either you or Stephen enabled it.

Sorry, got it... CONFIG_DEBUG_VM enabled below...

pid_max: default: 32768 minimum: 301
Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
Mount-cache hash table entries: 4096
Initializing cgroup subsys cpuacct
Initializing cgroup subsys devices
Initializing cgroup subsys freezer
POWER7 performance monitor hardware support registered
[ cut here ]
kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA pSeries
Modules linked in:
NIP: c0199164 LR: c01993e0 CTR: c00b6b70
REGS: c0007e583830 TRAP: 0700 Tainted: GW (3.4.0-rc6-mikey)
MSR: 90029032 CR: 28004028 XER: 0200
SOFTE: 1
CFAR: c01993c4
TASK = c0007e56[1] 'swapper/0' THREAD: c0007e58 CPU: 0
GPR00: 0001 c0007e583ab0 c0c035a0 12d0
GPR04: 0001 c0e14900 000505550001
GPR08: 0001 12d0 c0c6f398 0001
GPR12: 28004022 cff2
GPR16: 1380
GPR20: 0001 c0e14900 c0e148f0 00210d00
GPR24: 0001 00d0 02aa
GPR28: 00d0 0001 c0b58fc8 c0007e021200
NIP [c0199164] .new_slab+0xb4/0x440
LR [c01993e0] .new_slab+0x330/0x440
Call Trace:
[c0007e583ab0] [c01993e0] .new_slab+0x330/0x440 (unreliable)
[c0007e583b60] [c072ce84] .__slab_alloc+0x3bc/0x52c
[c0007e583ca0] [c0199b08] .kmem_cache_alloc_node_trace+0x98/0x280
[c0007e583d60] [c0a5a440] .numa_init+0x9c/0x188
[c0007e583e00] [c000aa30] .do_one_initcall+0x60/0x1e0
[c0007e583ec0] [c0a40b60] .kernel_init+0x128/0x294
[c0007e583f90] [c0020788] .kernel_thread+0x54/0x70
Instruction dump:
7b5b8402 7f6407b4 7c1ce378 7d29e038 7b990020 61291200 79230020 419202b8
2b9d00ff 78840020 3801 409d0240 <0b00> e95e8140 792977e2 7bab1f24
---[ end trace 31fd0ba7d8756002 ]---
Re: linux-next: PowerPC boot failures in next-20120521
On Tue, 22 May 2012, Michael Neuling wrote:

> Sorry, got it... CONFIG_DEBUG_VM enabled below...
>
> pid_max: default: 32768 minimum: 301
> Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> POWER7 performance monitor hardware support registered
> [ cut here ]
> kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!

Yeah, this is what I was expecting, it's tripping on

	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));

and slub won't pass nid < 0. You're sure my patch is applied? :)
Re: linux-next: PowerPC boot failures in next-20120521
Hi David,

On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes wrote:
>
> Yeah, it's sched/numa since that's what introduced numa_init(). It does
> for_each_node() for each node and does a kmalloc_node() even though that
> node may not be online. Slub ends up passing this node to the page
> allocator through alloc_pages_exact_node(). CONFIG_DEBUG_VM would have
> caught this and your config confirms it's not enabled.
>
> sched/numa either needs a memory hotplug notifier or it needs to pass
> NUMA_NO_NODE for nodes that aren't online. Until we get the former, the
> following should fix it.
>
>
> sched, numa: Allocate node_queue on any node for offline nodes
>
> struct node_queue must be allocated with NUMA_NO_NODE for nodes that are
> not (yet) online, otherwise the page allocator has a bad zonelist.
>
> Signed-off-by: David Rientjes

Thanks, that fixes it.

Tested-by: Stephen Rothwell
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: linux-next: PowerPC boot failures in next-20120521
David Rientjes wrote:
> On Tue, 22 May 2012, Michael Neuling wrote:
>
> > Sorry, got it... CONFIG_DEBUG_VM enabled below...
> >
> > pid_max: default: 32768 minimum: 301
> > Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes)
> > Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes)
> > Mount-cache hash table entries: 4096
> > Initializing cgroup subsys cpuacct
> > Initializing cgroup subsys devices
> > Initializing cgroup subsys freezer
> > POWER7 performance monitor hardware support registered
> > [ cut here ]
> > kernel BUG at /scratch/mikey/src/linux-next/include/linux/gfp.h:318!
>
> Yeah, this is what I was expecting, it's tripping on
>
> 	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
>
> and slub won't pass nid < 0. You're sure my patch is applied? :)

I did have your patch applied but at "b4cdf91 sched/numa: Implement numa
balancer" (where git bisect spotted the fail).

If I apply your patch on the full next-20120521 it does fix the problem.

Sorry for the confusion.

Thanks!
Mikey
Re: linux-next: PowerPC boot failures in next-20120521
On Tue, 22 May 2012 13:03:54 +1000 Stephen Rothwell wrote:
>
> On Mon, 21 May 2012 18:53:37 -0700 (PDT) David Rientjes wrote:
> >
> > Yeah, it's sched/numa since that's what introduced numa_init(). It does
> > for_each_node() for each node and does a kmalloc_node() even though that
> > node may not be online. Slub ends up passing this node to the page
> > allocator through alloc_pages_exact_node(). CONFIG_DEBUG_VM would have
> > caught this and your config confirms it's not enabled.
> >
> > sched/numa either needs a memory hotplug notifier or it needs to pass
> > NUMA_NO_NODE for nodes that aren't online. Until we get the former, the
> > following should fix it.
> >
> >
> > sched, numa: Allocate node_queue on any node for offline nodes
> >
> > struct node_queue must be allocated with NUMA_NO_NODE for nodes that are
> > not (yet) online, otherwise the page allocator has a bad zonelist.
> >
> > Signed-off-by: David Rientjes
>
> Thanks, that fixes it.
>
> Tested-by: Stephen Rothwell

And I will put that patch in linux-next until it (or something better)
appears.
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: powerpc -next rebase WARNING
On Tue, 2012-05-22 at 11:51 +1000, Benjamin Herrenschmidt wrote:
> Folks, bad news ... my fault.
>
> I accidentally forgot a --signoff on a git am command last week, meaning
> that a pair of patches are in -next and not signed off by me.
>
> For various (legal) reasons that cannot go into Linus' tree as-is, so I
> have to rebase the tree to fix it.
>
> Sorry about that ...

Note that the rebase only affects the top 3 commits, so if your tree is
based on something older you're fine (Kumar, you seem to be ok, I
haven't checked Josh).

Cheers,
Ben.