On 02/24/2017 10:54 PM, David Laight wrote:
> From: Hamish Martin
>> Sent: 24 February 2017 00:52
>> This patch series adds the ability to configure the THREAD_SHIFT value and
>> thereby alter the stack size on powerpc systems. We are particularly
>> interested in configuring for a 32k stack on PPC64.
>>
>> Using an NXP T2081 (e6500 PPC64 cores) we are observing stack overflows as a
>> result of applying a DTS overlay containing some I2C devices. Our scenario is
>> an ethernet switch chassis with plug-in cards. The I2C is driven from the
>> T2081 through a PCA9548 mux on the main board. When we detect insertion of
>> the plugin card we schedule work for a call to of_overlay_create() to install
>> a DTS overlay for the plugin board. This DTS overlay contains a further
>> PCA9548 mux with more devices hanging off it including a PCA9539 GPIO
>> expander. The ultimate installed I2C tree is:
>>
>> T2081 --- PCA9548 MUX --- PCA9548 MUX --- PCA9539 GPIO Expander
>>
>> When we install the overlay the devices described in the overlay are probed
>> and we see a large number of stack frames used as a result. If this is
>> coupled with an interrupt happening that requires moderate to high stack use
>> we observe stack corruption. Here is an example long stack (from a 4.10-rc8
>> kernel) that does not show corruption but does demonstrate the length and
>> frame sizes involved.
> ...
>
> ISTM that the device probe needs to be iterative rather than recursive so
> that deeply nested buses don't require deep stacks.
>
> Switching stacks on interrupt entry would also make it much less likely that
> you'll get an unexpected stack corruption.
>
> The problem with just doubling the stack size is that code will just eat
> it all up and, in a few years, something will hit the limit again.
>
> David
>

Thanks for your comments David.

Yes, restructuring bus/device probing would be a better solution in the
longer term. It's what Ben H was alluding to in his reply to v1 of this
patch series. I do suspect that would be a fairly large undertaking across
the kernel drivers that may expose numerous bugs.

However, I still feel this series has validity given the inequity between
PPC64 and PPC32 minimum stack frame overhead. Flipping it on its head, would
anyone even consider a patch that sought to make things more equivalent
between PPC32 and PPC64 where the patch involved reducing the PPC32 stack
from 8k to something like 6k? I think not.

And of course, finally, this is user selectable and we don't seek to modify
the current default behaviour.

Cheers,
Hamish M.