On Mon, Feb 25, 2013 at 11:36:03PM +1300, Andrew Turner wrote:
> On Mon, 25 Feb 2013 10:50:19 +0200
> Konstantin Belousov <kostik...@gmail.com> wrote:
>
> > On Mon, Feb 25, 2013 at 08:13:13PM +1300, Andrew Turner wrote:
> > > On Thu, 21 Feb 2013 19:02:50 +0000 (UTC)
> > > John Baldwin <j...@freebsd.org> wrote:
> > >
> > > > Author: jhb
> > > > Date: Thu Feb 21 19:02:50 2013
> > > > New Revision: 247116
> > > > URL: http://svnweb.freebsd.org/changeset/base/247116
> > > >
> > > > Log:
> > > > Further refine the handling of stop signals in the NFS client.
> > > > The changes in r246417 were incomplete as they did not add
> > > > explicit calls to sigdeferstop() around all the places that
> > > > previously passed SBDRY to _sleep(). In addition,
> > > > nfs_getcacheblk() could trigger a write RPC from getblk()
> > > > resulting in sigdeferstop() recursing. Rather than manually
> > > > deferring stop signals in specific places, change the VFS_*() and
> > > > VOP_*() methods to defer stop signals for filesystems which
> > > > request this behavior via a new VFCF_SBDRY flag. Note that this
> > > > has to be a VFC flag rather than a MNTK flag so that it works
> > > > properly with VFS_MOUNT() when the mount is not yet fully
> > > > constructed. For now, only the NFS clients are set this new flag
> > > > in VFS_SET(). A few other related changes:
> > > > - Add an assertion to ensure that TDF_SBDRY doesn't leak to
> > > > userland.
> > > > - When a lookup request uses VOP_READLINK() to follow a symlink,
> > > > mark the request as being on behalf of the thread performing the
> > > > lookup (cnp_thread) rather than using a NULL thread pointer. This
> > > > causes NFS to properly handle signals during this VOP on an
> > > > interruptible mount.
> > > >
> > > > PR: kern/176179
> > > > Reported by: Russell Cattelan (sigdeferstop() recursion)
> > > > Reviewed by: kib
> > > > MFC after: 1 month
> > >
> > > This change is causing init to crash for me on armv6. I'm
> > > netbooting a PandaBoard and it appears init is receiving a SIGABRT
> > > before it gets into main().
> > >
> > > Do you have any idea where I could look to track down why it is
> > > doing this?
> >
> > It is weird. SIGABRT sent by the kernel usually means that execve(2)
> > already destroyed the previous address space of the process, but the
> > new image cannot be activated, most likely due to image format error
> > discovered too late, or resource shortage.
> >
> > Could it be that some NFS RPC fails after the patch, but I cannot
> > imagine why. You would need to track this. Also, verify that the init
> > binary is correct.
> >
> > I tried amd64 netboot, and it worked fine.
>
> It looks like this change is not the issue, it just changed the
> symptom enough for me to not realise I was seeing an issue where
> it would crash the kernel before. I reinstated this change but only
> allowed the kernel to access half the memory and it booted correctly.
>
> The real issue appears to be related to something in the vm layer not
> working on ARM boards with too much memory (somewhere between 512MiB
> and 1GiB).

Hm, do you have r246926, r246929 and r247046?