On Thu, Apr 18, 2013 at 02:51:43PM -0700, Carl Shapiro wrote:
> On Wed, Apr 17, 2013 at 1:21 AM, Konstantin Belousov
> <kostik...@gmail.com> wrote:
>
> > Did you ensure with e.g. ktrace and procstat -v that your assumptions
> > hold, i.e. the addresses supplied as wait4(2) arguments are valid?
> > Please provide the minimal test case demonstrating the behaviour.
>
> Yes. I instrumented my code to check for a wait4 failure, print the
> addresses of the status and rusage arguments, and dump the contents of
> /proc/curproc/map. The addresses of the status and rusage arguments are
> always in the range of a mapping and marked as read write.

It would be of some interest to see the evidence.
Is your code multithreaded?

> I have yet to distill the failure to a minimal test case. The test case I
> do have is the test harness for the Go language. After running for about
> 45 minutes I can observe a failure. I have been working to produce
> something smaller and faster.

The test case is required to decide whether the bug is in the application
or in the OS.

> > MADV_FREE should only result in the possible loss of the previous
> > content of the page, not in the faulting of the page access. From the
> > inspection of the code, I do not see how MADV_FREE could result in
> > the memory address becoming invalid.
>
> I see. What has led us to believe this might be an issue with page faults
> is that writing zeroes to the page with memset before passing it to wait4
> makes the error go away.

There is no difference between the access performed by copyout and the
access caused by the usermode write.

> Do you have any advice about how one might go about instrumenting wait4 to
> generate more information about a failed copyout? Are tools such as dtrace
> useful in these situations or might it be too invasive? Because of the
> protracted test cycle and my lack of knowledge in this area, conducting
> experiments is quite painful at the moment.

No, I cannot give any advice; I think we should first decide which code is
to blame. BTW, you could try enabling sysctl machdep.uprintf_signal.

Oh, you did not specify the architecture and version of the system.
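
For reference, the instrumentation described above (check wait4 for failure and print the addresses of the status and rusage arguments) could look roughly like the sketch below. This is only an illustration under stated assumptions, not the code from the Go test harness: the names wait4_checked and run_once are hypothetical, and the buffers here live on the stack, whereas in the reported failure they presumably come from memory managed by the runtime.

```c
#define _DEFAULT_SOURCE   /* exposes wait4() on glibc; harmless elsewhere */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: call wait4(2) and, if it fails, report errno
 * together with the addresses that were passed in.
 */
static int
wait4_checked(pid_t pid, int *status, struct rusage *ru)
{
	pid_t r;

	/* Retry if the call is interrupted by a signal. */
	do {
		r = wait4(pid, status, 0, ru);
	} while (r == -1 && errno == EINTR);

	if (r == -1) {
		int saved = errno;

		fprintf(stderr, "wait4(%ld) failed: %s (errno %d)\n",
		    (long)pid, strerror(saved), saved);
		fprintf(stderr, "  status=%p rusage=%p\n",
		    (void *)status, (void *)ru);
		errno = saved;
		return (-1);
	}
	return (0);
}

/*
 * Hypothetical driver: fork one child that exits immediately, then
 * reap it through the checked wrapper. Returns 0 on success.
 */
static int
run_once(void)
{
	int status;
	struct rusage ru;
	pid_t pid;

	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(0);
	return (wait4_checked(pid, &status, &ru));
}
```

Calling run_once() in a loop from main() and stopping at the first failure gives a starting point for a smaller test case. The printed addresses can then be compared against the output of procstat -v for the process or against /proc/curproc/map.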