This is off-topic for performance-discuss. You might consider posing this question to the Observability Community; they're the maintainers of libproc.
This is their webpage: http://www.opensolaris.org/os/community/observability/
Mailing list info is here: http://mail.opensolaris.org/mailman/listinfo/observability-discuss

-j

On Fri, Oct 26, 2007 at 11:18:52AM -0700, Alexandra (Sasha) Fedorova wrote:
> Hello! Here is a riddle for you :)
>
> We are experiencing a strange problem when using Psyscall:
>
> We have a MASTER process that grabs a SLAVE process in order to monitor
> hardware counters on behalf of the SLAVE. It does so via the following
> commands:
>
> /* Initialization */
> 1. Grab the SLAVE process: pctx_capture()
> 2. Set up SLAVE to count its instructions: cpc_bind_pctx + other cpc
>    library init routines
> 3. Set MASTER to detect when SLAVE stops: write_cm(pid, PCSTRACE,
>    NULL); (SLAVE will stop on a SIGEMT signal, which is raised when the
>    instruction counter in the SLAVE overflows)
> 4. Set the SLAVE to run: write_cm(pid, PCRUN, NULL)
>
> Now, when the SLAVE's instruction counter overflows, it is stopped, and
> MASTER detects this. Then the following happens:
>
> /* Control loop */
> 1. MASTER calls Psyscall on the SLAVE
> 2. Psyscall is set up to call cpc_request_preset() and cpc_set_restart()
>    in the SLAVE
> 3. MASTER sets SLAVE running again: write_cm(pid, PCRUN, NULL)
> 4. SLAVE runs until its instruction counter overflows again, at which
>    point the sequence executed in the control loop repeats.
>
> The control loop usually executes successfully a dozen times. After that,
> SLAVE either crashes with SIGFAULT or exits prematurely. We don't know why.
> When SLAVE runs by itself, it never crashes (it's a simple program that adds
> up a bunch of numbers in a loop), so we figure this is due to MASTER messing
> with SLAVE. The core file does not give us much information.
>
> We tried substituting item #2 in the control loop with a routine that
> re-initializes the hardware counters from scratch. The result was the same:
> SLAVE crashes or exits.
> Little wonder: the counters are re-initialized by
> calling cpc_bind_pctx, and cpc_bind_pctx calls Psyscall!
>
> So we blame the crash on the fact that we use Psyscall.
>
> Any ideas how we might approach debugging this? Are we doing something
> illegal?
>
> Thank you.
>
>
> This message posted from opensolaris.org
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org
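The post does not show the MASTER's write_cm helper. As a rough illustration only, assuming it follows the control-message layout documented in proc(4) (each message written to /proc/<pid>/ctl is a long operation code followed by that operation's operand), a hypothetical equivalent might look like the sketch below. The name write_ctl and its signature are illustrative and are not part of libproc or the original code.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the original post or from libproc.
 * Writes one control message: a long operation code followed by the
 * operand bytes (if any), in a single writev() call so the message is
 * delivered as one write. Returns 0 on success, -1 on a failed or short write.
 */
int write_ctl(int ctlfd, long cmd, const void *operand, size_t operand_len)
{
    struct iovec iov[2];
    int iovcnt = 1;
    size_t total = sizeof(cmd);

    /* First element: the operation code (e.g. PCSTRACE or PCRUN on Solaris). */
    iov[0].iov_base = &cmd;
    iov[0].iov_len = sizeof(cmd);

    /* Second element: the operand, only when one is supplied. */
    if (operand != NULL && operand_len > 0) {
        iov[1].iov_base = (void *)operand;
        iov[1].iov_len = operand_len;
        iovcnt = 2;
        total += operand_len;
    }

    ssize_t n = writev(ctlfd, iov, iovcnt);
    return (n == (ssize_t)total) ? 0 : -1;
}
```

On Solaris, ctlfd would be a descriptor for /proc/<pid>/ctl opened for writing. proc(4) describes PCSTRACE as taking a sigset_t operand (here one with SIGEMT added) and PCRUN as taking a long flags word (0 simply resumes the process), so a helper called with NULL, as in the post, presumably fills in those operands itself.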