This is off-topic for performance-discuss.  You might consider posing
this question to the Observability Community, who maintain libproc.

This is their webpage:

http://www.opensolaris.org/os/community/observability/

Mailing list info is here:

http://mail.opensolaris.org/mailman/listinfo/observability-discuss

-j

On Fri, Oct 26, 2007 at 11:18:52AM -0700, Alexandra (Sasha) Fedorova wrote:
> Hello! Here is a riddle for you :)
> 
> We are experiencing a strange problem when using Psyscall:
> 
> We have a MASTER process that grabs a SLAVE process in order to monitor 
> hardware counters on behalf of the SLAVE. It does so via the following 
> commands:
> 
> /* Initialization */
> 1.    Grab the SLAVE process: pctx_capture()
> 2.    Set up SLAVE to count its instructions: cpc_bind_pctx + other cpc 
> library init routines
> 3.    Setting MASTER to detect when SLAVE stops: write_cm(pid, PCSTRACE, 
> NULL); (SLAVE will stop on SIGEMT signal, which will be thrown when the 
> instruction counter in the slave overflows)
> 4.    Setting the SLAVE to run: write_cm(pid, PCRUN, NULL)
> 
> Now, when SLAVE's instruction counter overflows, SLAVE is stopped and MASTER 
> detects this. Then the following happens:
> 
> /* Control loop */
> 1.    MASTER calls Psyscall on the SLAVE
> 2.    Psyscall is set up to call cpc_request_preset() and cpc_set_restart() 
> in the SLAVE
> 3.    MASTER sets SLAVE running again: write_cm(pid, PCRUN, NULL)
> 4.    SLAVE runs until its instruction counter overflows again, at which 
> point the sequence executed in the control loop repeats.
> 
> The control loop usually executes successfully a dozen times. After that, SLAVE 
> either crashes with SIGFAULT or exits prematurely. We don't know why. When 
> SLAVE runs by itself, it never crashes (it's a simple program that adds up a 
> bunch of numbers in a loop.) So we figure this is due to MASTER messing with 
> SLAVE. The core file does not give us much information.
> 
> We tried substituting item #2 in the control loop with a routine that 
> re-initializes the hardware counters from scratch. The result was the same: 
> SLAVE crashes or exits. This is little wonder: counters are re-initialized by 
> calling cpc_bind_pctx, and cpc_bind_pctx calls Psyscall! 
> 
> So we blame the crash on the fact that we use Psyscall.
> 
> Any ideas how we might approach debugging this? Are we doing something 
> illegal?
> 
> Thank you.
>  
>  
> This message posted from opensolaris.org
> _______________________________________________
> perf-discuss mailing list
> perf-discuss@opensolaris.org