Hello! Here is a riddle for you :) We are experiencing a strange problem when using Psyscall:
We have a MASTER process that grabs a SLAVE process in order to monitor hardware counters on behalf of the SLAVE. It does so with the following steps:

/* Initialization */
1. Grab the SLAVE process: pctx_capture()
2. Set up the SLAVE to count its instructions: cpc_bind_pctx() plus the other cpc library initialization routines.
3. Set up the MASTER to detect when the SLAVE stops: write_cm(pid, PCSTRACE, NULL); (the SLAVE will stop on a SIGEMT signal, which is raised when the SLAVE's instruction counter overflows)
4. Set the SLAVE running: write_cm(pid, PCRUN, NULL)

Now, when the SLAVE's instruction counter overflows, the SLAVE stops and the MASTER detects this. Then the following happens:

/* Control loop */
1. The MASTER calls Psyscall() on the SLAVE.
2. Psyscall() is set up to call cpc_request_preset() and cpc_set_restart() in the SLAVE.
3. The MASTER sets the SLAVE running again: write_cm(pid, PCRUN, NULL)
4. The SLAVE runs until its instruction counter overflows again, at which point the control loop repeats.

The control loop usually executes successfully a dozen times. After that, the SLAVE either crashes with SIGFAULT or exits prematurely, and we don't know why. When the SLAVE runs by itself, it never crashes (it's a simple program that adds up a bunch of numbers in a loop), so we figure the MASTER is somehow interfering with the SLAVE. The core file does not give us much information.

We tried replacing item #2 of the control loop with a routine that re-initializes the hardware counters from scratch. The result was the same: the SLAVE crashes or exits. This is hardly surprising, because the counters are re-initialized by calling cpc_bind_pctx(), and cpc_bind_pctx() itself calls Psyscall()! So we suspect the crash is caused by our use of Psyscall().

Any ideas how we might approach debugging this? Are we doing something illegal?

Thank you.

This message posted from opensolaris.org
_______________________________________________
perf-discuss mailing list
perf-discuss@opensolaris.org