From: sourab gupta [mailto:sourabgu...@gmail.com]
Sent: Wednesday, August 2, 2017 8:17 AM
To: Landwerlin, Lionel G <lionel.g.landwer...@intel.com>
Cc: Kamble, Sagar A <sagar.a.kam...@intel.com>; intel-gfx@lists.freedesktop.org; Sourab Gupta <sourab.gu...@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.
On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <lionel.g.landwer...@intel.com> wrote:

On 01/08/17 19:05, sourab gupta wrote:

On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kam...@intel.com> wrote:

-----Original Message-----
From: Landwerlin, Lionel G
Sent: Monday, July 31, 2017 9:16 PM
To: Kamble, Sagar A <sagar.a.kam...@intel.com>; intel-gfx@lists.freedesktop.org
Cc: Sourab Gupta <sourab.gu...@intel.com>
Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

On 31/07/17 08:59, Sagar Arun Kamble wrote:
> From: Sourab Gupta <sourab.gu...@intel.com>
>
> This patch introduces a framework to capture OA counter reports associated
> with the render command stream. We can then associate the reports captured
> through this mechanism with their corresponding context IDs. This can be
> further extended to associate any other metadata with the corresponding
> samples (since the association with the render command stream gives us the
> ability to capture this information while inserting the corresponding
> capture commands into the command stream).
>
> The OA reports generated in this way are associated with a corresponding
> workload, and thus can be used to delimit the workload (i.e. sample the
> counters at the workload boundaries) within an ongoing stream of periodic
> counter snapshots.
>
> There may be use cases wherein we need more than the periodic OA capture
> mode which is supported currently. This mode is primarily used for two
> use cases:
> - Ability to capture system wide metrics, along with the ability to map
>   the reports back to individual contexts (particularly for HSW).
> - Ability to inject tags for work into the reports. This provides
>   visibility into the multiple stages of work within a single context.
>
> Userspace will be able to distinguish between the periodic and CS based
> OA reports by virtue of the source_info sample field.
>
> The MI_REPORT_PERF_COUNT command can be used to capture snapshots of OA
> counters, and is inserted at BB boundaries. The data thus captured is
> stored in a separate buffer, different from the one used for the periodic
> OA capture mode. The metadata pertaining to each snapshot is maintained in
> a list, which also holds offsets into the gem buffer object per captured
> snapshot. In order to track whether the gpu has completed processing the
> node, a field pointing to the corresponding gem request is added, which is
> tracked for completion of the command.
>
> Both periodic and CS based reports are associated with a single stream
> (corresponding to the render engine), and the samples are expected to be
> in sequential order according to their timestamps. Since these reports are
> collected in separate buffers, they are merge sorted at the time of
> forwarding to userspace during the read call.
>
> v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
> a few related patches are squashed together for better readability.
>
> v3: Updated perf sample capture emit hook name. Reserving space upfront
> in the ring for emitting sample capture commands and using
> req->fence.seqno for tracking samples. Added SRCU protection for streams.
> Changed the stream last_request tracking to resv object. (Chris)
> Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
> stream to global per-engine structure. (Sagar)
> Update unpin and put in the free routines to i915_vma_unpin_and_release.
> Making use of perf stream cs_buffer vma resv instead of separate resv obj.
> Pruned perf stream vma resv during gem_idle. (Chris)
> Changed payload field ctx_id to u64 to keep all sample data aligned at
> 8 bytes. (Lionel)
> stall/flush prior to sample capture is not added. Do we need to give the
> user control over whether to stall/flush at each sample?
>
> Signed-off-by: Sourab Gupta <sourab.gu...@intel.com>
> Signed-off-by: Robert Bragg <rob...@sixbynine.org>
> Signed-off-by: Sagar Arun Kamble <sagar.a.kam...@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>  drivers/gpu/drm/i915/i915_gem.c            |    1 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>  drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
>  drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>  include/uapi/drm/i915_drm.h                |   15 +
>  8 files changed, 1073 insertions(+), 248 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2c7456f..8b1cecf 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>  	 * The stream will always be disabled before this is called.
>  	 */
>  	void (*destroy)(struct i915_perf_stream *stream);
> +
> +	/*
> +	 * @emit_sample_capture: Emit the commands in the command streamer
> +	 * for a particular gpu engine.
> +	 *
> +	 * The commands are inserted to capture the perf sample data at
> +	 * specific points during workload execution, such as before and
> +	 * after the batch buffer.
> +	 */
> +	void (*emit_sample_capture)(struct i915_perf_stream *stream,
> +				    struct drm_i915_gem_request *request,
> +				    bool preallocate);
> +};
> +

It seems the motivation for the following enum is mostly to deal with the fact that engine->perf_srcu is set before the OA unit is configured. Would it be possible to set it later so that we get rid of the enum?

<Sagar> I will try to make this just a binary state. This enum defines the state of the stream. I too got confused by the purpose of IN_PROGRESS. SRCU is used for synchronizing the stream state check. IN_PROGRESS would keep us from inadvertently trying to access the stream vma when inserting the samples, but I guess relying on disabled/enabled should suffice.

Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.

The purpose of the tristate was to work around a particular kludge that comes with having only an enabled/disabled boolean state. I'll explain below.

Let's say we have only the boolean state. The i915_perf_emit_sample_capture() function would depend on stream->enabled in order to insert the MI_RPC command in the RCS.

If you look at i915_perf_enable_locked(), stream->enabled is set before stream->ops->enable() is called. The stream->ops->enable() function is what actually enables the OA hardware to capture reports, and if MI_RPC commands are submitted before the OA hw is enabled, it may hang the gpu.

Do you remember if this is documented anywhere? I couldn't find anything in the MI_RPC instruction.

Sorry, I don't happen to remember any documentation. You can probably check this by submitting MI_RPC without enabling OA.
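To make the window concrete, here is a simplified sketch of the flow being described (my illustration, not the actual patch code; in the patch the emit path takes the request and looks up the engine's stream under SRCU, which is elided here):

static void i915_perf_enable_locked(struct i915_perf_stream *stream)
{
	if (stream->enabled)
		return;

	/* (1) The flag is flipped first, because gen7_oa_enable() (via
	 * gen7_update_oacontrol_locked()) checks stream->enabled when
	 * deciding whether to program GEN7_OACONTROL.
	 */
	stream->enabled = true;

	/* (2) Only here does the OA unit actually start producing reports. */
	if (stream->ops->enable)
		stream->ops->enable(stream);
}

/* Execbuf path (simplified): emit MI_REPORT_PERF_COUNT around the batch. */
static void i915_perf_emit_sample_capture(struct i915_perf_stream *stream,
					  struct drm_i915_gem_request *request,
					  bool preallocate)
{
	/* With only a boolean, this check can pass in the window between
	 * (1) and (2) above, i.e. MI_RPC gets emitted while the OA unit is
	 * still disabled - the case the DISABLED/IN_PROGRESS/ENABLED
	 * tristate was added to avoid.
	 */
	if (stream->enabled)
		stream->ops->emit_sample_capture(stream, request, preallocate);
}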
The thing is, gen7_oa_enable() depends on enabled being true. Lionel, can I change it to:

	if (!stream->enabled) {
		I915_WRITE(GEN7_OACONTROL,
			   (ctx_id & GEN7_OACONTROL_CTX_MASK) |
			   (period_exponent <<

with stream->enabled set to true at the end of i915_perf_enable_locked()?

Also, we can't change the order of calling these operations inside i915_perf_enable_locked(), since the gen7_update_oacontrol_locked() function depends on the stream->enabled flag to enable the OA hw unit (i.e. it needs the flag to be true). We can probably work around that by passing some arguments.

To work around this problem, I introduced the tristate here. If you can suggest an alternate solution to this problem, we can remove this tristate kludge.

Regards,
Sourab
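For what it's worth, a rough sketch of the "passing some arguments" idea mentioned above (again just an illustration, not code from the patch; the extra 'state' parameter to gen7_update_oacontrol_locked() is hypothetical):

static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv,
					 bool state)
{
	if (state) {
		/* ... program GEN7_OACONTROL as today: ctx_id, period
		 * exponent, report format, GEN7_OACONTROL_ENABLE ...
		 */
	} else {
		I915_WRITE(GEN7_OACONTROL, 0);
	}
}

static void i915_perf_enable_locked(struct i915_perf_stream *stream)
{
	if (stream->enabled)
		return;

	/* Program the OA unit first, passing the desired state explicitly
	 * instead of reading stream->enabled ...
	 */
	if (stream->ops->enable)
		stream->ops->enable(stream);

	/* ... and only then advertise the stream, so the emit path can
	 * never see enabled == true while OA is still off, and the plain
	 * boolean suffices.
	 */
	stream->enabled = true;
}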