On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin < lionel.g.landwer...@intel.com> wrote:
> On 01/08/17 19:05, sourab gupta wrote: > > > > On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kam...@intel.com> > wrote: > >> >> >> -----Original Message----- >> From: Landwerlin, Lionel G >> Sent: Monday, July 31, 2017 9:16 PM >> To: Kamble, Sagar A <sagar.a.kam...@intel.com>; >> intel-gfx@lists.freedesktop.org >> Cc: Sourab Gupta <sourab.gu...@intel.com> >> Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing >> command stream based OA reports and ctx id info. >> >> On 31/07/17 08:59, Sagar Arun Kamble wrote: >> > From: Sourab Gupta <sourab.gu...@intel.com> >> > >> > This patch introduces a framework to capture OA counter reports >> associated >> > with Render command stream. We can then associate the reports captured >> > through this mechanism with their corresponding context id's. This can >> be >> > further extended to associate any other metadata information with the >> > corresponding samples (since the association with Render command stream >> > gives us the ability to capture these information while inserting the >> > corresponding capture commands into the command stream). >> > >> > The OA reports generated in this way are associated with a corresponding >> > workload, and thus can be used the delimit the workload (i.e. sample the >> > counters at the workload boundaries), within an ongoing stream of >> periodic >> > counter snapshots. >> > >> > There may be usecases wherein we need more than periodic OA capture mode >> > which is supported currently. This mode is primarily used for two >> usecases: >> > - Ability to capture system wide metrics, alongwith the ability to >> map >> > the reports back to individual contexts (particularly for HSW). >> > - Ability to inject tags for work, into the reports. This provides >> > visibility into the multiple stages of work within single >> context. >> > >> > The userspace will be able to distinguish between the periodic and CS >> based >> > OA reports by the virtue of source_info sample field. >> > >> > The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA >> > counters, and is inserted at BB boundaries. >> > The data thus captured will be stored in a separate buffer, which will >> > be different from the buffer used otherwise for periodic OA capture >> mode. >> > The metadata information pertaining to snapshot is maintained in a list, >> > which also has offsets into the gem buffer object per captured snapshot. >> > In order to track whether the gpu has completed processing the node, >> > a field pertaining to corresponding gem request is added, which is >> tracked >> > for completion of the command. >> > >> > Both periodic and CS based reports are associated with a single stream >> > (corresponding to render engine), and it is expected to have the samples >> > in the sequential order according to their timestamps. Now, since these >> > reports are collected in separate buffers, these are merge sorted at the >> > time of forwarding to userspace during the read call. >> > >> > v2: Aligning with the non-perf interface (custom drm ioctl based). Also, >> > few related patches are squashed together for better readability >> > >> > v3: Updated perf sample capture emit hook name. Reserving space upfront >> > in the ring for emitting sample capture commands and using >> > req->fence.seqno for tracking samples. Added SRCU protection for >> streams. >> > Changed the stream last_request tracking to resv object. (Chris) >> > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved >> > stream to global per-engine structure. (Sagar) >> > Update unpin and put in the free routines to i915_vma_unpin_and_release. >> > Making use of perf stream cs_buffer vma resv instead of separate resv >> obj. >> > Pruned perf stream vma resv during gem_idle. (Chris) >> > Changed payload field ctx_id to u64 to keep all sample data aligned at 8 >> > bytes. (Lionel) >> > stall/flush prior to sample capture is not added. Do we need to give >> this >> > control to user to select whether to stall/flush at each sample? >> > >> > Signed-off-by: Sourab Gupta <sourab.gu...@intel.com> >> > Signed-off-by: Robert Bragg <rob...@sixbynine.org> >> > Signed-off-by: Sagar Arun Kamble <sagar.a.kam...@intel.com> >> > --- >> > drivers/gpu/drm/i915/i915_drv.h | 101 ++- >> > drivers/gpu/drm/i915/i915_gem.c | 1 + >> > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 8 + >> > drivers/gpu/drm/i915/i915_perf.c | 1185 >> ++++++++++++++++++++++------ >> > drivers/gpu/drm/i915/intel_engine_cs.c | 4 + >> > drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + >> > drivers/gpu/drm/i915/intel_ringbuffer.h | 5 + >> > include/uapi/drm/i915_drm.h | 15 + >> > 8 files changed, 1073 insertions(+), 248 deletions(-) >> > >> > diff --git a/drivers/gpu/drm/i915/i915_drv.h >> b/drivers/gpu/drm/i915/i915_drv.h >> > index 2c7456f..8b1cecf 100644 >> > --- a/drivers/gpu/drm/i915/i915_drv.h >> > +++ b/drivers/gpu/drm/i915/i915_drv.h >> > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops { >> > * The stream will always be disabled before this is called. >> > */ >> > void (*destroy)(struct i915_perf_stream *stream); >> > + >> > + /* >> > + * @emit_sample_capture: Emit the commands in the command streamer >> > + * for a particular gpu engine. >> > + * >> > + * The commands are inserted to capture the perf sample data at >> > + * specific points during workload execution, such as before and >> after >> > + * the batch buffer. >> > + */ >> > + void (*emit_sample_capture)(struct i915_perf_stream *stream, >> > + struct drm_i915_gem_request *request, >> > + bool preallocate); >> > +}; >> > + >> >> It seems the motivation for this following enum is mostly to deal with >> the fact that engine->perf_srcu is set before the OA unit is configured. >> Would it possible to set it later so that we get rid of the enum? >> >> <Sagar> I will try to make this as just binary state. This enum is >> defining the state of the stream. I too got confused with purpose of >> IN_PROGRESS. >> SRCU is used for synchronizing stream state check. >> IN_PROGRESS will enable us to not advertently try to access the stream >> vma for inserting the samples, but I guess depending on disabled/enabled >> should >> suffice. >> > > Hi Sagar/Lionel, > > > Hi Sourab, > > Thanks again for your input on this. > > > The purpose of the tristate was to workaround a particular kludge of > working with just enabled/disabled boolean state. I'll explain below. > > Let's say we have only boolean state. > i915_perf_emit_sample_capture() function would depend on > stream->enabled in order to insert the MI_RPC command in RCS. > If you see i915_perf_enable_locked(), stream->enabled is set before > stream->ops->enable(). The stream->ops->enable() function actually > enables the OA hardware to capture reports, and if MI_RPC commands > are submitted before OA hw is enabled, it may hang the gpu. > > > Do you remember if this is documented anywhere? > I couldn't find anything in the MI_RPC instruction. > > Sorry, I don't happen to remember any documentation. Probably, you can check this out by submitting MI_RPC without enabling OA. > > Also, we can't change the order of calling these operations inside > i915_perf_enable_locked() since gen7_update_oacontrol_locked() > function depends on stream->enabled flag to enable the OA > hw unit (i.e. it needs the flag to be true). > > > We can probably work around that by passing some arguments. > > To workaround this problem, I introduced a tristate here. > If you can suggest some alternate solution to this problem, > we can remove this tristate kludge here. > Regards, > Sourab > > > >
_______________________________________________ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx