i915: Framework for capturing command stream based OA reports and ctx id info.

sourab gupta Tue, 01 Aug 2017 19:47:53 -0700

On Wed, Aug 2, 2017 at 2:28 AM, Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:


> On 01/08/17 19:05, sourab gupta wrote:
>
>
>
> On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kam...@intel.com>
> wrote:
>
>>
>>
>> -----Original Message-----
>> From: Landwerlin, Lionel G
>> Sent: Monday, July 31, 2017 9:16 PM
>> To: Kamble, Sagar A <sagar.a.kam...@intel.com>;
>> intel-gfx@lists.freedesktop.org
>> Cc: Sourab Gupta <sourab.gu...@intel.com>
>> Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing
>> command stream based OA reports and ctx id info.
>>
>> On 31/07/17 08:59, Sagar Arun Kamble wrote:
>> > From: Sourab Gupta <sourab.gu...@intel.com>
>> >
>> > This patch introduces a framework to capture OA counter reports
>> associated
>> > with Render command stream. We can then associate the reports captured
>> > through this mechanism with their corresponding context id's. This can
>> be
>> > further extended to associate any other metadata information with the
>> > corresponding samples (since the association with Render command stream
>> > gives us the ability to capture these information while inserting the
>> > corresponding capture commands into the command stream).
>> >
>> > The OA reports generated in this way are associated with a corresponding
>> > workload, and thus can be used the delimit the workload (i.e. sample the
>> > counters at the workload boundaries), within an ongoing stream of
>> periodic
>> > counter snapshots.
>> >
>> > There may be usecases wherein we need more than periodic OA capture mode
>> > which is supported currently. This mode is primarily used for two
>> usecases:
>> >      - Ability to capture system wide metrics, alongwith the ability to
>> map
>> >        the reports back to individual contexts (particularly for HSW).
>> >      - Ability to inject tags for work, into the reports. This provides
>> >        visibility into the multiple stages of work within single
>> context.
>> >
>> > The userspace will be able to distinguish between the periodic and CS
>> based
>> > OA reports by the virtue of source_info sample field.
>> >
>> > The command MI_REPORT_PERF_COUNT can be used to capture snapshots of OA
>> > counters, and is inserted at BB boundaries.
>> > The data thus captured will be stored in a separate buffer, which will
>> > be different from the buffer used otherwise for periodic OA capture
>> mode.
>> > The metadata information pertaining to snapshot is maintained in a list,
>> > which also has offsets into the gem buffer object per captured snapshot.
>> > In order to track whether the gpu has completed processing the node,
>> > a field pertaining to corresponding gem request is added, which is
>> tracked
>> > for completion of the command.
>> >
>> > Both periodic and CS based reports are associated with a single stream
>> > (corresponding to render engine), and it is expected to have the samples
>> > in the sequential order according to their timestamps. Now, since these
>> > reports are collected in separate buffers, these are merge sorted at the
>> > time of forwarding to userspace during the read call.
>> >
>> > v2: Aligning with the non-perf interface (custom drm ioctl based). Also,
>> > few related patches are squashed together for better readability
>> >
>> > v3: Updated perf sample capture emit hook name. Reserving space upfront
>> > in the ring for emitting sample capture commands and using
>> > req->fence.seqno for tracking samples. Added SRCU protection for
>> streams.
>> > Changed the stream last_request tracking to resv object. (Chris)
>> > Updated perf.sample_lock spin_lock usage to avoid softlockups. Moved
>> > stream to global per-engine structure. (Sagar)
>> > Update unpin and put in the free routines to i915_vma_unpin_and_release.
>> > Making use of perf stream cs_buffer vma resv instead of separate resv
>> obj.
>> > Pruned perf stream vma resv during gem_idle. (Chris)
>> > Changed payload field ctx_id to u64 to keep all sample data aligned at 8
>> > bytes. (Lionel)
>> > stall/flush prior to sample capture is not added. Do we need to give
>> this
>> > control to user to select whether to stall/flush at each sample?
>> >
>> > Signed-off-by: Sourab Gupta <sourab.gu...@intel.com>
>> > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
>> > Signed-off-by: Sagar Arun Kamble <sagar.a.kam...@intel.com>
>> > ---
>> >   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
>> >   drivers/gpu/drm/i915/i915_gem.c            |    1 +
>> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
>> >   drivers/gpu/drm/i915/i915_perf.c           | 1185
>> ++++++++++++++++++++++------
>> >   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
>> >   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
>> >   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
>> >   include/uapi/drm/i915_drm.h                |   15 +
>> >   8 files changed, 1073 insertions(+), 248 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_drv.h
>> b/drivers/gpu/drm/i915/i915_drv.h
>> > index 2c7456f..8b1cecf 100644
>> > --- a/drivers/gpu/drm/i915/i915_drv.h
>> > +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
>> >        * The stream will always be disabled before this is called.
>> >        */
>> >       void (*destroy)(struct i915_perf_stream *stream);
>> > +
>> > +     /*
>> > +      * @emit_sample_capture: Emit the commands in the command streamer
>> > +      * for a particular gpu engine.
>> > +      *
>> > +      * The commands are inserted to capture the perf sample data at
>> > +      * specific points during workload execution, such as before and
>> after
>> > +      * the batch buffer.
>> > +      */
>> > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
>> > +                                 struct drm_i915_gem_request *request,
>> > +                                 bool preallocate);
>> > +};
>> > +
>>
>> It seems the motivation for this following enum is mostly to deal with
>> the fact that engine->perf_srcu is set before the OA unit is configured.
>> Would it possible to set it later so that we get rid of the enum?
>>
>> <Sagar> I will try to make this as just binary state. This enum is
>> defining the state of the stream. I too got confused with purpose of
>> IN_PROGRESS.
>> SRCU is used for synchronizing stream state check.
>> IN_PROGRESS will enable us to not advertently try to access the stream
>> vma for inserting the samples, but I guess depending on disabled/enabled
>> should
>> suffice.
>>
>
> Hi Sagar/Lionel,
>
>
> Hi Sourab,
>
> Thanks again for your input on this.
>
>
> The purpose of the tristate was to workaround a particular kludge of
> working with just enabled/disabled boolean state. I'll explain below.
>
> Let's say we have only boolean state.
> i915_perf_emit_sample_capture() function would depend on
> stream->enabled in order to insert the MI_RPC command in RCS.
> If you see i915_perf_enable_locked(), stream->enabled is set before
> stream->ops->enable(). The stream->ops->enable() function actually
> enables the OA hardware to capture reports, and if MI_RPC commands
> are submitted before OA hw is enabled, it may hang the gpu.
>
>
> Do you remember if this is documented anywhere?
> I couldn't find anything in the MI_RPC instruction.
>
> Sorry, I don't happen to remember any documentation. Probably, you can
check this out by submitting MI_RPC without enabling OA.

>
> Also, we can't change the order of calling these operations inside
> i915_perf_enable_locked() since gen7_update_oacontrol_locked()
> function depends on stream->enabled flag to enable the OA
> hw unit (i.e. it needs the flag to be true).
>
>
> We can probably work around that by passing some arguments.
>
> To workaround this problem, I introduced a tristate here.
> If you can suggest some alternate solution to this problem,
> we can remove this tristate kludge here.
> Regards,
> Sourab
>
>
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

Reply via email to