On 01/08/17 19:05, sourab gupta wrote:


On Tue, Aug 1, 2017 at 2:59 PM, Kamble, Sagar A <sagar.a.kam...@intel.com> wrote:



    -----Original Message-----
    From: Landwerlin, Lionel G
    Sent: Monday, July 31, 2017 9:16 PM
    To: Kamble, Sagar A <sagar.a.kam...@intel.com>; intel-gfx@lists.freedesktop.org
    Cc: Sourab Gupta <sourab.gu...@intel.com>
    Subject: Re: [Intel-gfx] [PATCH 03/12] drm/i915: Framework for capturing command stream based OA reports and ctx id info.

    On 31/07/17 08:59, Sagar Arun Kamble wrote:
    > From: Sourab Gupta <sourab.gu...@intel.com>
    >
    > This patch introduces a framework to capture OA counter reports
    > associated with the Render command stream. We can then associate the
    > reports captured through this mechanism with their corresponding
    > context ids. This can be further extended to associate any other
    > metadata information with the corresponding samples (since the
    > association with the Render command stream gives us the ability to
    > capture this information while inserting the corresponding capture
    > commands into the command stream).
    >
    > The OA reports generated in this way are associated with a
    > corresponding workload, and thus can be used to delimit the workload
    > (i.e. sample the counters at the workload boundaries) within an
    > ongoing stream of periodic counter snapshots.
    >
    > There may be use cases wherein we need more than the periodic OA
    > capture mode which is supported currently. This mode is primarily
    > used for two use cases:
    >      - Ability to capture system wide metrics, along with the
    >        ability to map the reports back to individual contexts
    >        (particularly for HSW).
    >      - Ability to inject tags for work into the reports. This
    >        provides visibility into the multiple stages of work within a
    >        single context.
    >
    > Userspace will be able to distinguish between the periodic and CS
    > based OA reports by virtue of the source_info sample field.
    >
    > The command MI_REPORT_PERF_COUNT can be used to capture snapshots of
    > OA counters, and is inserted at BB boundaries. The data thus captured
    > will be stored in a separate buffer, which will be different from the
    > buffer used otherwise for the periodic OA capture mode. The metadata
    > information pertaining to each snapshot is maintained in a list,
    > which also has offsets into the gem buffer object per captured
    > snapshot. In order to track whether the gpu has completed processing
    > the node, a field pertaining to the corresponding gem request is
    > added, which is tracked for completion of the command.
    >
    > Both periodic and CS based reports are associated with a single
    > stream (corresponding to the render engine), and the samples are
    > expected to be in sequential order according to their timestamps.
    > Now, since these reports are collected in separate buffers, they are
    > merge sorted at the time of forwarding to userspace during the read
    > call.
    >
    > v2: Aligned with the non-perf interface (custom drm ioctl based).
    > Also, a few related patches are squashed together for better
    > readability.
    >
    > v3: Updated the perf sample capture emit hook name. Reserving space
    > upfront in the ring for emitting sample capture commands and using
    > req->fence.seqno for tracking samples. Added SRCU protection for
    > streams. Changed the stream last_request tracking to a resv object.
    > (Chris) Updated perf.sample_lock spin_lock usage to avoid
    > softlockups. Moved the stream to a global per-engine structure.
    > (Sagar) Updated unpin and put in the free routines to
    > i915_vma_unpin_and_release. Making use of the perf stream cs_buffer
    > vma resv instead of a separate resv obj. Pruned the perf stream vma
    > resv during gem_idle. (Chris) Changed the payload field ctx_id to u64
    > to keep all sample data aligned at 8 bytes. (Lionel)
    > A stall/flush prior to sample capture is not added. Do we need to
    > give the user control to select whether to stall/flush at each
    > sample?
    >
    > Signed-off-by: Sourab Gupta <sourab.gu...@intel.com>
    > Signed-off-by: Robert Bragg <rob...@sixbynine.org>
    > Signed-off-by: Sagar Arun Kamble <sagar.a.kam...@intel.com>
    > ---
    >   drivers/gpu/drm/i915/i915_drv.h            |  101 ++-
    >   drivers/gpu/drm/i915/i915_gem.c            |    1 +
    >   drivers/gpu/drm/i915/i915_gem_execbuffer.c |    8 +
    >   drivers/gpu/drm/i915/i915_perf.c           | 1185 ++++++++++++++++++++++------
    >   drivers/gpu/drm/i915/intel_engine_cs.c     |    4 +
    >   drivers/gpu/drm/i915/intel_ringbuffer.c    |    2 +
    >   drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +
    >   include/uapi/drm/i915_drm.h                |   15 +
    >   8 files changed, 1073 insertions(+), 248 deletions(-)
    >
    > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
    > index 2c7456f..8b1cecf 100644
    > --- a/drivers/gpu/drm/i915/i915_drv.h
    > +++ b/drivers/gpu/drm/i915/i915_drv.h
    > @@ -1985,6 +1985,24 @@ struct i915_perf_stream_ops {
    >        * The stream will always be disabled before this is called.
    >        */
    >       void (*destroy)(struct i915_perf_stream *stream);
    > +
    > +     /*
    > +      * @emit_sample_capture: Emit the commands in the command streamer
    > +      * for a particular gpu engine.
    > +      *
    > +      * The commands are inserted to capture the perf sample data at
    > +      * specific points during workload execution, such as before and after
    > +      * the batch buffer.
    > +      */
    > +     void (*emit_sample_capture)(struct i915_perf_stream *stream,
    > +                                 struct drm_i915_gem_request *request,
    > +                                 bool preallocate);
    > +};
    > +

    It seems the motivation for the following enum is mostly to deal with
    the fact that engine->perf_srcu is set before the OA unit is
    configured.
    Would it be possible to set it later so that we can get rid of the enum?

    <Sagar> I will try to make this just a binary state. This enum defines
    the state of the stream. I too was confused about the purpose of
    IN_PROGRESS.
    SRCU is used for synchronizing the stream state check.
    IN_PROGRESS lets us not inadvertently try to access the stream vma
    for inserting the samples, but I guess depending on disabled/enabled
    should suffice.
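    (For illustration only: the SRCU-protected state check on the emit
    path might look roughly like the sketch below. The wrapper name and
    the engine->exclusive_stream field are assumptions, not necessarily
    what the patch uses; engine->perf_srcu and the emit_sample_capture
    hook are from the patch itself.)

        static void emit_sample_capture_srcu_sketch(struct intel_engine_cs *engine,
                                                    struct drm_i915_gem_request *request,
                                                    bool preallocate)
        {
                struct i915_perf_stream *stream;
                int idx;

                idx = srcu_read_lock(&engine->perf_srcu);
                stream = srcu_dereference(engine->exclusive_stream,
                                          &engine->perf_srcu);
                /* Only emit MI_RPC once the stream is fully enabled; with a
                 * bare enabled/disabled boolean this check can race against
                 * the OA unit actually being programmed.
                 */
                if (stream && stream->enabled)
                        stream->ops->emit_sample_capture(stream, request,
                                                         preallocate);
                srcu_read_unlock(&engine->perf_srcu, idx);
        }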


Hi Sagar/Lionel,

Hi Sourab,

Thanks again for your input on this.


The purpose of the tristate was to work around a particular kludge of
working with just an enabled/disabled boolean state. I'll explain below.

Let's say we have only a boolean state.
The i915_perf_emit_sample_capture() function would depend on
stream->enabled in order to insert the MI_RPC command in the RCS.
If you look at i915_perf_enable_locked(), stream->enabled is set before
stream->ops->enable() is called. The stream->ops->enable() function is
what actually enables the OA hardware to capture reports, and if MI_RPC
commands are submitted before the OA hw is enabled, it may hang the GPU.
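For reference, a simplified sketch of the ordering being described (just
the relevant shape of i915_perf_enable_locked(), not the exact upstream
code):

        static void i915_perf_enable_locked(struct i915_perf_stream *stream)
        {
                if (stream->enabled)
                        return;

                /* Marked enabled *before* stream->ops->enable() programs the
                 * OA unit, so anything keyed purely on stream->enabled (such
                 * as the emit-sample-capture path) can see "enabled" while
                 * the hardware is not yet capturing.
                 */
                stream->enabled = true;

                if (stream->ops->enable)
                        stream->ops->enable(stream);
        }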

Do you remember if this is documented anywhere?
I couldn't find anything in the MI_RPC instruction.


Also, we can't change the order of calling these operations inside
i915_perf_enable_locked(), since the gen7_update_oacontrol_locked()
function depends on the stream->enabled flag to enable the OA
hw unit (i.e. it needs the flag to already be true).
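That is, roughly (a trimmed sketch of that dependency; the exclusive_stream
field path is taken from the existing HSW perf code and the register
programming details are elided):

        static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
        {
                struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream;

                if (stream->enabled) {
                        /* program GEN7_OACONTROL for the stream (report format,
                         * timer period, per-context filtering) and set
                         * GEN7_OACONTROL_ENABLE
                         */
                } else {
                        I915_WRITE(GEN7_OACONTROL, 0);
                }
        }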

We can probably work around that by passing some arguments.

To work around this problem, I introduced a tristate here.
If you can suggest an alternate solution to this problem,
we can remove this tristate kludge.
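(For illustration only, the kind of tristate being discussed; the names
below are hypothetical, not necessarily those used in the patch.)

        enum i915_perf_stream_state {
                I915_PERF_STREAM_DISABLED,            /* no capture, no emission */
                I915_PERF_STREAM_ENABLE_IN_PROGRESS,  /* enable requested, OA unit
                                                       * not yet programmed: do not
                                                       * emit MI_RPC yet
                                                       */
                I915_PERF_STREAM_ENABLED,             /* OA unit programmed, safe
                                                       * to emit MI_RPC captures
                                                       */
        };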
Regards,
Sourab



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
