On Thu, 2017-03-16 at 03:09 -0700, Chris Wilson wrote:
> On Thu, Mar 16, 2017 at 03:22:03PM +0530, sourab gupta wrote:
> > On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> > > On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > > > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gu...@intel.com 
> > > > > wrote:
> > > > > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct 
> > > > > > intel_engine_cs *engine,
> > > > > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > > > > >                           struct i915_gem_context *ctx,
> > > > > >                           uint32_t *reg_state);
> > > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request 
> > > > > > *req);
> > > > > >  
> > > > > >  /* i915_gem_evict.c */
> > > > > >  int __must_check i915_gem_evict_something(struct 
> > > > > > i915_address_space *vm,
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
> > > > > > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > index aa75ea2..7af32c97 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct 
> > > > > > drm_i915_gem_object *obj,
> > > > > >     if (exec_len == 0)
> > > > > >             exec_len = params->batch->size - 
> > > > > > params->args_batch_start_offset;
> > > > > >  
> > > > > > +   i915_perf_command_stream_hook(params->request);
> > > > > 
> > > > > Could you have named it anything more cryptic and inconsistent?
> > > > 
> > > > Sorry. Would 'i915_perf_capture_metrics' work?
> > > > Can you please suggest an appropriate name otherwise.
> > > 
> > > The verb we use for writing into the command stream is emit. So
> > > i915_perf_emit_samples() (emit_record is clumsy as it is not clear
> > > whether 'record' is the verb or the noun).
> > > 
> > Thanks for suggesting. I'll use 'i915_perf_emit_samples' here.
> > 
> > > > > 
> > > > > Quite clearly this can fail, so justify the non-handling of errors.
> > > > > 
> > > > > DRM_ERROR is not error handling, it is an indication that this patch
> > > > > isn't ready.
> > > > 
> > > > Well, the intent was to have minimal effect on the normal execbuf
> > > > routine even if we fail. But I guess I'm mistaken.
> > > > I'll rectify this to handle the case where the perf callback fails.
> > > 
> > > That all depends on whether or not you can handle the unbalanced
> > > metrics. If simply missing a report is fine, then just kill the
> > > DRM_ERROR.
> > > 
> > > The bigger question is whether the following emit can fail -- once
> > > the batch is in the request, no more failures are tolerable. You have to
> > > preallocate reserved space.
> > > 
> > > Don't you need a flush before the emit following the batch?
> > > 
> > 
> > Ok. So, that would mean we first have to reserve the space required by
> > both 'perf_emit_samples' calls, so that the emit following the batch
> > cannot fail for lack of space.
> > Probably, I could pass an additional boolean parameter 'reserve_space'
> > set to true in the first call, which would reserve the space for both
> > emit_samples() calls (through intel_ring_begin)?
> 
> Hmm, yes, you can just tweak the request->reserved_space in the first
> call to preallocate some space (and make sure it remains available) for
> the second.
> 
> perf_emit_samples(rq, bool preallocate) {
>       /* then in each sample callback */
>       cs_len = foo;
>       if (preallocate)
>               rq->reserved_space += cs_len;
>       else
>               rq->reserved_space -= cs_len;
>       cs = intel_ring_begin(rq, cs_len);
> }
>       
Ok. I would do that. Thanks for suggesting.
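
To make sure I've understood, here is a fleshed-out sketch of what I have in
mind, not the final implementation (cs_len and the sample payload are
placeholders, and I've kept the units loose as in your sketch -- they need
reconciling with what reserved_space and intel_ring_begin() actually take):

```c
/*
 * Sketch of the preallocation scheme suggested above. The first
 * (pre-batch) call grows request->reserved_space so that the
 * post-batch call cannot fail in intel_ring_begin() for lack of
 * ring space; the second call releases that reservation for its
 * own use.
 */
static void i915_perf_stream_emit_sample(struct i915_perf_stream *stream,
					 struct drm_i915_gem_request *rq,
					 bool preallocate)
{
	u32 cs_len = 4;	/* space needed by this stream's sample command */
	u32 *cs;

	if (preallocate)
		rq->reserved_space += cs_len;	/* keep room for 2nd emit */
	else
		rq->reserved_space -= cs_len;	/* claim the reserved room */

	cs = intel_ring_begin(rq, cs_len);
	if (IS_ERR(cs))
		return;	/* only the pre-batch emit may legitimately fail */

	/* ... emit e.g. MI_REPORT_PERF_COUNT into *cs++ here ... */

	intel_ring_advance(rq, cs);
}
```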

> > Would it still need the flush before the emit following the batch?
> > Ideally, the metrics should be collected as close to batch as possible.
> > So, if there are cache flush/tlb invalidate commands, it would introduce
> > some lag between the batch and following emit_samples command.
> > Sorry, I'm not able to gauge the need for flush here. I can understand
> > it's needed before the batch is programmed for flushing the cache/TLB
> > entries for the new workload to be submitted. But for the Sample_emit
> > commands, which generally only capture OA/timestamp/mmio metrics, is
> > this still required? 
> 
> Depends on the desired accuracy. If you want your metrics sampled after
> the user pipeline is completed, you need a stall at least, a flush if
> your metrics include e.g. fragment data. If you want samples
> taken whilst the user's batch is still executing, then no.
> 
I guess, in that case, I would at least need a stall to ensure the user
pipeline has completed before the metrics are collected.
Any additional inputs here, Robert?
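
Just to confirm my understanding of the stall, I'm thinking of something
along these lines before the post-batch emit (a sketch assuming the Gen8+
6-dword PIPE_CONTROL form, CS stall only, no cache-flush bits so the lag
after the batch stays small; I still need to check whether the hardware
requires pairing CS_STALL with another post-sync bit on some gens):

```c
/*
 * Sketch only: a CS-stall PIPE_CONTROL so the post-batch OA report is
 * captured after the user pipeline has drained, without paying for a
 * full cache flush. Gen8+ 6-dword layout assumed.
 */
static int emit_sample_barrier(struct drm_i915_gem_request *rq)
{
	u32 *cs;

	cs = intel_ring_begin(rq, 6);
	if (IS_ERR(cs))
		return PTR_ERR(cs);

	*cs++ = GFX_OP_PIPE_CONTROL(6);
	*cs++ = PIPE_CONTROL_CS_STALL;	/* stall, no flush/invalidate */
	*cs++ = 0;	/* address */
	*cs++ = 0;	/* address (upper) */
	*cs++ = 0;	/* data */
	*cs++ = 0;	/* data (upper) */
	intel_ring_advance(rq, cs);

	return 0;
}
```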

> > > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request 
> > > > > > *request)
> > > > > > +{
> > > > > > +   struct intel_engine_cs *engine = request->engine;
> > > > > > +   struct drm_i915_private *dev_priv = engine->i915;
> > > > > > +   struct i915_perf_stream *stream;
> > > > > > +
> > > > > > +   if (!dev_priv->perf.initialized)
> > > > > > +           return;
> > > > > > +
> > > > > > +   mutex_lock(&dev_priv->perf.streams_lock);
> > > > > 
> > > > > Justify a new global lock very, very carefully on execbuf.
> > > > 
> > > > The lock introduced here is to protect the perf.streams list against
> > > > addition/deletion while we're processing the list during the execbuf
> > > > call.
> > > > The other places where the mutex is taken is when a new stream is being
> > > > created (using perf_open ioctl) or a stream is being destroyed
> > > > (perf_release ioctl), which just protect the list_add and list_del into
> > > > the perf.streams list.
> > > > As such, there should not be much impact on execbuf path.
> > > > Does this seem reasonable?
> > > 
> > > It doesn't sound like it needs a mutex, which will simplify the other
> > > users as well (if switched to, for example, an RCU protected list).
> > 
> > Ok. Sounds reasonable, I'll switch to using an RCU protected list here.
> 
> (I may be overthinking this, but it still seems overkill and made the
> timer callback uglier than expected.)
> 
Are you suggesting that using RCU here is overkill, and mutex would do?
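
For reference, the RCU variant I was planning looks roughly like this
(a sketch only: open/release would use list_add_rcu()/list_del_rcu() under
the existing streams_lock and synchronize_rcu() before freeing a stream;
note the callbacks would then have to be non-sleeping under
rcu_read_lock(), or we'd need SRCU -- perhaps that is the ugliness you
ran into with the timer callback):

```c
/*
 * Sketch of the lock-free reader side: the execbuf path walks the
 * stream list under rcu_read_lock() instead of taking streams_lock.
 */
void i915_perf_emit_samples(struct drm_i915_gem_request *rq)
{
	struct drm_i915_private *dev_priv = rq->engine->i915;
	struct i915_perf_stream *stream;

	if (!dev_priv->perf.initialized)
		return;

	rcu_read_lock();
	list_for_each_entry_rcu(stream, &dev_priv->perf.streams, link) {
		if (stream->state == I915_PERF_STREAM_ENABLED &&
		    stream->cs_mode)
			stream->ops->command_stream_hook(stream, rq);
	}
	rcu_read_unlock();
}
```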

> > > > > > +   list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > > > > +           if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > > > > +                                   stream->cs_mode)
> > > > > > +                   stream->ops->command_stream_hook(stream, 
> > > > > > request);
> > > > > > +   }
> > > > > > +   mutex_unlock(&dev_priv->perf.streams_lock);
> > > > > > +}
> > > > > 
> > > > > > +static void i915_perf_command_stream_hook_oa(struct 
> > > > > > i915_perf_stream *stream,
> > > > > > +                                   struct drm_i915_gem_request 
> > > > > > *request)
> > > > > > +{
> > > > > > +   struct drm_i915_private *dev_priv = request->i915;
> > > > > > +   struct i915_gem_context *ctx = request->ctx;
> > > > > > +   struct i915_perf_cs_sample *sample;
> > > > > > +   u32 addr = 0;
> > > > > > +   u32 cmd, *cs;
> > > > > > +
> > > > > > +   sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > > > > +   if (sample == NULL) {
> > > > > > +           DRM_ERROR("Perf sample alloc failed\n");
> > > > > > +           return;
> > > > > > +   }
> > > > > > +
> > > > > > +   cs = intel_ring_begin(request, 4);
> > > > > > +   if (IS_ERR(cs)) {
> > > > > > +           kfree(sample);
> > > > > > +           return;
> > > > > > +   }
> > > > > > +
> > > > > > +   sample->ctx_id = ctx->hw_id;
> > > > > > +   i915_gem_request_assign(&sample->request, request);
> > > > > 
> > > > > > +
> > > > > > +   i915_gem_active_set(&stream->last_request, request);
> > > > > 
> > > > > Hint, you don't need you own request trap.
> > > > Sorry, I didn't understand you fully here. I'm storing the reference to
> > > > the last active request associated with the stream inside the
> > > > last_request 'gem_active' field. Do I need to do something differently
> > > > here?
> > > 
> > > It's the duplication.
> > 
> > Are you suggesting that request_assign() and active_set() is
> > duplication? 
> > Actually, I'm using the last_request active tracker for the purpose of
> > waiting on the last request to complete, whenever the need arises.
> > But still I need to get reference for every request for which the
> > commands for collection of metrics are emitted. This is because I need
> > to check for their completion before collecting the associated metrics.
> 
> Missed the sample / stream difference. request_assign means update the
> request field; had you used
>       sample->request = i915_gem_request_get(request)
> it would have been clearer.
> 
> Note that the requests are not ordered for the stream, so you cannot
> track the last_request so easily.
> 
> > This is done inside 'append_command_stream_samples' function, which also
> > takes care of releasing the reference taken here.
> > Am I missing something w.r.t. the active_set() functionality?
> 
> I was mostly thrown by the idea that you were reassigning requests,
> which history has shown to be a bad idea (hence i915_gem_active).
> However, stream->last_request should be a reservation_object.
> -Chris
> 
If I understand correctly, I need last_request to be of type
reservation_object so that it holds all the fences corresponding to the
requests pertaining to the stream. So, instead of i915_gem_active_set, I
need to do something like this:

        reservation_object_add_excl_fence(obj->resv, &rq->fence);
        /* or whichever api is appropriate here? */

And when we want to wait for the requests submitted on the stream to
complete, we wait on the fences associated with the 'last_request'
reservation_object, e.g. as done in the
'i915_gem_object_wait_reservation' function.
Is that all there is to it, or am I missing some piece of the
reservation_object machinery?
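
If so, a sketch of what I'd try (stream->resv is a hypothetical new field,
initialised with reservation_object_init() at stream open; using shared
fences rather than the exclusive slot since, as you note, the requests are
not ordered for the stream -- please correct the API usage if I have it
wrong):

```c
/*
 * Sketch: track every request that carries a sample emit for this
 * stream as a shared fence on a per-stream reservation_object.
 */
static int stream_track_request(struct i915_perf_stream *stream,
				struct drm_i915_gem_request *rq)
{
	struct reservation_object *resv = &stream->resv;
	int ret;

	ww_mutex_lock(&resv->lock, NULL);
	ret = reservation_object_reserve_shared(resv);
	if (ret == 0)
		reservation_object_add_shared_fence(resv, &rq->fence);
	ww_mutex_unlock(&resv->lock);

	return ret;
}

/* And to wait for everything outstanding on the stream: */
static long stream_wait_requests(struct i915_perf_stream *stream)
{
	return reservation_object_wait_timeout_rcu(&stream->resv,
						   true, /* wait all fences */
						   true, /* interruptible */
						   MAX_SCHEDULE_TIMEOUT);
}
```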

Thanks,
Sourab

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx