Re: [Intel-gfx] [PATCH v2 0/4] Dynamic EU configuration of Slice/Subslice/EU.
Hi Tvrtko,

> On Wed, Nov 7, 2018 at 4:08 PM Tvrtko Ursulin wrote:
>
> On 06/11/2018 04:13, Ankit Navik wrote:
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > The current GPU configuration code for i915 does not allow us to change
> > the EU/Slice/Sub-slice configuration dynamically. It is done only once,
> > while the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution that finds the pending load for
> > all active contexts at a given time and, based on that, dynamically
> > performs graphics configuration for each context.
> >
> > We use a hr (high resolution) timer in the i915 kernel driver to get a
> > callback every few milliseconds (this timer value can be configured
> > through debugfs; the default is '0', indicating the timer is disabled,
> > i.e. the original system without any intervention). In the timer
> > callback we examine the pending commands for a context in the queue;
> > essentially, we intercept them before they are executed by the GPU and
> > update the context with the required number of EUs.
> >
> > Two questions: how did we arrive at the right timer value, and what is
> > the right number of EUs? For the former, empirical data to achieve the
> > best performance at the least power was considered. For the latter, we
> > roughly categorized the number of EUs logically based on the platform.
> > We then compare the number of pending commands with a particular
> > threshold and set the number of EUs accordingly with an updated
> > context. That threshold is also based on experiments and findings. If
> > the GPU is able to catch up with the CPU there are typically no pending
> > commands, and the EU configuration remains unchanged. If there are more
> > pending commands, we reprogram the context with a higher number of EUs.
> > Please note that we change EUs even while a context is running, by
> > examining pending commands every 'x' milliseconds.
> >
> > With this solution in place, on KBL-GT3 + Android we saw the following
> > PnP benefits; the power numbers mentioned here are system power.
> >
> > App / KPI            | % Power Benefit |
> > ---------------------|-----------------|
> > 3D Mark (Ice storm)  | 2.30%           |
> > TRex On screen       | 2.49%           |
> > TRex Off screen      | 1.32%           |
> > Manhattan On screen  | 3.11%           |
> > Manhattan Off screen | 0.89%           |
> > AnTuTu 6.1.4         | 3.42%           |
>
> Were you able to find some benchmarks which regress? Maybe try Synmark2
> and more from gfxbench? Not all benchmarks there are equally important,
> and regressions on some are fine, but I think a fuller set would be
> interesting to see.

We have not seen much improvement in GFX Carchase, but there was no
degradation in performance.

Regards, Ankit

> Regards,
>
> Tvrtko
>
> > Note - For KBL (GEN9) we cannot control at sub-slice level, it was
> > always a constraint.
> > We always controlled number of EUs rather than sub-slices/slices.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >     type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c        | 88 +-
> >  drivers/gpu/drm/i915/i915_drv.c            |  1 +
> >  drivers/gpu/drm/i915/i915_drv.h            | 10
> >  drivers/gpu/drm/i915/i915_gem_context.c    | 26 +
> >  drivers/gpu/drm/i915/i915_gem_context.h    | 45 +++
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
> >  drivers/gpu/drm/i915/intel_device_info.c   | 44 ++-
> >  drivers/gpu/drm/i915/intel_lrc.c           | 20 ++-
> >  8 files changed, 235 insertions(+), 4 deletions(-)
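To make the mechanism described in the cover letter concrete, here is a
minimal sketch of the timer callback being proposed. It is not the posted
code itself: the names (pred_timer, contexts.list, req_cnt,
i915_gem_context_set_load_type) follow the patches later in this thread,
and the threshold of three pending requests and the example poll period
are the empirically chosen values discussed above.

/*
 * Minimal sketch of the predictive governor, assuming the per-context
 * atomic pending-request counter from patch 1/4 and the load-type hook
 * from patch 3/4.
 */
#define PENDING_REQ_MEDIUM	3			/* above -> high, equal -> medium */
#define PRED_POLL_PERIOD_NS	(10 * NSEC_PER_MSEC)	/* example period only */

static enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer)
{
	struct drm_i915_private *dev_priv =
		container_of(hrtimer, typeof(*dev_priv), pred_timer);
	struct i915_gem_context *ctx;

	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
		unsigned int pending = atomic_read(&ctx->req_cnt);
		enum gem_load_type type;

		if (pending == 0)
			continue;	/* GPU is keeping up; leave the config alone */

		if (pending > PENDING_REQ_MEDIUM)
			type = LOAD_TYPE_HIGH;
		else if (pending == PENDING_REQ_MEDIUM)
			type = LOAD_TYPE_MEDIUM;
		else
			type = LOAD_TYPE_LOW;

		/* Record the new target config; it is applied on resubmission. */
		i915_gem_context_set_load_type(ctx, type);
	}

	hrtimer_forward_now(hrtimer, ns_to_ktime(PRED_POLL_PERIOD_NS));
	return HRTIMER_RESTART;
}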
Re: [Intel-gfx] [PATCH v2 1/4] drm/i915: Get active pending request for given context
Hi Tvrtko, > On Tue, Nov 6, 2018 at 3:14 PM Tvrtko Ursulin > wrote: > > > On 06/11/2018 04:13, Ankit Navik wrote: > > From: Praveen Diwakar > > > > This patch gives us the active pending request count which is yet to > > be submitted to the GPU > > > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Yogesh Marathe > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Kedar J Karanje > > Signed-off-by: Ankit Navik > > Suggested-by: Tvrtko Ursulin > > --- > > drivers/gpu/drm/i915/i915_drv.c| 1 + > > drivers/gpu/drm/i915/i915_drv.h| 5 + > > drivers/gpu/drm/i915/i915_gem_context.c| 1 + > > drivers/gpu/drm/i915/i915_gem_context.h| 6 ++ > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 + > > drivers/gpu/drm/i915/intel_lrc.c | 6 ++ > > 6 files changed, 24 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > > b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16..d37c46e 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -903,6 +903,7 @@ static int i915_driver_init_early(struct > drm_i915_private *dev_priv, > > mutex_init(&dev_priv->av_mutex); > > mutex_init(&dev_priv->wm.wm_mutex); > > mutex_init(&dev_priv->pps_mutex); > > + mutex_init(&dev_priv->pred_mutex); > > > > i915_memcpy_init_early(dev_priv); > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h index 4aca534..137ec33 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1609,6 +1609,11 @@ struct drm_i915_private { > > * controller on different i2c buses. */ > > struct mutex gmbus_mutex; > > > > + /** pred_mutex protects against councurrent usage of pending > > +* request counter for multiple contexts > > +*/ > > + struct mutex pred_mutex; > > + > > /** > > * Base address of the gmbus and gpio block. > > */ > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index b10770c..0bcbe32 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private > *dev_priv, > > } > > > > trace_i915_context_create(ctx); > > + atomic_set(&ctx->req_cnt, 0); > > > > return ctx; > > } > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h > > b/drivers/gpu/drm/i915/i915_gem_context.h > > index b116e49..04e3ff7 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.h > > +++ b/drivers/gpu/drm/i915/i915_gem_context.h > > @@ -194,6 +194,12 @@ struct i915_gem_context { > > * context close. 
> > */ > > struct list_head handles_list; > > + > > + /** req_cnt: tracks the pending commands, based on which we decide > to > > +* go for low/medium/high load configuration of the GPU, this is > > +* controlled via a mutex > > +*/ > > + atomic_t req_cnt; > > }; > > > > static inline bool i915_gem_context_is_closed(const struct > > i915_gem_context *ctx) diff --git > > a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > index 3f0c612..8afa2a5 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > @@ -2178,6 +2178,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, > >struct drm_syncobj **fences) > > { > > struct i915_execbuffer eb; > > + struct drm_i915_private *dev_priv = to_i915(dev); > > struct dma_fence *in_fence = NULL; > > struct sync_file *out_fence = NULL; > > int out_fence_fd = -1; > > @@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, > > */ > > eb.request->batch = eb.batch; > > > > + mutex_lock(&dev_priv->pred_mutex); > > + atomic_inc(&eb.ctx->req_cnt); > > Point of going to atomic_t was to remove need for the mutex. > > > + mutex_unlock(&dev_priv->pred_mutex); > > + > > trace_i915_request_queue(eb.request, eb.batch_flags); > > err = eb_submit(&eb); > > err_request: > > diff --git a/drivers/gpu/drm/i915/intel_lrc.c > > b/drivers/gpu/drm/i915/intel_lrc.c > > index 1744792..bcbb66b 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -728,6 +728,12 @@ static void execlists_dequeue(struct intel_engine_cs > *engine) > > trace_i915_request_in(rq, port_index(port, execlists)); > > last = rq; > > submit = true; > > + > > + mutex_lock(&rq->i915->pred_mutex); > > + if (atomic_read(&rq->gem_context->req_cnt) > 0) > > + atomic_dec(&rq->gem_context->req_cnt); > > Hitting underflow is a hint accounting does not work as expected. I really > think > you need to fix it by gathering some ideas from the patches I've pointed at > in the > previou
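The two review points above (the mutex being redundant once req_cnt is an
atomic_t, and the underflow guard hinting at unbalanced accounting) can be
illustrated with a short sketch. This is a suggestion in line with the
comments, not the posted fix:

/*
 * Sketch only: with atomic_t there is no need for pred_mutex around the
 * counter, and if every increment is matched by exactly one decrement
 * the "> 0" underflow guard becomes unnecessary.
 */

/* In i915_gem_do_execbuffer(): account the request when it is queued. */
	atomic_inc(&eb.ctx->req_cnt);

/*
 * In execlists_dequeue(): unaccount it when it is handed to the
 * hardware, and make unbalanced accounting loud instead of clamping it.
 */
	GEM_WARN_ON(atomic_dec_return(&rq->gem_context->req_cnt) < 0);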
Re: [Intel-gfx] [PATCH v3 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
Hi Tvrtko, > On Tue, Dec 11, 2018 at 6:17 PM Tvrtko Ursulin > wrote: > > > On 11/12/2018 10:14, Ankit Navik wrote: > > From: Praveen Diwakar > > > > This patch will select optimum eu/slice/sub-slice configuration based > > on type of load (low, medium, high) as input. > > Based on our readings and experiments we have predefined set of > > optimum configuration for each platform(CHT, KBL). > > i915_gem_context_set_load_type will select optimum configuration from > > pre-defined optimum configuration table(opt_config). > > > > It also introduce flag update_render_config which can set by any governor. > > > > v2: > > * Move static optimum_config to device init time. > > * Rename function to appropriate name, fix data types and patch ordering. > > * Rename prev_load_type to pending_load_type. (Tvrtko Ursulin) > > > > v3: > > * Add safe guard check in i915_gem_context_set_load_type. > > * Rename struct from optimum_config to i915_sseu_optimum_config to > > avoid namespace clashes. > > * Reduces memcpy for space efficient. > > * Rebase. > > * Improved commit message. (Tvrtko Ursulin) > > > > Cc: Kedar J Karanje > > Cc: Yogesh Marathe > > Reviewed-by: Tvrtko Ursulin > > Again for the record no, I did not r-b this. Sorry, I wasn't aware that r-b has to be added by reviewer. Will take care in next patch sets. > > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Ankit Navik > > --- > > drivers/gpu/drm/i915/i915_drv.h | 3 ++ > > drivers/gpu/drm/i915/i915_gem_context.c | 18 > > drivers/gpu/drm/i915/i915_gem_context.h | 25 + > > drivers/gpu/drm/i915/intel_device_info.c | 47 > ++-- > > drivers/gpu/drm/i915/intel_lrc.c | 4 ++- > > 5 files changed, 94 insertions(+), 3 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h index 4aca534..4b9a8c5 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1681,6 +1681,9 @@ struct drm_i915_private { > > struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* > assume 965 */ > > int num_fence_regs; /* 8 on pre-965, 16 otherwise */ > > > > + /* optimal slice/subslice/EU configration state */ > > + struct i915_sseu_optimum_config opt_config[LOAD_TYPE_LAST]; > > Make it a pointer to struct i915_sseu_optimum_config. Will incorporate this and other reviews in next patch. > > > + > > unsigned int fsb_freq, mem_freq, is_ddr3; > > unsigned int skl_preferred_vco_freq; > > unsigned int max_cdclk_freq; > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index d040858..c0ced72 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -392,10 +392,28 @@ i915_gem_create_context(struct drm_i915_private > *dev_priv, > > ctx->subslice_cnt = hweight8( > > INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); > > ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice; > > + ctx->load_type = 0; > > + ctx->pending_load_type = 0; > > Not needed because we zero the allocation and probably depend on untouched > fields being zero elsewhere. 
> > > > > return ctx; > > } > > > > + > > +void i915_gem_context_set_load_type(struct i915_gem_context *ctx, > > + enum gem_load_type type) > > +{ > > + struct drm_i915_private *dev_priv = ctx->i915; > > + > > + if (GEM_WARN_ON(type > LOAD_TYPE_LAST)) > > + return; > > + > > + /* Call opt_config to get correct configuration for eu,slice,subslice */ > > + ctx->slice_cnt = dev_priv->opt_config[type].slice; > > + ctx->subslice_cnt = dev_priv->opt_config[type].subslice; > > + ctx->eu_cnt = dev_priv->opt_config[type].eu; > > + ctx->pending_load_type = type; > > +} > > + > > /** > >* i915_gem_context_create_gvt - create a GVT GEM context > >* @dev: drm device * > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h > > b/drivers/gpu/drm/i915/i915_gem_context.h > > index e000530..a0db13c 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.h > > +++ b/drivers/gpu/drm/i915/i915_gem_context.h > > @@ -53,6 +53,19 @@ struct intel_context_ops { > > void (*destroy)(struct intel_context *ce); > > }; > > > > +enum gem_load_type { > > + LOAD_TYPE_LOW, > > + LOAD_TYPE_MEDIUM, > > + LOAD_TYPE_HIGH, > > + LOAD_TYPE_LAST > > +}; > > + > > +struct i915_sseu_optimum_config { > > + u8 slice; > > + u8 subslice; > > + u8 eu; > > +}; > > + > > /** > >* struct i915_gem_context - client state > >* > > @@ -208,6 +221,16 @@ struct i915_gem_context { > > > > /** eu_cnt: used to set the # of eu to be enabled. */ > > u8 eu_cnt; > > + > > + /** load_type: The designated load_type (high/medium/low) for a given > > +* number of pending commands in the command queue. > > +*/ > >
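Following the review suggestion above (a pointer in dev_priv rather than
a copied array, populated once at device init), a sketch of how the
static per-platform table from this patch could be wired up. The values
are the KBL GT3 entries from the table posted earlier in the series; the
table name is illustrative.

/* One static table per platform; dev_priv only points at the right one. */
static const struct i915_sseu_optimum_config kbl_gt3_config[LOAD_TYPE_LAST] = {
	[LOAD_TYPE_LOW]    = { .slice = 2, .subslice = 3, .eu = 4 },
	[LOAD_TYPE_MEDIUM] = { .slice = 2, .subslice = 3, .eu = 6 },
	[LOAD_TYPE_HIGH]   = { .slice = 2, .subslice = 3, .eu = 8 },
};

/* At device-info init time, select the table once: */
	dev_priv->opt_config = kbl_gt3_config;

/* ...so the lookup in i915_gem_context_set_load_type() stays a cheap index: */
	ctx->slice_cnt    = dev_priv->opt_config[type].slice;
	ctx->subslice_cnt = dev_priv->opt_config[type].subslice;
	ctx->eu_cnt       = dev_priv->opt_config[type].eu;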
Re: [Intel-gfx] [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
Hi Tvrtko,

On Tue, Dec 11, 2018 at 5:18 PM Tvrtko Ursulin wrote:
>
> On 11/12/2018 10:14, Ankit Navik wrote:
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > The current GPU configuration code for i915 does not allow us to change
> > the EU/Slice/Sub-slice configuration dynamically. It is done only once,
> > while the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution that finds the pending load for
> > all active contexts at a given time and, based on that, dynamically
> > performs graphics configuration for each context.
> >
> > We use a hr (high resolution) timer in the i915 kernel driver to get a
> > callback every few milliseconds (this timer value can be configured
> > through debugfs; the default is '0', indicating the timer is disabled,
> > i.e. the original system without any intervention). In the timer
> > callback we examine the pending commands for a context in the queue;
> > essentially, we intercept them before they are executed by the GPU and
> > update the context with the required number of EUs.
> >
> > Two questions: how did we arrive at the right timer value, and what is
> > the right number of EUs? For the former, empirical data to achieve the
> > best performance at the least power was considered. For the latter, we
> > roughly categorized the number of EUs logically based on the platform.
> > We then compare the number of pending commands with a particular
> > threshold and set the number of EUs accordingly with an updated
> > context. That threshold is also based on experiments and findings. If
> > the GPU is able to catch up with the CPU there are typically no pending
> > commands, and the EU configuration remains unchanged. If there are more
> > pending commands, we reprogram the context with a higher number of EUs.
> > Please note that we change EUs even while a context is running, by
> > examining pending commands every 'x' milliseconds.
> >
> > With this solution in place, on KBL-GT3 + Android we saw the following
> > PnP benefits; the power numbers mentioned here are system power.
> >
> > App / KPI            | % Power Benefit |
> > ---------------------|-----------------|
> > 3D Mark (Ice storm)  | 2.30%           |
> > TRex On screen       | 2.49%           |
> > TRex Off screen      | 1.32%           |
> > Manhattan On screen  | 3.11%           |
> > Manhattan Off screen | 0.89%           |
> > AnTuTu 6.1.4         | 3.42%           |
> > SynMark2             | 1.70%           |
>
> Is this the aggregated SynMark2 result, like all sub-tests averaged or
> something?

Yes, it is the averaged result covering all the test cases.

> I suggest you do want to list much more detail here, all individual
> sub-tests, different platforms, etc. The change you are proposing is
> quite big and the amount of research that you must demonstrate for
> people to take this seriously has to be equally exhaustive.

I will verify and add more details covering various platforms and
sub-tests.

Regards, Ankit

> Regards,
>
> Tvrtko
>
> > Note - For KBL (GEN9) we cannot control at sub-slice level, it was
> > always a constraint.
> > We always controlled number of EUs rather than sub-slices/slices.
> > We have also observed that GPU core residency improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >     type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++-
> >  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >  drivers/gpu/drm/i915/i915_drv.h          |  9
> >  drivers/gpu/drm/i915/i915_gem_context.c  | 23
> >  drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++
> >  drivers/gpu/drm/i915/i915_request.c      |  2 +
> >  drivers/gpu/drm/i915/intel_device_info.c | 47 -
> >  drivers/gpu/drm/i915/intel_lrc.c         | 16 +-
> >  8 files changed, 226 insertions(+), 4 deletions(-)
Re: [Intel-gfx] [PATCH v3 0/4] Dynamic EU configuration of Slice/Subslice/EU.
Hi Joonas,

On Fri, Dec 14, 2018 at 3:57 PM Joonas Lahtinen wrote:
>
> Quoting Ankit Navik (2018-12-11 12:14:17)
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > The current GPU configuration code for i915 does not allow us to change
> > the EU/Slice/Sub-slice configuration dynamically. It is done only once,
> > while the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution that finds the pending load for
> > all active contexts at a given time and, based on that, dynamically
> > performs graphics configuration for each context.
> >
> > We use a hr (high resolution) timer in the i915 kernel driver to get a
> > callback every few milliseconds (this timer value can be configured
> > through debugfs; the default is '0', indicating the timer is disabled,
> > i.e. the original system without any intervention). In the timer
> > callback we examine the pending commands for a context in the queue;
> > essentially, we intercept them before they are executed by the GPU and
> > update the context with the required number of EUs.
> >
> > Two questions: how did we arrive at the right timer value, and what is
> > the right number of EUs? For the former, empirical data to achieve the
> > best performance at the least power was considered. For the latter, we
> > roughly categorized the number of EUs logically based on the platform.
> > We then compare the number of pending commands with a particular
> > threshold and set the number of EUs accordingly with an updated
> > context. That threshold is also based on experiments and findings. If
> > the GPU is able to catch up with the CPU there are typically no pending
> > commands, and the EU configuration remains unchanged. If there are more
> > pending commands, we reprogram the context with a higher number of EUs.
> > Please note that we change EUs even while a context is running, by
> > examining pending commands every 'x' milliseconds.
>
> On the overall strategy. This will be unsuitable to be merged as a
> debugfs interface. So is the idea to evolve into a sysfs interface? As
> this seems to require tuning for each specific workload, I don't think
> that would scale too well if you consider a desktop distro?

We started initially with a debugfs interface. I have added a comment to
move the functionality into a sysfs interface. Yes, I will consider the
desktop distro case and share detailed results.

> Also, there's the patch series to enable/disable subslices with VME
> hardware (the other dynamic slice shutdown/SSEU series) depending on the
> type of load being run. Certain workloads would hang the system if
> they're executed with the full subslice configuration. In that light, it
> would make more sense if the applications would be the ones reporting
> their optimal running configuration.

I think that series exposes RPCS for Gen 11 only, for the VME use case.
I have tested the patch on KBL (Gen 9). I will consider other Gen 9
platforms as well.

> > With this solution in place, on KBL-GT3 + Android we saw the following
> > PnP benefits; the power numbers mentioned here are system power.
> >
> > App / KPI            | % Power Benefit |
> > ---------------------|-----------------|
> > 3D Mark (Ice storm)  | 2.30%           |
> > TRex On screen       | 2.49%           |
> > TRex Off screen      | 1.32%           |
> > Manhattan On screen  | 3.11%           |
> > Manhattan Off screen | 0.89%           |
> > AnTuTu 6.1.4         | 3.42%           |
> > SynMark2             | 1.70%           |
>
> Just to verify, these numbers are true while there's no negative effect
> on the benchmark scores?

Yes, there is no impact on the benchmark scores.

Thank you Joonas for your valuable feedback.

Regards, Ankit

> Regards, Joonas
>
> > Note - For KBL (GEN9) we cannot control at sub-slice level, it was
> > always a constraint.
> > We always controlled number of EUs rather than sub-slices/slices.
> > We have also observed that GPU core residency improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on load
> >     type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++-
> >  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >  drivers/gpu/drm/i915/i915_drv.h          |  9
> >  drivers/gpu/drm/i915/i915_gem_context.c  | 23
> >  drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++
> >  drivers/gpu/drm/i915/i915_request.c      |  2 +
> >  drivers/gpu/d
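On the debugfs-versus-sysfs point raised above, a rough sketch of what a
sysfs knob could look like, purely illustrative: the attribute name and
the predictive_load_ms field are invented here, and the code assumes the
kdev_minor_to_i915() helper and attribute patterns already used in
i915_sysfs.c. '0' keeps the governor off; any other value is the timer
period in milliseconds.

static ssize_t predictive_load_ms_show(struct device *kdev,
				       struct device_attribute *attr, char *buf)
{
	struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);

	return sprintf(buf, "%u\n", i915->predictive_load_ms);
}

static ssize_t predictive_load_ms_store(struct device *kdev,
					struct device_attribute *attr,
					const char *buf, size_t count)
{
	struct drm_i915_private *i915 = kdev_minor_to_i915(kdev);
	unsigned int ms;
	int ret;

	ret = kstrtouint(buf, 0, &ms);
	if (ret)
		return ret;

	/* Timer start/cancel on transitions would be handled elsewhere. */
	i915->predictive_load_ms = ms;

	return count;
}

static DEVICE_ATTR_RW(predictive_load_ms);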
Re: [Intel-gfx] [PATCH v5 0/3] drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel
> -----Original Message-----
> From: Chris Wilson
> Sent: Monday, November 25, 2019 1:46 PM
> To: Navik, Ankit P; intel-gfx@lists.freedesktop.org
> Cc: Navik, Ankit P; Anand, Vipin
> Subject: Re: [Intel-gfx] [PATCH v5 0/3] drm/i915: Context aware user
> agnostic EU/Slice/Sub-slice control within kernel
>
> Quoting Ankit Navik (2019-11-25 06:39:02)
> > This patch set improves GPU power consumption on Linux kernel based
> > OSes such as Chromium OS, Ubuntu, etc. Following are the power savings.
>
> The code is still as broken as it was when it was last posted.

I saw that CI failed. Let me rebase again. It works at my end on
5.4.0-rc2+.

Regards, Ankit

> -Chris
Re: [Intel-gfx] [PATCH v6 3/3] drm/i915: Predictive governor to control slice/subslice/eu
Hi Chris,

Thank you for your feedback.

> On 26/11/2019 10:52, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-11-26 10:51:22)
> > You mentioned you did some experiment where you did something on
> > context pinning and that it did not work so well. I don't know what
> > that was though. I don't think that was ever posted?
> >
> > What I am thinking is this: You drop the timer altogether. Instead in
> > __execlists_update_reg_state you look at your gem_context->req_cnt and
> > implement your logic there.
>
> I noticed the same non-sequitur. Except I would push that either the
> entire measurement and hence patch series is bogus (beyond the patches
> themselves being trivially broken, tested much?), or that it really
> should be done from a timer and also adjust pinned contexts ala
> reconfigure_sseu.
>
> A bunch of selftests and igt proving that different sseu setups do
> consume different power (i.e. that we can adjust sseu correctly), along
> with demonstrating good workloads where autotuning produces beneficial
> results is a must.
>
> Also given Tvrtko's stats, this could all be done from userspace with an
> extended CONTEXT_PARAM_SSEU, so why would we not do it that way?
> -Chris

We have verified this patch series on KBL, GLK & CML platforms with
well-known standard benchmarks, as mentioned in the cover letters. Power
savings were verified with various tools (RAPL counters, Monsoon power
monitor and socwatch). We have also verified battery-life use cases,
which show significant power savings. So I hope you will understand that
we have taken proper measurements before pushing the patch series.

Thanks & Regards,
Ankit
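For reference, the userspace route Chris mentions exists as
I915_CONTEXT_PARAM_SSEU. Below is a hedged sketch of how an application
could pick its own per-context configuration, assuming the uapi
structures as merged in include/uapi/drm/i915_drm.h; the helper name and
error handling are illustrative only.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Ask the kernel to run this context with the given slice/subslice/EU setup. */
static int set_context_sseu(int drm_fd, uint32_t ctx_id, uint64_t slice_mask,
			    uint64_t subslice_mask, uint16_t min_eu, uint16_t max_eu)
{
	struct drm_i915_gem_context_param_sseu sseu = {
		.engine = {
			.engine_class = I915_ENGINE_CLASS_RENDER,
			.engine_instance = 0,
		},
		.slice_mask = slice_mask,
		.subslice_mask = subslice_mask,
		.min_eus_per_subslice = min_eu,
		.max_eus_per_subslice = max_eu,
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_SSEU,
		.size = sizeof(sseu),
		.value = (uint64_t)(uintptr_t)&sseu,
	};

	return ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}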
Re: [Intel-gfx] [PATCH v6 3/3] drm/i915: Predictive governor to control slice/subslice/eu
> On 26/11/2019 04:21, Tvrtko Ursulin wrote: > > On 26/11/2019 04:51, Ankit Navik wrote: > > High resolution timer is used for predictive governor to control > > eu/slice/subslice based on workloads. > > > > param is provided to enable/disable/update timer configuration > > > > V2: > > * Fix code style. > > * Move predictive_load_timer into a drm_i915_private > > structure. > > * Make generic function to set optimum config. (Tvrtko Ursulin) > > > > V3: > > * Rebase. > > * Fix race condition for predictive load set. > > * Add slack to start hrtimer for more power efficient. (Tvrtko > > Ursulin) > > > > V4: > > * Fix data type and initialization of mutex to protect predictive load > > state. > > * Move predictive timer init to i915_gem_init_early. (Tvrtko Ursulin) > > * Move debugfs to kernel parameter. > > > > V5: > > * Rebase. > > * Remove mutex for pred_timer > > > > V6: > > * Rebase. > > * Fix warnings. > > > > Cc: Vipin Anand > > Signed-off-by: Ankit Navik > > --- > > drivers/gpu/drm/i915/Makefile | 1 + > > drivers/gpu/drm/i915/gt/intel_deu.c | 104 > > > drivers/gpu/drm/i915/gt/intel_deu.h | 31 +++ > > drivers/gpu/drm/i915/i915_drv.h | 4 ++ > > drivers/gpu/drm/i915/i915_gem.c | 4 ++ > > drivers/gpu/drm/i915/i915_params.c | 4 ++ > > drivers/gpu/drm/i915/i915_params.h | 1 + > > 7 files changed, 149 insertions(+) > > create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.c > > create mode 100644 drivers/gpu/drm/i915/gt/intel_deu.h > > > > diff --git a/drivers/gpu/drm/i915/Makefile > > b/drivers/gpu/drm/i915/Makefile index e0fd10c..c1a98f3 100644 > > --- a/drivers/gpu/drm/i915/Makefile > > +++ b/drivers/gpu/drm/i915/Makefile > > @@ -77,6 +77,7 @@ obj-y += gt/ > > gt-y += \ > > gt/intel_breadcrumbs.o \ > > gt/intel_context.o \ > > + gt/intel_deu.o \ > > gt/intel_engine_cs.o \ > > gt/intel_engine_heartbeat.o \ > > gt/intel_engine_pm.o \ > > diff --git a/drivers/gpu/drm/i915/gt/intel_deu.c > > b/drivers/gpu/drm/i915/gt/intel_deu.c > > new file mode 100644 > > index 000..6c5b01c > > --- /dev/null > > +++ b/drivers/gpu/drm/i915/gt/intel_deu.c > > @@ -0,0 +1,104 @@ > > +/* > > + * Copyright © 2019 Intel Corporation > > + * > > + * Permission is hereby granted, free of charge, to any person > > +obtaining a > > + * copy of this software and associated documentation files (the > > +"Software"), > > + * to deal in the Software without restriction, including without > > +limitation > > + * the rights to use, copy, modify, merge, publish, distribute, > > +sublicense, > > + * and/or sell copies of the Software, and to permit persons to whom > > +the > > + * Software is furnished to do so, subject to the following conditions: > > + * > > + * The above copyright notice and this permission notice (including > > +the next > > + * paragraph) shall be included in all copies or substantial portions > > +of the > > + * Software. > > + * > > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > > +EXPRESS OR > > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > > +MERCHANTABILITY, > > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO > EVENT > > +SHALL > > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, > DAMAGES > > +OR OTHER > > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, > > +ARISING > > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR > > +OTHER > > + * DEALINGS IN THE SOFTWARE. 
> > + * > > + * Authors: > > + *Ankit Navik > > + */ > > + > > +/** > > + * DOC: Dynamic EU Control (DEU) > > + * > > + * DEU tries to re-configure EU allocation during runtime by > > +predictive load > > + * calculation of command queue to gain power saving. > > + * It is transparent to user space and completely handled in the kernel. > > + */ > > + > > +#include "intel_deu.h" > > +#include "i915_drv.h" > > +#include "gem/i915_gem_context.h" > > + > > +/* > > + * Anything above threshold is considered as HIGH load, less is > > +considered > > + * as LOW load and equal is considered as MEDIUM load. > > + * > > + * The threshold value of three active requests pending. > > + */ > > +#define PENDING_THRESHOLD_MEDIUM 3 > > + > > +#define SLACK_TIMER_NSEC 100 /* Timer range in nano second */ > > + > > +enum hrtimer_restart predictive_load_cb(struct hrtimer *hrtimer) { > > + struct drm_i915_private *dev_priv = > > + container_of(hrtimer, typeof(*dev_priv), pred_timer); > > + struct i915_gem_context *ctx; > > + enum gem_load_type load_type; > > + unsigned int req_pending; > > + > > + list_for_each_entry(ctx, &dev_priv->gem.contexts.list, link) { > > + req_pending = atomic_read(&ctx->req_cnt); > > + > > + /* > > +* Transitioning to low state whenever pending request is zero > > +* would cause vacillation between low and high stat
Re: [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Dynamic EU configuration of Slice/Sub-slice/EU (rev6)
Hi Srinivas,

This will break the OA counter. I am already working with Tvrtko to make
the patch scalable and as per the flow. Kindly wait for the upcoming
patch.

PS: Coding guideline:
https://www.kernel.org/doc/html/v4.17/process/submitting-patches.html

Regards, Ankit

> -----Original Message-----
> From: Patchwork
> Sent: Friday, March 13, 2020 7:34 PM
> To: Navik, Ankit P
> Cc: intel-gfx@lists.freedesktop.org
> Subject: ✗ Fi.CI.SPARSE: warning for Dynamic EU configuration of
> Slice/Sub-slice/EU (rev6)
>
> == Series Details ==
>
> Series: Dynamic EU configuration of Slice/Sub-slice/EU (rev6)
> URL   : https://patchwork.freedesktop.org/series/69980/
> State : warning
>
> == Summary ==
>
> $ dim sparse origin/drm-tip
> Sparse version: v0.6.0
> Commit: drm/i915: Get active pending request for given context
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2162:61: expected struct atomic_t const [usertype] *v
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2162:61: got struct atomic_t [noderef] *
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2162:61: warning: incorrect type in argument 1 (different address spaces)
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2163:64: expected struct atomic_t [usertype] *v
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2163:64: got struct atomic_t [noderef] *
> +drivers/gpu/drm/i915/gt/intel_lrc.c:2163:64: warning: incorrect type in argument 1 (different address spaces)
>
> Commit: drm/i915: set optimum eu/slice/sub-slice configuration based on
> load type
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3047:59: expected struct i915_gem_context *ctx
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3047:59: got struct i915_gem_context [noderef] *const gem_context
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3047:59: warning: incorrect type in argument 1 (different address spaces)
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3077:15: warning: dereference of noderef expression
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3077:45: warning: dereference of noderef expression
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3078:19: warning: dereference of noderef expression
> +drivers/gpu/drm/i915/gt/intel_lrc.c:3078:48: warning: dereference of noderef expression
>
> Commit: drm/i915: Predictive governor to control slice/subslice/eu

Okay!
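A likely explanation for the warnings above, offered as an assumption
rather than a diagnosis: sparse reports "different address spaces" and
"dereference of noderef expression" when an __rcu-annotated pointer (the
GEM context pointer reachable from a request in kernel trees of this
vintage) is dereferenced directly. The usual shape of the fix looks like
the fragment below; the exact field name depends on the tree in use.

	struct i915_gem_context *ctx;

	rcu_read_lock();
	/* rq->context->gem_context is __rcu; take a proper snapshot of it. */
	ctx = rcu_dereference(rq->context->gem_context);
	if (ctx)
		atomic_inc(&ctx->req_cnt);
	rcu_read_unlock();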
Re: [Intel-gfx] [PATCH 1/4] drm/i915: Get active pending request for given context
Hi Tvrtko, > -Original Message- > From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] > Sent: Friday, September 21, 2018 6:10 PM > To: J Karanje, Kedar ; intel- > g...@lists.freedesktop.org > Cc: Diwakar, Praveen ; Marathe, Yogesh > ; Navik, Ankit P ; > Muthukumar, Aravindan > Subject: Re: [Intel-gfx] [PATCH 1/4] drm/i915: Get active pending request for > given context > > > On 21/09/2018 10:13, kedar.j.kara...@intel.com wrote: > > From: Praveen Diwakar > > > > This patch gives us the active pending request count which is yet to > > be submitted to the GPU > > > > Change-Id: I10c2828ad0f1a0b7af147835737134e07a2d5b6d > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Yogesh Marathe > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Kedar J Karanje > > Signed-off-by: Ankit Navik > > --- > > drivers/gpu/drm/i915/i915_drv.c| 1 + > > drivers/gpu/drm/i915/i915_drv.h| 5 + > > drivers/gpu/drm/i915/i915_gem_context.c| 1 + > > drivers/gpu/drm/i915/i915_gem_context.h| 6 ++ > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 5 + > > drivers/gpu/drm/i915/intel_lrc.c | 6 ++ > > 6 files changed, 24 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > > b/drivers/gpu/drm/i915/i915_drv.c index f8cfd16..d37c46e 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -903,6 +903,7 @@ static int i915_driver_init_early(struct > drm_i915_private *dev_priv, > > mutex_init(&dev_priv->av_mutex); > > mutex_init(&dev_priv->wm.wm_mutex); > > mutex_init(&dev_priv->pps_mutex); > > + mutex_init(&dev_priv->pred_mutex); > > > > i915_memcpy_init_early(dev_priv); > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h index 4aca534..137ec33 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -1609,6 +1609,11 @@ struct drm_i915_private { > > * controller on different i2c buses. */ > > struct mutex gmbus_mutex; > > > > + /** pred_mutex protects against councurrent usage of pending > > +* request counter for multiple contexts > > +*/ > > + struct mutex pred_mutex; > > + > > /** > > * Base address of the gmbus and gpio block. > > */ > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index b10770c..30932d9 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -387,6 +387,7 @@ i915_gem_create_context(struct drm_i915_private > *dev_priv, > > } > > > > trace_i915_context_create(ctx); > > + ctx->req_cnt = 0; > > > > return ctx; > > } > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h > > b/drivers/gpu/drm/i915/i915_gem_context.h > > index b116e49..243ea22 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.h > > +++ b/drivers/gpu/drm/i915/i915_gem_context.h > > @@ -194,6 +194,12 @@ struct i915_gem_context { > > * context close. > > */ > > struct list_head handles_list; > > + > > + /** req_cnt: tracks the pending commands, based on which we decide > to > > +* go for low/medium/high load configuration of the GPU, this is > > +* controlled via a mutex > > +*/ > > + u64 req_cnt; > > 64-bit is too wide and mutex causes you problems later in the series. > I'd suggest atomic_t. Changes done in v2. 
> > > }; > > > > static inline bool i915_gem_context_is_closed(const struct > > i915_gem_context *ctx) diff --git > > a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > index 3f0c612..f799ff9 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > > @@ -2178,6 +2178,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, > >struct drm_syncobj **fences) > > { > > struct i915_execbuffer eb; > > + struct drm_i915_private *dev_priv = to_i915(dev); > > struct dma_fence *in_fence = NULL; > > struct sync_file *out_fence = NULL; > > int out_fence_fd = -1; > > @@ -2390,6 +2391,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, > > */ > > eb.request->batch = eb.batch; > > &
Re: [Intel-gfx] [PATCH 2/4] drm/i915: Update render power clock state configuration for given context
Hi Tvrtko, > -Original Message- > From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] > Sent: Friday, September 21, 2018 6:22 PM > To: J Karanje, Kedar ; intel- > g...@lists.freedesktop.org > Cc: Diwakar, Praveen ; Marathe, Yogesh > ; Navik, Ankit P ; > Muthukumar, Aravindan > Subject: Re: [Intel-gfx] [PATCH 2/4] drm/i915: Update render power clock state > configuration for given context > > > On 21/09/2018 10:13, kedar.j.kara...@intel.com wrote: > > From: Praveen Diwakar > > > > This patch will update power clock state register at runtime base on > > the flag update_render_config which can set by any governor which > > computes load and want to update rpcs register. > > subsequent patches will have a timer based governor which computes > > pending load/request. > > > > Change-Id: I4e7d2f484b957d5bd496e1decc59a69e3bc6d186 > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Yogesh Marathe > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Kedar J Karanje > > Signed-off-by: Ankit Navik > > --- > > drivers/gpu/drm/i915/i915_gem_context.c | 5 > > drivers/gpu/drm/i915/i915_gem_context.h | 14 +++ > > drivers/gpu/drm/i915/intel_lrc.c| 41 > + > > 3 files changed, 60 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index 30932d9..2838c1d 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -388,6 +388,11 @@ i915_gem_create_context(struct drm_i915_private > > *dev_priv, > > > > trace_i915_context_create(ctx); > > ctx->req_cnt = 0; > > + ctx->update_render_config = 0; > > + ctx->slice_cnt = hweight8(INTEL_INFO(dev_priv)->sseu.slice_mask); > > + ctx->subslice_cnt = hweight8( > > + INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); > > + ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice; > > > > return ctx; > > } > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h > > b/drivers/gpu/drm/i915/i915_gem_context.h > > index 243ea22..52e341c 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.h > > +++ b/drivers/gpu/drm/i915/i915_gem_context.h > > @@ -200,6 +200,20 @@ struct i915_gem_context { > > * controlled via a mutex > > */ > > u64 req_cnt; > > + > > + /** slice_cnt: used to set the # of slices to be enabled. */ > > + u8 slice_cnt; > > + > > + /** subslice_cnt: used to set the # of subslices to be enabled. */ > > + u8 subslice_cnt; > > + > > + /** eu_cnt: used to set the # of eu to be enabled. 
*/ > > + u8 eu_cnt; > > + > > + /** update_render_config: to track the updates to the render > > +* configuration (S/SS/EU Configuration on the GPU) > > +*/ > > + bool update_render_config; > > }; > > > > static inline bool i915_gem_context_is_closed(const struct > > i915_gem_context *ctx) diff --git a/drivers/gpu/drm/i915/intel_lrc.c > > b/drivers/gpu/drm/i915/intel_lrc.c > > index 039fbdb..d2d0e7d 100644 > > --- a/drivers/gpu/drm/i915/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/intel_lrc.c > > @@ -364,6 +364,36 @@ execlists_unwind_incomplete_requests(struct > intel_engine_execlists *execlists) > > spin_unlock_irqrestore(&engine->timeline.lock, flags); > > } > > > > +static u32 > > +get_context_rpcs_config(struct i915_gem_context *ctx) { > > + struct drm_i915_private *dev_priv = ctx->i915; > > + u32 rpcs = 0; > > + > > + if (INTEL_GEN(dev_priv) < 8) > > + return 0; > > + > > + if (INTEL_INFO(dev_priv)->sseu.has_slice_pg) { > > + rpcs |= GEN8_RPCS_S_CNT_ENABLE; > > + rpcs |= ctx->slice_cnt << GEN8_RPCS_S_CNT_SHIFT; > > + rpcs |= GEN8_RPCS_ENABLE; > > + } > > + > > + if (INTEL_INFO(dev_priv)->sseu.has_subslice_pg) { > > + rpcs |= GEN8_RPCS_SS_CNT_ENABLE; > > + rpcs |= ctx->subslice_cnt << GEN8_RPCS_SS_CNT_SHIFT; > > + rpcs |= GEN8_RPCS_ENABLE; > > + } > > + > > + if (INTEL_INFO(dev_priv)->sseu.has_eu_pg) { > > + rpcs |= ctx->eu_cnt << GEN8_RPCS_EU_MIN_SHIFT; > > + rpcs |= ctx->eu_cnt << GEN8_RPCS_EU_MAX_SHIFT; > > + rpcs |= GEN8_RPCS_ENABLE; > > + } > > + > > + r
Re: [Intel-gfx] [PATCH 3/4] drm/i915: set optimum eu/slice/sub-slice configuration based on load type
Hi Tvrtko, Your review changes are incorporated in v2. Regards, Ankit > -Original Message- > From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] > Sent: Friday, September 21, 2018 6:36 PM > To: J Karanje, Kedar ; intel- > g...@lists.freedesktop.org > Cc: Diwakar, Praveen ; Marathe, Yogesh > ; Navik, Ankit P ; > Muthukumar, Aravindan > Subject: Re: [Intel-gfx] [PATCH 3/4] drm/i915: set optimum eu/slice/sub-slice > configuration based on load type > > > On 21/09/2018 10:13, kedar.j.kara...@intel.com wrote: > > From: Praveen Diwakar > > > > This patch will select optimum eu/slice/sub-slice configuration based > > on type of load (low, medium, high) as input. > > Based on our readings and experiments we have predefined set of > > optimum configuration for each platform(CHT, KBL). > > i915_set_optimum_config will select optimum configuration from > > pre-defined optimum configuration table(opt_config). > > > > Change-Id: I3a6a2a6b01b3d3c97995f5403aef3c6fa989 > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Yogesh Marathe > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Kedar J Karanje > > Signed-off-by: Ankit Navik > > --- > > drivers/gpu/drm/i915/i915_gem_context.c | 46 > + > > drivers/gpu/drm/i915/i915_gem_context.h | 32 +++ > > 2 files changed, 78 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c > > b/drivers/gpu/drm/i915/i915_gem_context.c > > index 2838c1d..1b76410 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/i915_gem_context.c > > @@ -94,6 +94,32 @@ > > > > #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 > > > > +static struct optimum_config > opt_config[TIER_VERSION_MAX][LOAD_TYPE_MAX] = { > > + { > > + /* Cherry trail low */ > > + { 1, 1, 4}, > > + /* Cherry trail medium */ > > + { 1, 1, 6}, > > + /* Cherry trail high */ > > + { 1, 2, 6} > > + }, > > + { > > + /* kbl gt2 low */ > > + { 2, 3, 4}, > > + /* kbl gt2 medium */ > > + { 2, 3, 6}, > > + /* kbl gt2 high */ > > + { 2, 3, 8} > > + }, > > + { > > + /* kbl gt3 low */ > > + { 2, 3, 4}, > > + /* kbl gt3 medium */ > > + { 2, 3, 6}, > > + /* kbl gt3 high */ > > + { 2, 3, 8} > > + } > > +}; > > static void lut_close(struct i915_gem_context *ctx) > > { > > struct i915_lut_handle *lut, *ln; > > @@ -393,10 +419,30 @@ i915_gem_create_context(struct drm_i915_private > *dev_priv, > > ctx->subslice_cnt = hweight8( > > INTEL_INFO(dev_priv)->sseu.subslice_mask[0]); > > ctx->eu_cnt = INTEL_INFO(dev_priv)->sseu.eu_per_subslice; > > + ctx->load_type = 0; > > + ctx->prev_load_type = 0; > > > > return ctx; > > } > > > > + > > +void i915_set_optimum_config(int type, struct i915_gem_context *ctx, > > Type for type should be enum. > > Name the function as i915_gem_context_set_load_type(ctx, type). > > > + enum gem_tier_versions version) > > +{ > > + struct intel_context *ce = &ctx->__engine[RCS]; > > + u32 *reg_state = ce->lrc_reg_state; > > + u32 rpcs_config = 0; > > Unused locals? > > > + /* Call opt_config to get correct configuration for eu,slice,subslice */ > > + ctx->slice_cnt = (u8)opt_config[version][type].slice; > > + ctx->subslice_cnt = (u8)opt_config[version][type].subslice; > > + ctx->eu_cnt = (u8)opt_config[version][type].eu; > > You could store the correct table for the device in dev_priv and use the > static > table to assign/populate on device init time. > > > + > > + /* Enabling this to update the rpcs */ > > + if (ctx->prev_load_type != type) > > + ctx->update_render_config = 1; > > ctx->update_render_config was unused in last patch. 
So patch ordering is > a bit suboptimal but I don't know. > > > + > > + ctx->prev_load_type = type; > > +} > > /** > >* i915_gem_context_create_gvt - create a GVT GEM context > >* @dev: drm device * > > diff --git a/drivers/gpu/drm/i915/i915_gem_context.h > b/drivers/gpu/drm/i915/i915_gem_context.h > > index 52e341c..50183e6 100644 > > --- a/drive
Re: [Intel-gfx] [PATCH 4/4] drm/i915: Predictive governor to control eu/slice/subslice based on workload
Hi Tvrtko, > -Original Message- > From: Tvrtko Ursulin [mailto:tvrtko.ursu...@linux.intel.com] > Sent: Tuesday, September 25, 2018 1:56 PM > To: Navik, Ankit P ; intel-gfx@lists.freedesktop.org > Cc: J Karanje, Kedar ; Diwakar, Praveen > ; Marathe, Yogesh > ; Muthukumar, Aravindan > > Subject: Re: [Intel-gfx] [PATCH 4/4] drm/i915: Predictive governor to control > eu/slice/subslice based on workload > > > On 25/09/2018 07:12, Navik, Ankit P wrote: > > Hi Tvrtko, > > > > Thank you for your valuable comments. We have gone through it. > > I'll be submitting revised patch-sets after incorporating all your review > comments. > > You're welcome! > > Btw one more thing - you can suspend your timer when GPU goes idle and > restart it when active. See for instance i915_pmu_gt_parked/unparked where to > put the hooks. We are working on this and we will send as new patch on top of series. > > Regards, > > Tvrtko > > >> On 21/09/2018 10:13, kedar.j.kara...@intel.com wrote: > >>> From: Praveen Diwakar > >>> > >>> High resoluton timer is used for this purpose. > >>> > >>> Debugfs is provided to enable/disable/update timer configuration > >>> > >>> Change-Id: I35d692c5afe962fcad4573185bc6f744487711d0 > >>> Signed-off-by: Praveen Diwakar > >>> Signed-off-by: Yogesh Marathe > >>> Signed-off-by: Aravindan Muthukumar > >>> Signed-off-by: Kedar J Karanje > >>> Signed-off-by: Ankit Navik >>> --- > >>>drivers/gpu/drm/i915/i915_debugfs.c | 94 > >> - > >>>drivers/gpu/drm/i915/i915_drv.h | 1 + > >>>2 files changed, 94 insertions(+), 1 deletion(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c > >>> b/drivers/gpu/drm/i915/i915_debugfs.c > >>> index f9ce35d..81ba509 100644 > >>> --- a/drivers/gpu/drm/i915/i915_debugfs.c > >>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c > >>> @@ -4740,6 +4740,97 @@ static const struct drm_info_list > >> i915_debugfs_list[] = { > >>> {"i915_drrs_status", i915_drrs_status, 0}, > >>> {"i915_rps_boost_info", i915_rps_boost_info, 0}, > >>>}; > >>> + > >>> +#define POLL_PERIOD_MS (1000 * 1000) > >>> +#define PENDING_REQ_00 /* No active request pending*/ > >>> +#define PENDING_REQ_33 /* Threshold value of 3 active request > >> pending*/ > >>> + /* Anything above this is considered as HIGH load > >>> +* context > >>> +*/ > >>> + /* And less is considered as LOW load*/ > >>> + /* And equal is considered as mediaum load */ > >> > >> Wonky comments and some typos up to here. Changes done in v2. > >> > >>> + > >>> +static int predictive_load_enable; > >>> +static int predictive_load_timer_init; > >>> + > >>> +static enum hrtimer_restart predictive_load_cb(struct hrtimer > >>> +*hrtimer) { > >>> + struct drm_i915_private *dev_priv = > >>> + container_of(hrtimer, typeof(*dev_priv), > >>> + pred_timer); > >>> + enum intel_engine_id id; > >>> + struct intel_engine_cs *engine; > >> > >> Some unused's. Changes done in v2. > >> > >>> + struct i915_gem_context *ctx; > >>> + u64 req_pending; > >> > >> unsigned long, and also please try to order declaration so the right > >> edge of text is moving in one direction only. Changes done in v2. > >> > >>> + > >>> + list_for_each_entry(ctx, &dev_priv->contexts.list, link) { > >>> + > >>> + if (!ctx->name) > >>> + continue; > >> > >> What is this? Just an error check for invalid context. > >> > >>> + > >>> + mutex_lock(&dev_priv->pred_mutex); > >> > >> Here the mutex bites you since you cannot sleep in the timer callback. > >> atomic_t would solve it. Or would a native unsigned int/long since > >> lock to read a native word on x86 is not needed. 
Changes done in v2. > >> > >>> + req_pending = ctx->req_cnt; > >>> + mutex_unlock(&dev_priv->pred_mutex); > >>&
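Tvrtko's parked/unparked suggestion in this thread could look roughly
like the sketch below. The function names and the pred_timer_enabled_ms
field are hypothetical; the point is only the hrtimer calls and the hook
placement mirroring i915_pmu_gt_parked()/i915_pmu_gt_unparked().

/* Stop the governor timer while the GPU is idle so it costs nothing. */
void intel_deu_gt_parked(struct drm_i915_private *i915)
{
	if (i915->pred_timer_enabled_ms)
		hrtimer_cancel(&i915->pred_timer);
}

/* Restart it on the first new activity. */
void intel_deu_gt_unparked(struct drm_i915_private *i915)
{
	if (i915->pred_timer_enabled_ms)
		hrtimer_start(&i915->pred_timer,
			      ms_to_ktime(i915->pred_timer_enabled_ms),
			      HRTIMER_MODE_REL_PINNED);
}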
Re: [Intel-gfx] [PATCH 4/4] drm/i915: Predictive governor to control eu/slice/subslice based on workload
Hi Tvrtko, Thank you for your valuable comments. We have gone through it. I'll be submitting revised patch-sets after incorporating all your review comments. > On 21/09/2018 10:13, kedar.j.kara...@intel.com wrote: > > From: Praveen Diwakar > > > > High resoluton timer is used for this purpose. > > > > Debugfs is provided to enable/disable/update timer configuration > > > > Change-Id: I35d692c5afe962fcad4573185bc6f744487711d0 > > Signed-off-by: Praveen Diwakar > > Signed-off-by: Yogesh Marathe > > Signed-off-by: Aravindan Muthukumar > > Signed-off-by: Kedar J Karanje > > Signed-off-by: Ankit Navik > --- > > drivers/gpu/drm/i915/i915_debugfs.c | 94 > - > > drivers/gpu/drm/i915/i915_drv.h | 1 + > > 2 files changed, 94 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c > > b/drivers/gpu/drm/i915/i915_debugfs.c > > index f9ce35d..81ba509 100644 > > --- a/drivers/gpu/drm/i915/i915_debugfs.c > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > > @@ -4740,6 +4740,97 @@ static const struct drm_info_list > i915_debugfs_list[] = { > > {"i915_drrs_status", i915_drrs_status, 0}, > > {"i915_rps_boost_info", i915_rps_boost_info, 0}, > > }; > > + > > +#define POLL_PERIOD_MS (1000 * 1000) > > +#define PENDING_REQ_0 0 /* No active request pending*/ > > +#define PENDING_REQ_3 3 /* Threshold value of 3 active request > pending*/ > > + /* Anything above this is considered as HIGH load > > + * context > > + */ > > + /* And less is considered as LOW load*/ > > + /* And equal is considered as mediaum load */ > > Wonky comments and some typos up to here. > > > + > > +static int predictive_load_enable; > > +static int predictive_load_timer_init; > > + > > +static enum hrtimer_restart predictive_load_cb(struct hrtimer > > +*hrtimer) { > > + struct drm_i915_private *dev_priv = > > + container_of(hrtimer, typeof(*dev_priv), > > + pred_timer); > > + enum intel_engine_id id; > > + struct intel_engine_cs *engine; > > Some unused's. > > > + struct i915_gem_context *ctx; > > + u64 req_pending; > > unsigned long, and also please try to order declaration so the right edge of > text > is moving in one direction only. > > > + > > + list_for_each_entry(ctx, &dev_priv->contexts.list, link) { > > + > > + if (!ctx->name) > > + continue; > > What is this? > > > + > > + mutex_lock(&dev_priv->pred_mutex); > > Here the mutex bites you since you cannot sleep in the timer callback. > atomic_t would solve it. Or would a native unsigned int/long since lock to > read a > native word on x86 is not needed. > > > + req_pending = ctx->req_cnt; > > + mutex_unlock(&dev_priv->pred_mutex); > > + > > + if (req_pending == PENDING_REQ_0) > > + continue; > > + > > + if (req_pending > PENDING_REQ_3) > > + ctx->load_type = LOAD_TYPE_HIGH; > > + else if (req_pending == PENDING_REQ_3) > > + ctx->load_type = LOAD_TYPE_MEDIUM; > > + else if (req_pending < PENDING_REQ_3) > > Must be smaller if not greater or equal, but maybe the compiler does that for > you. > > > + ctx->load_type = LOAD_TYPE_LOW; > > + > > + i915_set_optimum_config(ctx->load_type, ctx, > KABYLAKE_GT3); > > Only KBL? Idea to put the table in dev_priv FTW! :) > > ctx->load_type used only as a temporary uncovered here? :) > > > + } > > + > > + hrtimer_forward_now(hrtimer, > > + > ns_to_ktime(predictive_load_enable*POLL_PERIOD_MS)); > > Or HRTIMER_NORESTART if disabled? Hard to call it, details.. 
> > > + > > + return HRTIMER_RESTART; > > +} > > + > > +static int > > +i915_predictive_load_get(void *data, u64 *val) { > > + struct drm_i915_private *dev_priv = data; > > + > > + *val = predictive_load_enable; > > + return 0; > > +} > > + > > +static int > > +i915_predictive_load_set(void *data, u64 val) { > > + struct drm_i915_private *dev_priv = data; > > + struct intel_device_info *info; > > + > > + info = mkwrite_device_info(dev_priv); > > Unused, why? > > > + > > + predictive_load_enable = val; > > + > > + if (predictive_load_enable) { > > + if (!predictive_load_timer_init) { > > + hrtimer_init(&dev_priv->pred_timer, > CLOCK_MONOTONIC, > > + HRTIMER_MODE_REL); > > + dev_priv->pred_timer.function = predictive_load_cb; > > + predictive_load_timer_init = 1; > > Move timer init to dev_priv setup. > > > + } > > + hrtimer_start(&dev_priv->pred_timer, > > + > ns_to_ktime(predictive_load_enable*POLL_PERIOD_MS), > > + HRTIMER_MODE_REL_PINNED); > > Two threads can race to here. > > Also you can gi
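Pulling together the review points on this patch (initialise the hrtimer
once at device setup, and close the race between two writers of the
debugfs knob), a sketch of a safer "set" path. The names come from the
posted patch; reusing pred_mutex to serialise the writers is an assumed
addition, not part of the posted series.

static int
i915_predictive_load_set(void *data, u64 val)
{
	struct drm_i915_private *dev_priv = data;

	/* Timer is initialised once in driver init, not lazily here. */
	mutex_lock(&dev_priv->pred_mutex);

	predictive_load_enable = val;
	if (val)
		/* hrtimer_start() on an already queued timer simply re-arms it. */
		hrtimer_start(&dev_priv->pred_timer,
			      ns_to_ktime(val * POLL_PERIOD_MS),
			      HRTIMER_MODE_REL_PINNED);
	else
		hrtimer_cancel(&dev_priv->pred_timer);

	mutex_unlock(&dev_priv->pred_mutex);

	return 0;
}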