Hi Chris,

On Wed, Mar 25, 2020 at 08:10:56AM +0000, Chris Wilson wrote:
> Measure and compare the energy consumed, as reported by the RAPL MSR,
> by the GPU while in RC0 and RC6 states. Throw an error if RC6 does not
> at least halve the energy consumption of RC0, as this more than likely
> means we failed to enter RC6 correctly.
> 
> If we can't measure the energy draw with the MSR, then it will report 0
> for both measurements. Since the measurement works on all gen6+, this seems
> worth flagging as an error.
> 
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuopp...@linux.intel.com>
> Cc: Andi Shyti <andi.sh...@intel.com>
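A minimal userspace C sketch of the pass/fail rule described above, assuming both readings have already been normalized to average power in microwatts. The names check_rc6_power() and enum rc6_power_verdict are hypothetical and not part of the patch; only the "2 * RC6 > RC0" comparison and the "zero means unmeasurable" rule come from it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical verdicts, for this sketch only. */
enum rc6_power_verdict {
	RC6_POWER_OK = 0,
	RC6_POWER_UNMEASURED,	/* RC0 reading is 0: MSR likely unreadable */
	RC6_POWER_LEAKED,	/* RC6 did not at least halve RC0 power */
};

/*
 * rc0_uW and rc6_uW are average power in microwatts. RC6 passes only if
 * it draws no more than half of what RC0 draws.
 */
static enum rc6_power_verdict check_rc6_power(uint64_t rc0_uW, uint64_t rc6_uW)
{
	/* A zero RC0 reading means we could not measure anything. */
	if (!rc0_uW)
		return RC6_POWER_UNMEASURED;

	/* Same comparison as the patch: fail if 2 * RC6 > RC0. */
	if (2 * rc6_uW > rc0_uW)
		return RC6_POWER_LEAKED;

	return RC6_POWER_OK;
}
```

The zero check has to come first: with both readings at 0, 2 * 0 > 0 is false, so the halving test alone would silently pass.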

It would be nice to have a revision history, given that I received quite
a few versions of this patch.

> +static u64 energy_uJ(struct intel_rc6 *rc6)
> +{
> +     unsigned long long power;
> +     u32 units;
> +
> +     if (rdmsrl_safe(MSR_RAPL_POWER_UNIT, &power))
> +             return 0;
> +
> +     units = (power & 0x1f00) >> 8;
> +
> +     if (rdmsrl_safe(MSR_PP1_ENERGY_STATUS, &power))
> +             return 0;
> +
> +     return (1000000 * power) >> units; /* convert to uJ */
> +}

Shall we put this in a library?
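As a rough idea of what such a library helper could look like, here is a userspace C sketch that splits the arithmetic of energy_uJ() into pure functions. The names rapl_energy_units() and rapl_ticks_to_uJ() are hypothetical, and the MSR reads themselves (rdmsrl_safe()) are deliberately left out; only the 0x1f00 mask, the shift by 8 and the "(1000000 * ticks) >> units" conversion are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Energy Status Units: bits 12:8 of MSR_RAPL_POWER_UNIT, same mask and
 * shift as energy_uJ(). One counter tick is 1 / 2^units joules.
 */
static unsigned int rapl_energy_units(uint64_t power_unit)
{
	return (power_unit & 0x1f00) >> 8;
}

/*
 * Convert a raw energy-status counter value to microjoules. Multiplying
 * before shifting keeps sub-joule precision; for a 32-bit counter value
 * the product stays below 2^52, so it cannot overflow 64 bits.
 */
static uint64_t rapl_ticks_to_uJ(uint64_t ticks, unsigned int units)
{
	return (1000000 * ticks) >> units;
}
```

For an illustrative unit register value of 0xA0E03, rapl_energy_units() gives 14, so one tick is 1 / 16384 J, which truncates to 61 uJ.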

>       res[0] = rc6_residency(rc6);
> +     dt = ktime_get();
> +     rc0_power = energy_uJ(rc6);
>       msleep(250);
> +     rc0_power = energy_uJ(rc6) - rc0_power;
> +     dt = ktime_sub(ktime_get(), dt);
>       res[1] = rc6_residency(rc6);
>       if ((res[1] - res[0]) >> 10) {
>               pr_err("RC6 residency increased by %lldus while disabled for 250ms!\n",
> @@ -63,13 +85,23 @@ int live_rc6_manual(void *arg)
>               goto out_unlock;
>       }
>  
> +     rc0_power = div64_u64(NSEC_PER_SEC * rc0_power, ktime_to_ns(dt));
> +     if (!rc0_power) {

Is this likely to happen?
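For reference, a userspace C sketch of the normalization step above, i.e. div64_u64(NSEC_PER_SEC * energy, ktime_to_ns(dt)). NSEC_PER_SEC is redefined locally to mirror the kernel constant, and avg_power_uW() is a hypothetical name. With integer division over a 250 ms window, the result is 0 only when the energy delta itself is 0, so the !rc0_power branch effectively catches the unreadable-MSR case from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in for the kernel's NSEC_PER_SEC. */
#define NSEC_PER_SEC 1000000000ULL

/*
 * Average power in microwatts: microjoules * 1e9 / elapsed nanoseconds.
 * The multiply only overflows for deltas above roughly 18 kJ, far beyond
 * what a GPU consumes in a few hundred milliseconds.
 */
static uint64_t avg_power_uW(uint64_t energy_uJ, uint64_t elapsed_ns)
{
	assert(elapsed_ns > 0);	/* the interval must have advanced */
	return NSEC_PER_SEC * energy_uJ / elapsed_ns;
}
```

For example, 250000 uJ consumed over 250 ms is 1 W, i.e. avg_power_uW(250000, 250000000) returns 1000000.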

>       res[0] = rc6_residency(rc6);
> +     dt = ktime_get();
> +     rc6_power = energy_uJ(rc6);
>       msleep(100);
> +     rc6_power = energy_uJ(rc6) - rc6_power;
> +     dt = ktime_sub(ktime_get(), dt);
>       res[1] = rc6_residency(rc6);
> -
>       if (res[1] == res[0]) {
>               pr_err("Did not enter RC6! RC6_STATE=%08x, RC6_CONTROL=%08x, residency=%lld\n",
>                      intel_uncore_read_fw(gt->uncore, GEN6_RC_STATE),
> @@ -78,6 +110,15 @@ int live_rc6_manual(void *arg)
>               err = -EINVAL;
>       }
>  
> +     rc6_power = div64_u64(NSEC_PER_SEC * rc6_power, ktime_to_ns(dt));
> +     pr_info("GPU consumed %llduW in RC0 and %llduW in RC6\n",
> +             rc0_power, rc6_power);
> +     if (2 * rc6_power > rc0_power) {
> +             pr_err("GPU leaked energy while in RC6!\n");
> +             err = -EINVAL;
> +             goto out_unlock;
> +     }

Nice.

Reviewed-by: Andi Shyti <andi.sh...@intel.com>

Thanks,
Andi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
