Re: [PATCH 3/3 v3] cpufreq: governor: Replace timers with utilization update callbacks

Viresh Kumar Thu, 04 Feb 2016 22:51:07 -0800

Will suck some more blood, sorry about that :)

On 05-02-16, 02:28, Rafael J. Wysocki wrote:
> The v3 addresses some review comments from Viresh and a couple of issues found
> by me.  Changes from the previous version:
> - Synchronize gov_cancel_work() with the (new) irq_work properly.
> - Add a comment about the (new) memory barrier.
> - Move sample_delay_ns to "shared" (struct cpu_common_dbs_info) so it is the


sample_delay_ns was already there, you moved last_sample_time instead :)

> @@ -139,7 +141,11 @@ struct cpu_common_dbs_info {
>       struct mutex timer_mutex;
>  
>       ktime_t time_stamp;
> +     u64 last_sample_time;
> +     s64 sample_delay_ns;
>       atomic_t skip_work;
> +     struct irq_work irq_work;

Just for my understanding, why can't we queue a normal work item directly? Is it
because this runs in the scheduler's hot path and queue_work() is too slow there?

> Index: linux-pm/drivers/cpufreq/cpufreq_governor.c
> +void gov_set_update_util(struct cpu_common_dbs_info *shared,
> +                      unsigned int delay_us)
>  {
> +     struct cpufreq_policy *policy = shared->policy;
>       struct dbs_data *dbs_data = policy->governor_data;
> -     struct cpu_dbs_info *cdbs;
>       int cpu;
>  
> +     shared->sample_delay_ns = delay_us * NSEC_PER_USEC;
> +     shared->time_stamp = ktime_get();
> +     shared->last_sample_time = 0;

Calling this routine from update_sampling_rate() is still wrong, because it also
resets last_sample_time to 0, which means we will schedule the irq_work on the
very next utilization update.

Surely we didn't want that to happen, did we?
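
To illustrate the concern, here is a minimal userspace sketch. It assumes the
utilization update handler decides to sample when (now - last_sample_time) >=
sample_delay_ns; that check is not part of the hunks quoted here, and the names
shared_model, sample_due() and model_set_update_util() are made up for the
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two relevant fields of cpu_common_dbs_info. */
struct shared_model {
	uint64_t last_sample_time;	/* ns */
	int64_t sample_delay_ns;
};

/* Assumed handler check: a sample is due once the delay has elapsed. */
static bool sample_due(const struct shared_model *s, uint64_t now_ns)
{
	return (int64_t)(now_ns - s->last_sample_time) >= s->sample_delay_ns;
}

/* Mirrors the two assignments in the quoted gov_set_update_util() hunk. */
static void model_set_update_util(struct shared_model *s, unsigned int delay_us)
{
	s->sample_delay_ns = (int64_t)delay_us * 1000;	/* NSEC_PER_USEC */
	s->last_sample_time = 0;
}
```

Under that assumption, a policy that sampled 1 ms ago with a 10 ms delay is not
due yet, but after model_set_update_util() it is due on the very next update,
because any realistic clock value minus 0 exceeds the delay.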

>       for_each_cpu(cpu, policy->cpus) {
> -             cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> -             cdbs->timer.expires = jiffies + delay;
> -             add_timer_on(&cdbs->timer, cpu);
> +             struct cpu_dbs_info *cdbs = dbs_data->cdata->get_cpu_cdbs(cpu);
> +
> +             cpufreq_set_update_util_data(cpu, &cdbs->update_util);
>       }
>  }
> -EXPORT_SYMBOL_GPL(gov_add_timers);
> +EXPORT_SYMBOL_GPL(gov_set_update_util);

>  void gov_cancel_work(struct cpu_common_dbs_info *shared)
>  {
> -     /* Tell dbs_timer_handler() to skip queuing up work items. */
> +     /* Tell dbs_update_util_handler() to skip queuing up work items. */
>       atomic_inc(&shared->skip_work);
>       /*
> -      * If dbs_timer_handler() is already running, it may not notice the
> -      * incremented skip_work, so wait for it to complete to prevent its work
> -      * item from being queued up after the cancel_work_sync() below.
> -      */
> -     gov_cancel_timers(shared->policy);
> -     /*
> -      * In case dbs_timer_handler() managed to run and spawn a work item
> -      * before the timers have been canceled, wait for that work item to
> -      * complete and then cancel all of the timers set up by it.  If
> -      * dbs_timer_handler() runs again at that point, it will see the
> -      * positive value of skip_work and won't spawn any more work items.
> +      * If dbs_update_util_handler() is already running, it may not notice
> +      * the incremented skip_work, so wait for it to complete to prevent its
> +      * work item from being queued up after the cancel_work_sync() below.
>        */
> +     gov_clear_update_util(shared->policy);
> +     wait_for_completion(&shared->irq_work_done);

I may be wrong, but wouldn't irq_work_sync() be enough here instead?

>       cancel_work_sync(&shared->work);
> -     gov_cancel_timers(shared->policy);
>       atomic_set(&shared->skip_work, 0);
>  }
>  EXPORT_SYMBOL_GPL(gov_cancel_work);

> Index: linux-pm/drivers/cpufreq/cpufreq_ondemand.c
> @@ -264,7 +260,7 @@ static void update_sampling_rate(struct
>               struct od_cpu_dbs_info_s *dbs_info;
>               struct cpu_dbs_info *cdbs;
>               struct cpu_common_dbs_info *shared;
> -             unsigned long next_sampling, appointed_at;
> +             ktime_t next_sampling, appointed_at;
>  
>               dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
>               cdbs = &dbs_info->cdbs;
> @@ -292,16 +288,19 @@ static void update_sampling_rate(struct
>                       continue;
>  
>               /*
> -              * Checking this for any CPU should be fine, timers for all of
> -              * them are scheduled together.
> +              * Checking this for any CPU sharing the policy should be fine,
> +              * they are all scheduled to sample at the same time.
>                */
> -             next_sampling = jiffies + usecs_to_jiffies(new_rate);
> -             appointed_at = dbs_info->cdbs.timer.expires;
> +             next_sampling = ktime_add_us(ktime_get(), new_rate);
>  
> -             if (time_before(next_sampling, appointed_at)) {
> -                     gov_cancel_work(shared);
> -                     gov_add_timers(policy, usecs_to_jiffies(new_rate));
> +             mutex_lock(&shared->timer_mutex);
> +             appointed_at = ktime_add_ns(shared->time_stamp,
> +                                         shared->sample_delay_ns);
> +             mutex_unlock(&shared->timer_mutex);
>  
> +             if (ktime_before(next_sampling, appointed_at)) {
> +                     gov_cancel_work(shared);
> +                     gov_set_update_util(shared, new_rate);

So, I don't think we need to call these heavy routines here at all. Just take
the timer_mutex used above and update time_stamp and sample_delay_ns under it.

On top of that, this particular change might turn out to be a big bonus for us.
Why would we still need to take od_dbs_cdata.mutex in this routine? We aren't
removing or adding timers anymore; we just update sample_delay_ns, so there
shouldn't be any races. Of course, the utilization update handler would need to
take the same timer_mutex around sample_delay_ns as well, I believe.

And that would also kill the circular dependency lockdep warning we have been
chasing so hard :)

Or am I getting overexcited here? :(
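
Roughly what I have in mind, as a userspace sketch rather than kernel code. A
pthread mutex stands in for timer_mutex, times are plain nanosecond counters
instead of ktime_t, and rate_model / model_update_sampling_rate() are made-up
names. How the handler side would synchronize is left open here.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared per-policy data. */
struct rate_model {
	pthread_mutex_t timer_mutex;
	int64_t time_stamp_ns;
	int64_t sample_delay_ns;
};

/*
 * If the new rate would make the next sample happen earlier than the one
 * currently appointed, update the two fields under the lock instead of
 * cancelling and re-arming anything. Returns 1 if the fields were updated.
 */
static int model_update_sampling_rate(struct rate_model *m, int64_t now_ns,
				      unsigned int new_rate_us)
{
	int64_t next_sampling = now_ns + (int64_t)new_rate_us * 1000;
	int64_t appointed_at;
	int updated = 0;

	pthread_mutex_lock(&m->timer_mutex);
	appointed_at = m->time_stamp_ns + m->sample_delay_ns;
	if (next_sampling < appointed_at) {
		m->time_stamp_ns = now_ns;
		m->sample_delay_ns = (int64_t)new_rate_us * 1000;
		updated = 1;
	}
	pthread_mutex_unlock(&m->timer_mutex);

	return updated;
}
```

The point is only that the whole update becomes two assignments under one lock;
nothing gets cancelled or re-queued.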

-- 
viresh
