Re: [PATCH RFC nohz_full v2 6/7] nohz_full: Add full-system-idle state machine

Frederic Weisbecker Mon, 01 Jul 2013 09:36:23 -0700

On Fri, Jun 28, 2013 at 01:10:21PM -0700, Paul E. McKenney wrote:
>  /*
> + * Unconditionally force exit from full system-idle state.  This is
> + * invoked when a normal CPU exits idle, but must be called separately
> + * for the timekeeping CPU (tick_do_timer_cpu).  The reason for this
> + * is that the timekeeping CPU is permitted to take scheduling-clock
> + * interrupts while the system is in system-idle state, and of course
> + * rcu_sysidle_exit() has no way of distinguishing a scheduling-clock
> + * interrupt from any other type of interrupt.
> + */
> +void rcu_sysidle_force_exit(void)
> +{
> +     int oldstate = ACCESS_ONCE(full_sysidle_state);
> +     int newoldstate;
> +
> +     /*
> +      * Each pass through the following loop attempts to exit full
> +      * system-idle state.  If contention proves to be a problem,
> +      * a trylock-based contention tree could be used here.
> +      */
> +     while (oldstate > RCU_SYSIDLE_SHORT) {
> +             newoldstate = cmpxchg(&full_sysidle_state,
> +                                   oldstate, RCU_SYSIDLE_NOT);
> +             if (oldstate == newoldstate &&
> +                 oldstate == RCU_SYSIDLE_FULL_NOTED) {
> +                     rcu_kick_nohz_cpu(tick_do_timer_cpu);
> +                     return; /* We cleared it, done! */
> +             }
> +             oldstate = newoldstate;
> +     }
> +     smp_mb(); /* Order initial oldstate fetch vs. later non-idle work. */
> +}
> +
> +/*
>   * Invoked to note entry to irq or task transition from idle.  Note that
>   * usermode execution does -not- count as idle here!  The caller must
>   * have disabled interrupts.
> @@ -2474,6 +2506,214 @@ static void rcu_sysidle_exit(struct rcu_dynticks *rdtp, int irq)
>       atomic_inc(&rdtp->dynticks_idle);
>       smp_mb__after_atomic_inc();
>       WARN_ON_ONCE(!(atomic_read(&rdtp->dynticks_idle) & 0x1));
> +
> +     /*
> +      * If we are the timekeeping CPU, we are permitted to be non-idle
> +      * during a system-idle state.  This must be the case, because
> +      * the timekeeping CPU has to take scheduling-clock interrupts
> +      * during the time that the system is transitioning to full
> +      * system-idle state.  This means that the timekeeping CPU must
> +      * invoke rcu_sysidle_force_exit() directly if it does anything
> +      * more than take a scheduling-clock interrupt.
> +      */
> +     if (smp_processor_id() == tick_do_timer_cpu)
> +             return;
> +
> +     /* Update system-idle state: We are clearly no longer fully idle! */
> +     rcu_sysidle_force_exit();
> +}
> +
> +/*
> + * Check to see if the current CPU is idle.  Note that usermode execution
> + * does not count as idle.  The caller must have disabled interrupts.
> + */
> +static void rcu_sysidle_check_cpu(struct rcu_data *rdp, bool *isidle,
> +                               unsigned long *maxj)
> +{
> +     int cur;
> +     int curnmi;
> +     unsigned long j;
> +     struct rcu_dynticks *rdtp = rdp->dynticks;
> +
> +     /*
> +      * If some other CPU has already reported non-idle, if this is
> +      * not the flavor of RCU that tracks sysidle state, or if this
> +      * is an offline or the timekeeping CPU, nothing to do.
> +      */
> +     if (!*isidle || rdp->rsp != rcu_sysidle_state ||
> +         cpu_is_offline(rdp->cpu) || rdp->cpu == tick_do_timer_cpu)
> +             return;
> +     /* WARN_ON_ONCE(smp_processor_id() != tick_do_timer_cpu); */
> +
> +     /*
> +      * Pick up current idle and NMI-nesting counters, check.  We check
> +      * for NMIs using RCU's main ->dynticks counter.  This works because
> +      * any time ->dynticks has its low bit set, ->dynticks_idle will
> +      * too -- unless the only reason that ->dynticks's low bit is set
> +      * is due to an NMI from idle.  Which is exactly the case we need
> +      * to account for.
> +      */
> +     cur = atomic_read(&rdtp->dynticks_idle);
> +     curnmi = atomic_read(&rdtp->dynticks);
> +     if ((cur & 0x1) || (curnmi & 0x1)) {


I thought you wanted to ignore NMIs this time, since they don't read walltime?

By the way, they can still read jiffies, but unlike irq_enter(), nmi_enter()
doesn't catch up on missed jiffies updates. So the behaviour doesn't change
compared to !NO_HZ_FULL.

> +             *isidle = 0; /* We are not idle! */
> +             return;
> +     }
> +     smp_mb(); /* Read counters before timestamps. */
> +
> +     /* Pick up timestamps. */
> +     j = ACCESS_ONCE(rdtp->dynticks_idle_jiffies);
> +     /* If this CPU entered idle more recently, update maxj timestamp. */
> +     if (ULONG_CMP_LT(*maxj, j))
> +             *maxj = j;

So I'm a bit confused by the ordering, and I'm probably going to ask a silly
question.

What makes sure that we are not reading a stale value of rdtp->dynticks_idle
in the following scenario:

    CPU 0                          CPU 1
    
                                   //CPU 1 idle
                                   //rdtp(1)->dynticks_idle == 0

sysidle_check_cpu(CPU 1) {
    rdtp(1)->dynticks_idle == 0
}
cmpxchg(full_sysidle_state, 
        ...RCU_SYSIDLE_SHORT)
                                   rcu_irq_exit() {
                                         rdtp(1)->dynticks_idle = 1
                                         smp_mb()
                                         rcu_sysidle_force_exit() {
                                            full_sysidle_state == RCU_SYSIDLE_SHORT
                                            // no cmpxchg
                                            smp_mb()
                                   ...

[1]
sysidle_check_cpu(CPU 1) {
    rdtp(1)->dynticks_idle == 0
}

cmpxchg(RCU_SYSIDLE_FULL, ...)

[2]
sysidle_check_cpu(CPU 1) {
    rdtp(1)->dynticks_idle == 0
}

cmpxchg(RCU_SYSIDLE_FULL_NOTED, ...)


I mean, in [1] and [2] I can't see anything in the ordering that guarantees
that we see the new value rdtp(1)->dynticks_idle == 1.