On 12/12/2013 09:08 AM, Peter Zijlstra wrote:
In order to avoid the runtime condition and variable load turn
sched_clock_stable into a static_key.

Also provide a shorter implementation of local_clock() and
cpu_clock(int) when sched_clock_stable==1.

                         MAINLINE   PRE       POST

     sched_clock_stable: 1          1         1
     (cold) sched_clock: 329841     221876    215295
     (cold) local_clock: 301773     234692    220773
     (warm) sched_clock: 38375      25602     25659
     (warm) local_clock: 100371     33265     27242
     (warm) rdtsc:       27340      24214     24208
     sched_clock_stable: 0          0         0
     (cold) sched_clock: 382634     235941    237019
     (cold) local_clock: 396890     297017    294819
     (warm) sched_clock: 38194      25233     25609
     (warm) local_clock: 143452     71234     71232
     (warm) rdtsc:       27345      24245     24243

Signed-off-by: Peter Zijlstra<pet...@infradead.org>

Hi Peter,

This patch seems to be causing an issue with booting a KVM guest. It seems that 
it
causes the time to go random during early boot process:

        [    0.000000] Initmem setup node 30 [mem 0x12ee000000-0x138dffffff]
        [    0.000000]   NODE_DATA [mem 0xcfa42000-0xcfa72fff]
        [    0.000000]     NODE_DATA(30) on node 1
        [    0.000000] Initmem setup node 31 [mem 0x138e000000-0x142fffffff]
        [    0.000000]   NODE_DATA [mem 0xcfa11000-0xcfa41fff]
        [    0.000000]     NODE_DATA(31) on node 1
        [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
        [    0.000000] kvm-clock: cpu 0, msr 0:cf991001, boot clock
        [133538.294040] Zone ranges:
        [133538.294338]   DMA      [mem 0x00001000-0x00ffffff]
        [133538.294804]   DMA32    [mem 0x01000000-0xffffffff]
        [133538.295223]   Normal   [mem 0x100000000-0x142fffffff]
        [133538.295670] Movable zone start for each node

Looking at the code, initially I though that the problem is with:

        +void set_sched_clock_stable(void)
        +{
        +       if (!sched_clock_stable())
        +               static_key_slow_dec(&__sched_clock_stable);
        +}
        +
        +void clear_sched_clock_stable(void)
        +{
        +       /* XXX worry about clock continuity */
        +       if (sched_clock_stable())
        +               static_key_slow_inc(&__sched_clock_stable);
        +}

I think the jump label inc/dec is reversed here. We would want to inc it when 
enabling
and dec when disabling, no?
However, trying to reverse the two didn't help. I was still seeing the same odd 
behaviour.

I tried doing a simple conversion to using a simple var like before, which 
looks like this:

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 6bd6a67..a035932 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -76,26 +76,21 @@ EXPORT_SYMBOL_GPL(sched_clock);
 __read_mostly int sched_clock_running;

 #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
-static struct static_key __sched_clock_stable = STATIC_KEY_INIT;
+static int __sched_clock_stable;

 int sched_clock_stable(void)
 {
-       if (static_key_false(&__sched_clock_stable))
-               return false;
-       return true;
+       return __sched_clock_stable;
 }

 void set_sched_clock_stable(void)
 {
-       if (!sched_clock_stable())
-               static_key_slow_dec(&__sched_clock_stable);
+       __sched_clock_stable = 1;
 }

 static void __clear_sched_clock_stable(struct work_struct *work)
 {
-       /* XXX worry about clock continuity */
-       if (sched_clock_stable())
-               static_key_slow_inc(&__sched_clock_stable);
+       __sched_clock_stable = 0;
 }

 static DECLARE_WORK(sched_clock_work, __clear_sched_clock_stable);
@@ -340,7 +335,7 @@ EXPORT_SYMBOL_GPL(sched_clock_idle_wakeup_event);
  */
 u64 cpu_clock(int cpu)
 {
-       if (static_key_false(&__sched_clock_stable))
+       if (!sched_clock_stable())
                return sched_clock_cpu(cpu);

        return sched_clock();
@@ -355,7 +350,7 @@ u64 cpu_clock(int cpu)
  */
 u64 local_clock(void)
 {
-       if (static_key_false(&__sched_clock_stable))
+       if (!sched_clock_stable())
                return sched_clock_cpu(raw_smp_processor_id());

        return sched_clock();


This has corrected the issue:

        [    0.000000] Initmem setup node 31 [mem 0x138e000000-0x142fffffff]
        [    0.000000]   NODE_DATA [mem 0xcfa11000-0xcfa41fff]
        [    0.000000]     NODE_DATA(31) on node 1
        [    0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00
        [    0.000000] kvm-clock: cpu 0, msr 0:cf991001, boot clock
        [    0.000000] Zone ranges:
        [    0.000000]   DMA      [mem 0x00001000-0x00ffffff]
        [    0.000000]   DMA32    [mem 0x01000000-0xffffffff]
        [    0.000000]   Normal   [mem 0x100000000-0x142fffffff]
        [    0.000000] Movable zone start for each node
        [    0.000000] Early memory node ranges
                [ timing is correct for the rest of the boot]

At this point, I thought that there's something up with jump labels being used 
this early (?) and
tried compiling with CONFIG_JUMP_LABELS=n, this didn't solve the issue.

This makes me thing there's something different related to jumplabels we're 
missing, as the
no-jumplabel config should be very similar to the patch I did above, I just 
can't figure
out what it is.


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to