Ingo Molnar wrote:
> * Nicolas Boichat <[EMAIL PROTECTED]> wrote:
>
>   
>>> I found out which commit seems to cause these bugs:
>>> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d04f41e35343f1d788551fd3f753f51794f4afcf
>>>
>>> The latest GIT without this commit works fine, but doesn't with it.
>>>       
>> Sorry about blaming this commit. The problems happen randomly (about 1 
>> reboot over 2 is ok, at least with 2.6.21-rc4). I'll run more tests 
>> and post the results later.
>>     
>
> is this the 32-bit kernel? 

Yes it is.

> If yes then does commit 
> 4edc5db83f574dfcc8be35b7b96760ded543b360 (included in -rc5) fix it?
>   

I don't know precisely which commit you are talking about, but I
upgraded to 2.6.21-rc5, and the problem became more obvious.

>From what I experimented, I came to the conclusion that when the
frequency of the first CPU is wrongly detected, the timers get wrong
(i.e. tsc, hpet, with or without high-res timers).

It is not a new bug, it happened in 2.6.20 too (I also found an old
dmesg from 2.6.18-rc2 with the same frequency detection problem), except
that for some reasons it didn't cause that many obvious bugs, so I
didn't notice it at that time.

(in case you don't remember my first message, I have a Macbook Pro first
generation (Core Duo, so x86), see below for /proc/cpuinfo)

This is the diff of two consecutive soft reboots, with the same kernel
(CONFIG_HPET_TIMER unset, CONFIG_HIGH_RES_TIMERS set):

--- dmesg-2.6.21-rc5-nohpet-error-notime        2007-03-26 16:42:00.000000000 
+0800
+++ dmesg-2.6.21-rc5-nohpet-success-notime      2007-03-26 16:42:09.000000000 
+0800
@@ -100,7 +100,7 @@
 Enabling unmasked SIMD FPU exception support... done.
 Initializing CPU#0
 PID hash table entries: 4096 (order: 12, 16384 bytes)
-Detected 1952.283 MHz processor.
+Detected 1830.920 MHz processor.
 Console: colour VGA+ 80x25
 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
 Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)


The "+" boot reports the correct CPU frequency (1.83Ghz).

When you run something like this on both:
# ntpdate swisstime.ethz.ch
# ntpdate swisstime.ethz.ch
# sleep 60
# ntpdate swisstime.ethz.ch

You get, with the "+" boot, what you expect:

26 Mar 16:33:34 ntpdate[3385]: adjust time server 129.132.2.21 offset 0.120391 
sec
26 Mar 16:32:52 ntpdate[3376]: adjust time server 129.132.2.21 offset 0.137979 
sec
-- pause of about 60 seconds
26 Mar 16:33:55 ntpdate[3386]: adjust time server 129.132.2.21 offset 0.125751 
sec


i.e. the clock is still correct 60 seconds after you adjust it.

While, with the "-" boot, you get:

26 Mar 16:18:05 ntpdate[3444]: step time server 129.132.2.21 offset 3.829351 sec
26 Mar 16:18:08 ntpdate[3445]: adjust time server 129.132.2.21 offset 0.232023 
sec
-- pause
26 Mar 16:19:16 ntpdate[3447]: step time server 129.132.2.21 offset 4.320653 sec

The clock has shifted of about 4.1 seconds in 60 seconds, i.e. an error
of 6.8%, which is in the same range as the error between 1952 and 1833
Mhz (6.5%).
 
I get the same results with both CONFIG_HPET_TIMER and
CONFIG_HIGH_RES_TIMERS unset, some boots are ok, some aren't.

I couldn't get a correct boot with these two options enabled, but it's
probably because I didn't reboot enough times...

I you want some detailed kernel logs, see
http://www.boichat.ch/nicolas/linux/2nd/ .
Please tell me if you need more informations, or if you prefer I file a
bug in the bugzilla.

Best regards,

Nicolas

---

/proc/cpuinfo on an incorrect boot:

nunuche ~ # cat /proc/cpuinfo 
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz
stepping        : 8
cpu MHz         : 1952.216
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor 
vmx est tm2 xtpr
bogomips        : 4689.63
clflush size    : 64

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 6
model           : 14
model name      : Genuine Intel(R) CPU           T2400  @ 1.83GHz
stepping        : 8
cpu MHz         : 1952.216
cache size      : 2048 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx constant_tsc pni monitor 
vmx est tm2 xtpr
bogomips        : 3661.27
clflush size    : 64


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to