On Sat, Jul 9, 2011 at 9:11 PM, Maurice Janssen <maur...@z74.net> wrote:
> On Sat, Jul 09, 2011 at 07:52:58AM -0400, Nick Holland wrote:
>>On 07/09/11 03:57, Maurice Janssen wrote:
>>> Hi,
>>>
>>> Is it possible to somehow force a program to run on a single CPU in an
>>> SMP system?
>>> The reason I ask is that on some SMP-capable architectures, I'm having some
>>> problems with ntpd.  On hppa and sgi, the clock won't sync because ntpd
>>> sees replies with negative delay:
>>>
>>> Jul  9 08:58:19 hppa ntpd[21406]: reply from 192.168.4.12: negative
>>> delay -0.854615s, next query 3120s
>>>
>>> (reported as PR6592)
>>>
>>> If I run the bsd.sp kernel, the negative delays are gone and ntpd syncs
>>> without any problem.  I was wondering if the problem would occur if I
>>> could limit ntpd to a single CPU.  Diving into the code is way beyond my
>>> skills, so I was hoping that there is a utility like nice to achieve
>>> this.
>>>
>>> Thanks,
>>> Maurice
>>
>>Things aren't that simple.
>>Time is an illusion.  Lunch time, doubly so
>>(obligatory Hitchhiker's quote)
>>Time on computers is complicated, doubly so on a multiprocessor system.
>>
>>ntpd isn't your problem, it's time on the SMP system.  Fiddling with
>>processor affinity (trying to attach particular tasks to particular
>>CPUs) wouldn't help if you could (and you can't).
>
> OK, thanks for making that clear.
>
>>Is time really drifting (consistently increasing error in one direction)
>>on these systems?  Or is it just "jittering" around proper time?
>
> The hppa was about 10 seconds behind proper time since boot (the machine
> is not powered on anymore).  The delay was quite stable.
>
> The sgi was close to proper time (within one second) and finally synced
> the clock after about 4 hours.  But the 'negative delay' lines keep
> appearing in /var/log/daemon:
>
> Jul  9 07:10:40 sgi ntpd[25403]: ntp engine ready
> Jul  9 07:11:47 sgi ntpd[7566]: set local clock to Sat Jul  9 07:11:47 CEST 2011 (offset 66.702436s)
> Jul  9 07:11:53 sgi ntpd[25403]: reply from 192.168.4.10: negative delay -0.422241s, next query 3298s

This sounds like the timekeeping code isn't synchronizing between
CPUs. I don't know how those architectures are getting their time, but
it sounds like different CPUs have counters that are slightly out of
sync, or that even run at different frequencies.

This is the reason why we don't use instruction counters to measure
time on i386 and amd64.
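
To illustrate how per-CPU counters that disagree could produce the
"negative delay" lines quoted above, here is a rough sketch in Python.
It uses the standard NTP round-trip formula, delay = (T4 - T1) - (T3 - T2),
where T1 and T4 are read from the local clock and T2 and T3 come from the
server. The CPU_SKEW table, the 0.9 s lag, the network and server timings,
and the function names are all made up for illustration; this is not the
hppa or sgi timekeeping code, and not ntpd's code either.

```python
# Hypothetical per-CPU clock error in seconds: CPU 1's counter lags CPU 0.
CPU_SKEW = {0: 0.0, 1: -0.9}


def local_clock(true_time, cpu):
    """Time as seen by a process that happens to be running on `cpu`."""
    return true_time + CPU_SKEW[cpu]


def ntp_delay(t1, t2, t3, t4):
    """Standard NTP round-trip delay: time on the wire, minus server processing."""
    return (t4 - t1) - (t3 - t2)


def query(send_cpu, recv_cpu):
    """Simulate one NTP exchange; the process may migrate between CPUs."""
    true_send = 100.0                 # assumed moment the request leaves
    true_recv = true_send + 0.045     # assumed 45 ms real round trip
    t1 = local_clock(true_send, send_cpu)   # client transmit timestamp
    t2 = 100.0200                           # server receive (server's clock)
    t3 = 100.0201                           # server transmit (server's clock)
    t4 = local_clock(true_recv, recv_cpu)   # client receive timestamp
    return ntp_delay(t1, t2, t3, t4)


if __name__ == "__main__":
    print("same CPU:      delay = %.6fs" % query(0, 0))
    print("migrated 0->1: delay = %.6fs" % query(0, 1))
```

When both timestamps are taken on the same CPU the skew cancels out of
T4 - T1 and the delay comes out positive. When the second timestamp is
taken on the lagging CPU, T4 lands before T1 and the delay goes negative,
which is the kind of reply ntpd rejects in the logs above.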

//art
