Lots of good info, and a nice mind-dump that gives me a whole host of other things that need to be looked at... Umm. "thanks" :)
On Wed, Oct 21, 2009 at 11:10 PM, Perry Lorier <pe...@coders.net> wrote:

> Rick Ernst wrote:
>
>> Resent, since I responded from the wrong address:
>> ---
>> The basic operation of IP SLA is as surmised; payload with timestamps
>> and other telemetry data is sent to a 'responder' which manipulates
>> the payload, including adding its own timestamps, and returns the
>> altered payload.
>
> Yup :) It's the obvious way to do it :)
>
>> I had to do a mental walk-through, but I think I see how drift can
>> cause this. I'm going to generate some artificial data, graph it, and
>> see if it matches the general waveshape I'm seeing.
>>
>> I purposefully have the traffic generators ntp syncing against the
>> responders. I thought that would keep the clocks more closely in sync.
>> I don't necessarily care if the time is 'right', just that it's the
>> same.
>
> This causes major problems. What you're actually measuring here is how
> well ntp can keep the clock sync'd under asymmetric latency. ntp is
> trying to do its own measurements of one-way delay, without the help of
> clocks to measure clock drift as well. As you can see from your graphs
> ntp is not coping[1].
>
> You are far better off having each end sync to a local stratum 1 or
> stratum 2 ntp source, preferably one over a different link to the one
> under test. If you don't have a local stratum 1/2 time source at each
> end, you might be able to find one over a local exchange or other less
> congested link. If this is very important to you then you should
> consider looking at running your own stratum 1 clocks at each end
> synchronised off something like GPS, CDMA or a T1 clock.
>
>> What kind of difference should I expect if I sync both
>> generators and responders against the same source, or not sync the
>> responder? I'm thinking that having one source with constant drift may
>> be better than both devices trying to walk/correct the time.
>
> Most hardware clocks in PCs/routers/switches etc. have pretty atrocious
> amounts of drift if left to free run[2], sometimes in the order of
> seconds or occasionally minutes per week. To get useful numbers you
> really do need to synchronise them to /something/. Synchronising them to
> each other causes problems as ntp, I think (I could be wrong), assumes
> mostly symmetrical latency, and if the latency isn't symmetric assumes
> it's because one clock is running fast/slow and will alter the clock's
> speed to account for it. The great thing about ntp stratum 1 servers is
> that by definition they have more or less the same time no matter where
> they are, so synchronising each against a local ntp server will be a
> much much better solution. If possible you should consider peering with
> at least 3, preferably 4(!)[3], other ntp servers.
>
> [1]: To be fair it's a hard problem. Anything that involves time just
> gets more and more complicated the more you look at it. ntp is extremely
> clever and probably knows more about time than I'd ever want to know,
> but you're making its job hard.
>
> [2]: http://vancouver-webpages.com/time/ /
> http://vancouver-webpages.com/time/ltmhist.png
>
> [3]:
> http://twiki.ntp.org/bin/view/Support/SelectingOffsiteNTPServers#Section_5.3.3
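
For my own notes, here is a rough sketch of why syncing the generators against the responders over the link under test skews things. It assumes the standard four-timestamp offset/delay estimate that NTP uses; the function names and the delay figures are made up for illustration and are not taken from IP SLA or from any real measurement.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire estimate.

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


def simulate_exchange(true_offset, fwd_delay, rev_delay, t1=0.0, processing=0.0):
    """Hypothetical helper: build the four timestamps for one exchange.

    The server clock is assumed to read client time + true_offset.
    fwd_delay is client->server latency, rev_delay is server->client.
    """
    t2 = t1 + fwd_delay + true_offset        # arrival, read on server clock
    t3 = t2 + processing                     # reply leaves the server
    t4 = (t3 - true_offset) + rev_delay      # arrival, read on client clock
    return t1, t2, t3, t4


if __name__ == "__main__":
    # Both clocks are actually in perfect agreement (true_offset = 0).
    for fwd, rev in [(0.010, 0.010), (0.030, 0.010)]:
        offset, delay = ntp_offset_and_delay(*simulate_exchange(0.0, fwd, rev))
        print(f"fwd={fwd:.3f}s rev={rev:.3f}s -> "
              f"estimated offset={offset:.4f}s, round trip={delay:.4f}s")
```

With symmetric latency the estimated offset comes out at zero, but with a congested forward path the estimate is off by half the difference between the two directions, even though the clocks agree. If that is roughly what ntp is steering on, it would explain the clock being "corrected" in step with load on the link.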