When the stratum is too high, ntpdate detects this and always writes it into its log. In our case there is just no such log entry: ntpdate sends the first packet, gets an answer, and that's all. So fudging won't save us, I think. Also, it's a really bad approach to fudge the stratum on a server which doesn't have a real clock on board.
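For readers following along, the "fudging" under discussion usually means the standard ntpd local-clock recipe in /etc/ntp.conf on the master; a sketch of the common form, not the exact Fuel configuration:

```
# Common ntpd recipe: use the local clock as a last-resort reference and
# fudge its stratum so clients will still accept this host as a server
# even before the upstream ntp.org pool is reachable.
server 127.127.1.0              # local clock (LCL) driver
fudge  127.127.1.0 stratum 10
```

The objection above is exactly about this: the master has no real reference clock, so advertising a fudged stratum papers over the problem rather than fixing it.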
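For anyone checking captures of that exchange: the stratum a server advertises is simply the second byte of the NTP response packet (RFC 5905), so it can be read straight out of a dump. A minimal sketch (the function names are mine, not anything from ntpdate itself):

```python
# Sketch of reading the advertised stratum out of a raw NTP response
# packet (RFC 5905 layout). Names are illustrative, not from ntpdate.

def stratum_of(packet: bytes) -> int:
    """The stratum is the second byte of a 48-byte NTP packet."""
    if len(packet) < 48:
        raise ValueError("truncated NTP packet")
    return packet[1]

def looks_synchronized(packet: bytes) -> bool:
    """Stratum 1-15 is usable; 0 means unspecified/kiss-o'-death."""
    return 1 <= stratum_of(packet) <= 15

# Example: a fake response advertising stratum 2 (a normal secondary server).
fake = bytes([0x24, 2]) + bytes(46)
print(stratum_of(fake), looks_synchronized(fake))  # 2 True
```

A reply that arrives but advertises stratum 0 (or above 15) would match the symptom described: the packet exchange happens, yet the sync is rejected.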
On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <aschu...@mirantis.com> wrote:
> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
> <sbogat...@mirantis.com> wrote:
> > Hi guys,
> >
> > for some time we have had a bug [0] with ntpdate. It is not reproduced
> > 100% of the time, but it breaks our BVT and swarm tests, and there is
> > no exact point where the root of the problem is located. To understand
> > this better, some verbosity was added to the ntpdate output, but in
> > the logs we can see only that the packet exchange between ntpdate and
> > the server was started and never completed.
> >
>
> When I've hit this in my local environments, there are usually one of
> two possible causes: 1) lack of network connectivity, so the NTP server
> never responds, or 2) the stratum is too high. My assumption is that
> we're running into #2 because of our revert-resume in testing. When we
> resume, the ntp server on the master may take a while to become stable.
> The sync in the deployment uses the Fuel master for synchronization, so
> if the stratum is too high, it will fail with this lovely useless
> error. My assumption about what is happening is that we aren't using a
> set of internal NTP servers but rather relying on the standard ntp.org
> pools, so when the master is being resumed it struggles to find a good
> enough set of servers and takes a while to sync. This then causes these
> deployment tasks to fail because the master has not yet stabilized (it
> might also be geolocation related). We could address this either by
> fudging the stratum on the master server in the configs or by
> introducing our own, more stable, local NTP servers. I have a feeling
> fudging the stratum might be better while we only use the master in our
> ntp configuration.
>
> > As this bug is a blocker, I propose merging [1] to better understand
> > what's going on. I created a custom ISO with this patchset and tried
> > to run about 10 BVT tests on it, with no luck at all.
> > So, if we merge this, we would catch the problem much faster and
> > understand the root cause.
>
> I think we should merge the increased logging patch anyway because
> it'll be useful in troubleshooting, but we also might want to look into
> getting an ntp peers list added into the snapshot.
>
> > I appreciate your answers, folks.
> >
> > [0] https://bugs.launchpad.net/fuel/+bug/1533082
> > [1] https://review.openstack.org/#/c/271219/
> > --
> > with best regards,
> > Stan.
>
> Thanks,
> -Alex
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

--
with best regards,
Stan.
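On the peers-list idea: if `ntpq -pn` output does get captured into the snapshot, the column worth watching is `st` (stratum). A hypothetical sketch of pulling it out of captured output; the sample text below is invented for illustration:

```python
# Hypothetical sketch: extract the stratum ("st") column from saved
# `ntpq -pn` output, e.g. as captured into a diagnostic snapshot.
# The sample billboard below is invented for illustration.
sample = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*198.51.100.10   203.0.113.5      2 u   33   64  377    0.512   -0.021   0.034
"""

def peer_strata(text: str) -> dict:
    """Map each peer address to its advertised stratum."""
    strata = {}
    for line in text.splitlines()[2:]:            # skip the two header lines
        fields = line.split()
        if len(fields) >= 3:
            peer = fields[0].lstrip("*+-#x~.o ")  # drop the tally code
            strata[peer] = int(fields[2])
    return strata

print(peer_strata(sample))  # {'198.51.100.10': 2}
```

A snapshot containing this would show at a glance whether the master had settled on a low-stratum peer at the time a deployment task failed.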