But you've used 'logger -t ntpdate' - this can fail again and the logs can end up empty again. In my opinion, we should redirect the output to the log file directly.
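Something like this, so the debug output survives even if syslog is not up yet (a sketch - the exact ntpdate flags and the log path are only examples, not what is in the patch):

    # instead of: ntpdate -d <server> 2>&1 | logger -t ntpdate
    ntpdate -d <server> >> /var/log/ntpdate.log 2>&1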
On Wed, Jan 27, 2016 at 11:21 AM, Stanislaw Bogatkin <sbogat...@mirantis.com> wrote:

> Yes, I have created a custom ISO with debug output. It didn't help, so
> another one with strace was created.
>
> On Jan 27, 2016 00:56, "Alex Schultz" <aschu...@mirantis.com> wrote:
>
>> On Tue, Jan 26, 2016 at 2:16 PM, Stanislaw Bogatkin
>> <sbogat...@mirantis.com> wrote:
>> > When the stratum is too high, ntpdate can detect this and will always
>> > write it into its log. In our case there is just no log - ntpdate
>> > sends the first packet, gets an answer - that's all. So fudging won't
>> > save us, I think. Also, it's a really bad approach to fudge a server
>> > which doesn't have a real clock onboard.
>>
>> Do you have debug output from ntpdate somewhere? I'm not finding it in
>> the bugs or in some of the snapshots for the failures. I did find one
>> snapshot with the -v change that didn't have any response information,
>> so maybe it's the other problem, where some network connectivity isn't
>> working correctly or the responses are getting dropped somewhere?
>>
>> -Alex
>>
>> > On Tue, Jan 26, 2016 at 10:41 PM, Alex Schultz <aschu...@mirantis.com>
>> > wrote:
>> >>
>> >> On Tue, Jan 26, 2016 at 11:42 AM, Stanislaw Bogatkin
>> >> <sbogat...@mirantis.com> wrote:
>> >> > Hi guys,
>> >> >
>> >> > for some time we have had a bug [0] with ntpdate. It isn't
>> >> > reproduced 100% of the time, but it breaks our BVT and swarm tests.
>> >> > There is no exact point where the root of the problem is located.
>> >> > To better understand this, some verbosity was added to the ntpdate
>> >> > output, but in the logs we can only see that the packet exchange
>> >> > between ntpdate and the server was started and never completed.
>> >>
>> >> So when I've hit this in my local environments, it is usually one of
>> >> two possible causes: 1) lack of network connectivity, so the ntp
>> >> server never responds, or 2) the stratum is too high. My assumption
>> >> is that we're running into #2 because of our revert-resume in
>> >> testing. When we resume, the ntp server on the master may take a
>> >> while to become stable. The sync in the deployment uses the fuel
>> >> master for synchronization, so if the stratum is too high, it will
>> >> fail with this lovely useless error. My assumption about what is
>> >> happening is that because we aren't using a set of internal ntp
>> >> servers but rather relying on the standard ntp.org pools, the master
>> >> struggles to find a good enough set of servers when it is resumed,
>> >> so it takes a while to sync. This then causes these deployment tasks
>> >> to fail because the master has not yet stabilized (it might also be
>> >> geolocation related). We could address this either by fudging the
>> >> stratum on the master server in the configs or by introducing our
>> >> own, more stable local ntp servers. I have a feeling fudging the
>> >> stratum might be better when we only use the master in our ntp
>> >> configuration.
>> >>
>> >> > As this bug is a blocker, I propose merging [1] to better
>> >> > understand what's going on. I created a custom ISO with this
>> >> > patchset and tried to run about 10 BVT tests on it, with
>> >> > absolutely no luck. So, if we merge this, we will catch the
>> >> > problem much faster and understand the root cause.
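(Inline note: fudging the stratum on the master, as Alex suggests above, would look roughly like this in the master's /etc/ntp.conf. A sketch only - the local-clock driver and the stratum value are the standard example, not taken from our actual configs:

    server 127.127.1.0              # undisciplined local clock driver
    fudge  127.127.1.0 stratum 10   # advertise a usable stratum even unsynced

so nodes syncing against the master would accept it even before it reaches the ntp.org pool servers again after a resume.)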
>> >>
>> >> I think we should merge the increased logging patch anyway because
>> >> it'll be useful in troubleshooting, but we might also want to look
>> >> into getting an ntp peers list added to the snapshot.
>> >>
>> >> > I appreciate your answers, folks.
>> >> >
>> >> > [0] https://bugs.launchpad.net/fuel/+bug/1533082
>> >> > [1] https://review.openstack.org/#/c/271219/
>> >> > --
>> >> > with best regards,
>> >> > Stan.
>> >>
>> >> Thanks,
>> >> -Alex

--
Best Regards,
Maksim Malchuk,
Senior DevOps Engineer,
MOS: Product Engineering,
Mirantis, Inc <vgor...@mirantis.com>
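P.S. Re: Alex's point about adding an ntp peers list to the snapshot - something like the following on the master should be enough (a sketch; the exact flags and the log destination are only examples):

    # -p prints the peer list, -n keeps it numeric to avoid DNS stalls
    ntpq -pn > /var/log/ntpq-peers.log 2>&1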
__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev