TL;DR: Here be nitpicks masquerading as answers, humor in wisdom's drag.

On 2016-04-05 12:47, Edward Ned Harvey (lopser) wrote:
> It's clear that the industry best practice is to use a real
> time source, but there is a flip side: If you deviate from the
> industry best practice, how much do you risk?

Hi Edward,

From your posts here, I've got tons of respect for your opinions
and insights, some of which I've snagged for posterity. Below,
when I say "you" I'm really addressing the List -- who as a rule
are less subtle, less skilled, and more in need of a sharp push.
I'm saddened by the irrational optimism often expressed here.
Please bear with me.

| As the IT person, I never trust anything. Every piece of hardware,
| every software, every system, is expected to fail, and it's my
| job as IT person, to minimize harm caused by these failures. So
| right now, I'm transitioning from "never trust anything" to
| "holy crap, look at that Corillian Death Ray.
|   -- Edward Ned Harvey, 2013-05-12

The biggest problems facing sysadmins in general are risk related.
A previous employer saw backups as a time-consuming predator that
was never going to have a pay-back. The problem was not that they
hated backups, but that they couldn't see the existential risk to
the business that backups help reduce. The ops manager's
predecessor had a career-limiting incident with a failed system
for which there was no backup. And yet nothing was learned.

> When instructed to
> deviate from best practice, should you be upset and insist that
> your boss put the request in writing, creating friction between
> yourself and him/her? Should you refuse to do it completely?
> Should you just roll with it?

There's an over-arching game being played, and the rules are baked
into our primate behavior. If your boss is supportive and you have
a warm and reciprocal relationship, mention that you need resources
to keep the lights on and wait for them to make it rain.

| When the customer has beaten upon you long enough, give him what
| he asks for, instead of what he needs.
| This is very strong medicine, and is normally only required once.
|   --Vadim Vygonets, a.s.r

Otherwise, carefully transact Vadim's advice and discreetly carry
an umbrella. You can use the tip to prod the most suitable orifice
if funds are delayed. If your manager misunderstands their role and
thinks they're a problem-solver or maybe a dominatrix, you've lost
the game.

| "It's called a shovel," said the Senior Wrangler. "I've seen
| the gardeners use them. You stick the sharp end in the ground.
| Then it gets a bit technical."
|   -- Terry Pratchett, "Reaper Man"

> If you have a specific need, then it's important to cater to
> that need, and you might have to insist against deviation from
> best practice.

Paul Culmsee and K. Awati wrote a book-length screed about best
practices that I highly recommend, ISBN 1938908406. Spoiler alert:
they don't work.

> But if instead, all you have to worry about is AD
> remaining functional, and approximate correct timestamps on
> files and such, and users knowing the correct time to show up at
> meetings, then your need is much less strict.

Timestamp accuracy requirements are relative. AD's +/- 5 minutes
(based, I think, on the Kerberos requirements for ticket expiry) is
the guard-rail of timekeeping. Maybe you didn't plunge off the
highway down the embankment but you're definitely not driving
within the lane. At the other end of the scale, when systems have
gone pear-shaped and 1 msec clock uncertainty has sundered your
ability to separate cause from effect, the cost savings of having
run a loose ship will melt away into the lake of overtime,
downtime, and lost sales. If you're lucky you can force the bill
onto the DBAs and developers who will mop up the mess. Just don't
sit with your backs to them.

> I hear people on this list referring to "unstable time source"
> and "false ticker." This is a valid concern, sort-of.
> In reality, your guest machines are no better at tracking time than
> the VM time server is, so when there's any deviation between the
> client and your time server, it doesn't result in a false
> ticker. The way you get a false ticker is when you have
> configured several time servers, and some of them agree with
> each other by quorum, but there is an outlier. In that case, the
> clients still get the correct time, but the one outlier server
> needs to be corrected.

Break some canary server loose from the tyranny of accurate
timestamps and wait for the alarm. If alarms don't sound, have a
talk with your PFY about the nature of systems, geometry, causal
chains, overtime, risk management, and so on, until one day they
come (metaphorically) running over waving a printout saying "holy
sh*t the clock on canary666 is way out of spec how do we fix
that?". Until that moment arrives keep 3 to 6 months' wages in
highly liquid form. Single malt is good.

> There is something to be said for anecdotal evidence, when
> there's a lot of it. I've seen many dozens of environments where
> AD is run in virtual servers, which get their time from the
> internet (non-local stratum 2), and AD serves time to the
> clients on the LAN without noticeable problems. Sure the time
> may drift a few seconds in either direction, but again - unless
> you have a specific need, that's good enough for general use.

The key thing is to have very tight control over clocks within the
blast radius of your troubleshooting domain. One time those
reasonable assumptions went wrong: a junior SA misconfigured a
subdomain in a way that got hundreds of systems pulling time from
the first workstation to boot in the morning.

| The ticket was closed with 'Colonel Mustard, in the
| datacenter, with the keyboard.'

> Most environments don't have a dedicated stratum 1 time
> source, and most AD is run on VM's.

Is this the right place to remind you that VMs experience
time-travel?
One moment it's 10 after two, then there's a snapshot revert, and
suddenly it's twenty 'til. Oh my, what is the luckless bastard of
an NTP client going to make of the giant jumping timeslop? (Hint:
the answer varies depending on OS make, model, and vintage. Some
exceptions apply. Results not guaranteed in all states. When in
doubt ask your doctor.)

> In those many dozen environments, I've seen many hundreds of
> VM's running as guests on peoples' laptops. Most common is the
> windows VM running inside someone's macbook. This introduces yet
> another level of VM time-fuzziness, and yet again, I've never
> seen it cause a problem, because the only requirements in those
> environments have been to keep AD running, and clients
> operational, and users showing up to meetings at the right time.

Right -- the real world, of users and desktops. System
administration of the back end is less forgiving.

> I have some further anecdotal evidence: I have actually seen a
> situation or two, where the AD server was no longer able to get
> time from the internet (due to firewall change - once hardware,
> and once software). So the AD time source slowly drifted off,
> and all the clients slowly followed. We discovered when a human
> noticed, "Why does my laptop say 4:05, when my phone says 4:13?"
> So then we restored the ability for AD to follow the internet,
> and AD slowly adjusted back to the right time, and all the
> clients slowly followed.
> Nevermind stratum 1. That's a VM client, following a VM
> server, following a non-local stratum 2 time source. It's about
> as bad as you can get, but the problem was caused by firewall
> blockage, and the behavior of the system was about as ideal as
> you can get, once the firewall problems were corrected.
>
> Unless you have a specific need, in this case, the risk of
> deviation from best practice is pretty low.
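
A human squinting at a laptop and a phone is a monitoring system of
sorts, just a slow one. Here is a minimal sketch of the canary idea,
assuming you already have some way to collect each host's offset
from a reference clock. The function name check_offsets, the host
names, and the 1-second warning threshold are made up for
illustration; the 300-second limit is only the +/- 5 minutes
mentioned above, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class OffsetAlarm:
    host: str
    offset_seconds: float  # signed: negative means the host is behind
    severity: str          # "warning" or "critical"


def check_offsets(offsets, warn_after=1.0, fail_after=300.0):
    """Return an alarm for every host whose clock offset is out of spec.

    offsets maps host name -> signed offset from the reference clock,
    in seconds. Both thresholds are illustrative assumptions.
    """
    alarms = []
    for host, offset in sorted(offsets.items()):
        magnitude = abs(offset)
        if magnitude >= fail_after:
            alarms.append(OffsetAlarm(host, offset, "critical"))
        elif magnitude >= warn_after:
            alarms.append(OffsetAlarm(host, offset, "warning"))
    return alarms


# Hypothetical measurements: the laptop from the anecdote is 8 minutes
# (480 s) behind, the canary is slightly off, the workstation is fine.
measured = {"laptop": -480.0, "canary666": 2.5, "ws1": 0.2}
for alarm in check_offsets(measured):
    print(f"{alarm.severity}: {alarm.host} is off by {alarm.offset_seconds:+.1f} s")
```

Where the numbers come from is deliberately left out; the point is
only that the comparison is cheap once something other than a human
is doing it.
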
> Further evidence: Despite trying to demonstrate a problem with
> this, to prove your boss wrong, you couldn't demonstrate a
> problem.

-- Charles Polisher

_______________________________________________
Tech mailing list
Tech@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/