On Tue 23 August 2011 14:10:57 kashani did opine thusly:
> On 8/23/2011 1:43 PM, Alan McKinnon wrote:
> > I can't fix it without running afoul of the Change Management
> > process, and today's emergency reboot didn't leave me any time
> > to poke around and determine the effect of removing hal.
> >
> > This is how life in corporate IT works....
>
> I hate Corp CM and it's one of the reasons I stay in startups. Its
> job is to slow normal change down so much so that every change
> becomes an emergency.
>
> However next time I have to deal with one I am shoving mathematical
> proof of "there is no rollback in systems" down their throats.
> http://www.iu.hio.no/~mark/papers/totalfield.pdf

Haven't read the pdf yet, but I just have to share this joke.

Tonight's CM was an unscheduled emergency reboot. This gave me the
opportunity to do something I've been dying to do for ages, enter
this:

Install plan: reboot server
Test plan: ping server
Backout plan: unreboot server    <====== :-)

On the whole our CM process is sane. The manager knows how
infrastructure works: If that undersea optical link goes down, I'm
fixing it right now and to hell with the paperwork and process.

Contrast with my gf's job at the bank. That one truly is a case where
to change anything, she has to invent imaginary catastrophic
emergencies. More often than not, she causes them in undetectable
ways just to get her job done.

>
> For those that aren't ginormous systems nerds this bit sums it up
> nicely.
>
> "There is a deeper issue with roll-back in partial systems. If a
> system is in contact with another system, e.g. receiving data, or
> if we have partitioned a system into loosely coupled pieces only
> one of which is being changed, then the other system becomes a part
> of the total system and we must write a hypothetical journal for
> the entire system in order to achieve a consistent rollback."
>
> kashani

--
alan dot mckinnon at gmail dot com