Hi. First of all, well done for raising the bar at your work. Second, management buy-in can make or break this effort.
I highly recommend "Visible Ops". The authors studied successful IT shops and identified what they have in common, and change management is a top item. "Visible Ops" lays out a program for moving from where you are now to a more desirable position. Check it out:

http://www.itpi.org/?page=Visible_Ops

Best,
-at

On Sun, Feb 27, 2011 at 1:30 AM, Benjamin Spiccia <[email protected]> wrote:
>
> Hi LOPSA,
>
> I need some advice on how to get everyone in an operations team to clearly
> document their changes and actions.
> I will probably cross-post to my local SAGE mailing list as well.
>
> In short:
> - I'm a slightly inexperienced young sysadmin who started with the
> company last week
> - From what I can gather, I was hired largely on the basis of my reputation
> at my previous company for establishing and following robust processes
> (operations process design/maintenance, server provisioning, and handling of
> Level 2/3 escalated support requests are part of my job description)
> - The Operations Team consists of < 5 people
> - The company I work for has recently taken over another one. We are aiming
> to slowly transition the taken-over company's customers over to our systems.
> The old-company network is complex and its current state is virtually
> undocumented. The existing company network is relatively new and relies on
> some parts of the taken-over company's infrastructure. We want to be running
> our own stuff and be completely independent of the taken-over company's
> systems.
> - Two people in my Operations Team are very smart and very good technically
> at what they do, but do not see the need to document actions taken to resolve
> a problem, or infrastructure configuration changes that are performed
> - The company has a colossal legacy web app (designed by one of the
> Operations Team) which appears to be a one-stop place for service creation,
> DNS changes (to BIND), customer ticket creation (to RT) and monitoring (with
> Nagios), but the GUI is not fantastic and it appears no one other than the
> person who coded it likes to use it. There is no detail of what was actually
> changed, only who made the last change.
>
> Any advice on how I get people to change the way they do things, or for that
> matter any advice on how to go about such a large infrastructure transition,
> would be appreciated. Preferably, I'd like to not come across as some
> know-it-all punk who's asking for things to be implemented simply to create
> electronic paperwork.
>
> Thanks LOPSA,
>
> Ben S
>
> In more detail:
> - While the legacy web app creates tickets for Request Tracker, there is no
> documentation of what happens during L2/3 escalation (communication is
> through side channels like a direct e-mail to L1 or a phone call)
> - A lot of our infrastructure in multiple remote areas fails due to
> circumstances beyond our control (weather, upstream provider problems, etc.).
> There appears to be some auto-acknowledging of some Nagios alerts and
> rate-limiting of e-mails due to what I think is a bad legacy Nagios
> configuration, which the legacy web app generates
> - Both people in my Operations Team surprisingly aren't from the taken-over
> company
> - Partial knowledge of the complete network remains in the heads of 2 people
> in my Operations Team, undocumented anywhere
> - The web app appears to have been developed with an emphasis on allowing
> techs to add services quickly, but the web GUI is both information overload
> at times and complex due to the non-standard terminology used
>
> I seem to have hit a brick wall trying to convince them of the need to track
> changes/actions.
>
> Argument 1) Me: Don't you think the fact that you had to revive the
> taken-over-company systems after an outage should be documented?
> Operations Team: The old-company systems are going away. We know how to fix
> this common problem. The old-company systems are going to be blown away
> anyway. Why bother documenting what was performed?
>
> Argument 2) Me: Don't you think the fact that you changed network routes to
> work around an upstream problem should be documented somewhere? How do other
> members of the Operations Team know that you have already done so? How do
> you know what other Operations Team members have already done to work around
> the problem?
> Operations Team: We're already in constant phone contact with each other
> when such a problem happens. Why should what has been performed need to be
> documented?
>
> Argument 3) Me: Even a one-liner of what was performed, would you be
> prepared to do that?
> Operations Team: No, I don't have time for that. I've got far too much to
> do.
> (It is apparent that all Operations staff have a lot to get done daily.)
>
> Argument 4) Me: Shouldn't item (X) be documented?
> Operations Team: You don't need to know about this particular component; you
> won't be administering it anyway.
>
> Argument 5) Me: The fact that company techs had to go onsite to replace a
> component that died, fixing an issue: do you think that should be documented?
> Operations Team: I guess... it should. The bean-counters would probably want
> to know about it...
>
> My proposed plan is to:
> - Get clarification of my role from the boss
> - Get everyone to use RT properly for any kind of request (even e-mails are
> deliberately not sent when a new request is made)
> - Get started on some kind of documentation of the taken-over
> infrastructure and the current infrastructure using something like
> RackTables. Non-config technical descriptions will go into SharePoint (I
> would like to use a wiki, but sadly cannot given big bucks have already been
> paid)
> - See if I can get the underlying config files checked into Subversion every
> time the config files for a service are changed, with a diff sent to the
> Operations Team. Longer term I am thinking of transitioning some of the
> service config changes performed by the web app over to a manual config
> change process. This may allow things to be tracked properly with a
> Puppet+Subversion solution, though this sounds terrible as it will mean
> reduced automation.
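
On the "diff sent to the Operations Team" item in the plan above: here is a minimal sketch, in Python, of what that step could look like. It is only an illustration, not something from the original thread. The function name build_change_notice, the "[config-change]" subject tag, and the example.com addresses are all hypothetical, and the sketch assumes something else (for example a Subversion post-commit hook) supplies the before and after text of the file. Actually sending the message is left out.

```python
import difflib
from email.message import EmailMessage


def build_change_notice(path, old_text, new_text, author, recipient):
    """Build an e-mail carrying a unified diff of one config file.

    Returns None when the two versions are identical, so no mail is
    produced for a no-op change. All names here are illustrative.
    """
    diff_lines = list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=path + " (before)",
        tofile=path + " (after)",
        lineterm="",
    ))
    if not diff_lines:
        return None

    msg = EmailMessage()
    msg["Subject"] = "[config-change] {} changed by {}".format(path, author)
    msg["From"] = author
    msg["To"] = recipient
    # The diff itself is the whole body: who changed what, line by line.
    msg.set_content("\n".join(diff_lines) + "\n")
    return msg


if __name__ == "__main__":
    # Hypothetical before/after contents of a zone file.
    before = "www IN A 192.0.2.10\nmail IN A 192.0.2.20\n"
    after = "www IN A 192.0.2.11\nmail IN A 192.0.2.20\n"
    notice = build_change_notice(
        "zones/example.com.db", before, after,
        "ben@example.com", "ops@example.com",
    )
    print(notice)
```

The point of the design is that nobody has to write anything extra: the record of "what changed" falls out of the commit itself, which speaks to the "I don't have time for that" objection.
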
_______________________________________________
Tech mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
