Hi.  First of all, well done for raising the bar at your work.  Second,
management buy-in can make or break your effort; without it, the effort
is likely to fail.

I highly recommend "Visible Ops".  The authors studied successful IT shops
and identified what they have in common, and change management is one of
the top items.  "Visible Ops" lays out a program for moving from where you
are now to a more desirable position.  Check it out:


http://www.itpi.org/?page=Visible_Ops


Best,
-at

On Sun, Feb 27, 2011 at 1:30 AM, Benjamin Spiccia <[email protected]> wrote:
>
> Hi LOPSA,
>
> I need some advice on how to get everyone in an operations team to clearly
> document their changes and actions.
> I will probably cross post in my local SAGE mailing list as well.
>
> In short:
> - I'm a slightly inexperienced young Sysadmin who started with the
> company last week
> - From what I can gather I was hired largely on the basis of my reputation
> at my previous company for establishing and following robust processes
> (Operations process design/maintenance, Server provisioning and handling of
> Level 2/3 escalated support requests are part of my job description)
> - Operations Team consists of < 5 people
> - The company I work for has recently taken over another one. We are aiming
> to slowly transition taken-over company customers over to our systems. The
> old-company network is complex and the current state of that network is
> virtually undocumented. The existing company network is relatively new and
> relies on some parts of the taken-over company infrastructure. We want to
> be running our own stuff and be completely independent of
> taken-over-company systems.
> - Two people in my Operations Team are very smart and very good technically
> at what they do but do not see the need to document actions taken to resolve
> a problem, or infrastructure configuration changes that are performed
> - The company has a colossal legacy web app (designed by one of the
> Operations Team) which appears to be a one-stop place for service creation,
> DNS changes (to BIND), customer ticket creation (to RT) and monitoring (with
> Nagios), but the GUI is not fantastic and it appears no one other than the
> person who coded it likes to use it. There is no detail of what was actually
> changed, only who made the last change.
>
> Any advice on how I get people to change the way they do things, or for that
> matter any advice on how to go about such a large infrastructure transition
> would be appreciated. Preferably, I'd like to not come across as some
> know-it-all punk who's asking for things to be implemented simply to create
> electronic paperwork.
>
> Thanks LOPSA,
>
> Ben S
>
> In more detail:
> - While the legacy web app creates tickets for Request Tracker, there is no
> documentation of what happens during L2/3 escalation (communication is
> through side channels like a direct e-mail to L1 or a phone call)
> - A lot of our infrastructure in multiple remote areas fails due to
> circumstances beyond our control (weather, upstream provider problems etc).
> There appears to be some auto-acknowledging of some Nagios alerts and
> rate-limiting of e-mails due to what I think is a bad legacy Nagios
> configuration, which the legacy web app generates
> - Both people in my Operations Team surprisingly aren't from the taken-over
> company
> - Partial knowledge of the complete network exists only in the heads of 2
> people in my Operations Team and is not documented anywhere
> - The web app appears to have been developed with an emphasis on allowing
> techs to add services quickly, but the web GUI is both information overload
> at times and complex due to the non-standard terminology used
>
> I seem to have hit a brick wall trying to convince them of a need to track
> changes/actions.
> Argument 1) Me: Don't you think the fact that the taken-over-company
> systems had to be revived after an outage should be documented?
> Operations Team: The old-company systems are going away. We know how to fix
> this common problem. The old-company systems are going to be blown away
> anyway. Why bother documenting what was performed?
>
> Argument 2) Me: Don't you think the fact that you changed network routes to
> work around an upstream problem should be documented somewhere? How do other
> members of the Operations Team know that you have already done so? How do
> you know what other Operations Team members have already done to work around
> the problem?
> Operations Team: We're already in constant phone contact with each other
> when such a problem happens, so why should what has been performed need to
> be documented?
>
> Argument 3) Me: Even a one-liner of what was performed, would you be
> prepared to do that?
> Operations Team: No, I don't have time for that. I've got far too much to
> do. (It is apparent that all Operations staff have a lot to get done daily)
>
> Argument 4) Me: Shouldn't item (X) be documented?
> Operations Team: You don't need to know about this particular component, you
> won't be administering it anyway
>
> Argument 5) Me: The fact that company techs had to go onsite to replace a
> component that died, fixing an issue -
> do you think that should be documented?
> Operations Team: I guess...it should. The bean-counters would probably want
> to know about it....
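
On the one-liner in Argument 3: the cost of logging a change can be brought
down to a single command. What follows is a minimal sketch in Python,
assuming a shared plain-text log file is acceptable to the team. The file
name "changes.log", the entry format and the function names are all
hypothetical, chosen only for illustration.

```python
import getpass
import socket
from datetime import datetime, timezone


def format_entry(message, who, host, when):
    """Build one log line: UTC timestamp, who, where, what was done."""
    stamp = when.strftime("%Y-%m-%d %H:%M:%SZ")
    return f"{stamp} {who}@{host}: {message.strip()}\n"


def log_change(message, path="changes.log"):
    """Append a one-line entry to the log file (path is a hypothetical default)."""
    entry = format_entry(
        message,
        getpass.getuser(),           # current login name
        socket.gethostname(),        # machine the change was made from
        datetime.now(timezone.utc),  # record the time in UTC
    )
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

Calling log_change("rerouted traffic via backup upstream") appends one
timestamped line, which may be a small enough step to get past the "no
time" objection.
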
>
> My proposed plan is to:
> - Get clarification of my role from the boss
> - Get everyone to use RT properly for any kind of request (at present,
> e-mails are deliberately not sent even when a new request is made)
> - Get started performing some kind of documentation of the taken-over
> infrastructure and the current infrastructure using something like
> Racktables. Non-config technical descriptions will go into Sharepoint (I
> would like to use a wiki, but sadly cannot given big bucks have already
> been paid)
> - See if I can get the underlying config files for a service checked into
> Subversion every time they are changed, with a diff sent to the Operations
> Team. Longer term I am thinking of transitioning some of the service
> config changes performed by the web-app over to a manual config change
> process. This may allow things to be tracked properly with a
> Puppet+Subversion solution, although this sounds
> terrible as it will mean reduced automation.
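
For the Subversion idea in the last item, here is a minimal sketch in
Python, offered as an illustration rather than a finished tool. It assumes
the config files already live in an svn working copy and that a mail relay
is reachable. The function names, the "[config]" subject prefix and the
"localhost" relay are hypothetical choices, not anything known about your
environment.

```python
import difflib
import smtplib
import subprocess
from email.message import EmailMessage


def config_diff(path, old_text, new_text):
    """Return a unified diff between two versions of a config file."""
    lines = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=path + " (before)",
        tofile=path + " (after)",
    )
    return "".join(lines)


def build_notice(path, diff_text, author, summary, sender, recipient):
    """Build the e-mail that tells the team what changed and who changed it."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"[config] {path}: {summary}"
    msg.set_content(f"Changed by: {author}\n\n{diff_text}")
    return msg


def commit(path, author, summary):
    """Commit the file; assumes the svn client and a working copy exist."""
    subprocess.run(
        ["svn", "commit", "-m", f"{author}: {summary}", path],
        check=True,
    )


def send(msg, host="localhost"):
    """Send the notice through a mail relay (the host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

The diff is computed in Python here so the notice can be built without
touching the repository; running "svn diff" on the working copy before the
commit would serve equally well.
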
>
> _______________________________________________
> Tech mailing list
> [email protected]
> https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
> This list provided by the League of Professional System Administrators
>  http://lopsa.org/
>
>