I've had my share of broken systems caused by pushing out configuration files, so when it came to user management I built a system that avoids pushing files out at all. I keep master passwd and group files that client systems read to manage local users. Scripts to manage users and groups call the platform-specific commands internally to add, modify, or remove users: useradd, usermod, and userdel on Linux; mkuser, chuser, and rmuser on AIX (the AIX utilities also handle the multitude of AIX user-management files). This setup ensures that I never push out empty or corrupted files, and it gives me consistent UIDs and GIDs across all systems.
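
In outline it looks something like the sketch below; the master-file path and the field handling are illustrative, not my actual scripts.

#!/usr/bin/env python3
# Sketch of the approach: read a master passwd file and drive the
# platform's native user-management commands, so the OS maintains its
# own files and nothing is ever copied over them wholesale.

import platform
import pwd
import subprocess

MASTER_PASSWD = "/srv/master/passwd"  # hypothetical master file location

def desired_users(path):
    """Parse passwd-format lines into {name: (uid, gid, gecos, home, shell)}."""
    users = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            name, _pw, uid, gid, gecos, home, shell = line.split(":")
            users[name] = (uid, gid, gecos, home, shell)
    return users

def add_user(name, uid, gid, gecos, home, shell):
    """Create a user with the native tool for this platform."""
    if platform.system() == "AIX":
        # mkuser takes attribute=value pairs; pgrp normally wants a
        # group *name*, so a real script would map the GID to a name first.
        cmd = ["mkuser", f"id={uid}", f"home={home}", f"shell={shell}",
               f"gecos={gecos}", name]
    else:  # Linux
        cmd = ["useradd", "-u", uid, "-g", gid, "-d", home,
               "-c", gecos, "-s", shell, name]
    subprocess.run(cmd, check=True)

def main():
    local = {u.pw_name for u in pwd.getpwall()}
    for name, fields in desired_users(MASTER_PASSWD).items():
        if name not in local:
            add_user(name, *fields)

if __name__ == "__main__":
    main()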

Other configuration files that do get pushed out are handled by environment (lab, dev, test, qa, prod), with validation at each environment.
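
In outline the promotion looks something like this; validate() and push() here are stand-ins for the real checks and transport (rsync, cfengine, etc.), not actual tooling.

import os
import sys

# Promotion order; a file must pass validation before each push.
ENVIRONMENTS = ["lab", "dev", "test", "qa", "prod"]

def validate(path):
    """Minimal gate: refuse to push an empty file. Real checks are
    format-specific (syntax, required entries, and so on)."""
    if os.path.getsize(path) == 0:
        raise ValueError(f"{path} is empty; refusing to push")

def push(path, env):
    print(f"pushing {path} to {env}")  # stand-in for the real transport

def rollout(path):
    for env in ENVIRONMENTS:
        validate(path)
        push(path, env)
        # In practice, verify health in `env` before promoting further.

if __name__ == "__main__":
    rollout(sys.argv[1])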

- Dave Bianchi

On 5/19/14, 12:29 PM, Dana Quinn wrote:
For those of you sharing these tales of woe (I have some similar ones I could share): can you share what you discussed in your post mortems about protecting against these issues in the future? One thing I'm curious about is whether you discussed doing slower rollouts to the "production" environment. Did you come up with any general approaches or rules for these types of rollouts? For example, only rolling out to 5% of servers at a time. Or another idea is a pattern I've heard discussed for production rollouts: One, Few, Many -- which is as it sounds: roll out to one host in production, observe, then a few servers, then roll out more broadly. Just curious what learnings and approaches you've taken from these incidents (an incident is a terrible thing to waste!).
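
A minimal sketch of that One, Few, Many pattern, with hypothetical deploy and healthy hooks standing in for real tooling and monitoring:

def one_few_many(hosts, deploy, healthy, few=5):
    """Deploy to one host, observe, then a few, then the rest,
    halting if any stage leaves unhealthy hosts behind."""
    stages = [hosts[:1], hosts[1:1 + few], hosts[1 + few:]]
    for stage in stages:
        for host in stage:
            deploy(host)
        if not all(healthy(host) for host in stage):
            raise RuntimeError(f"halting rollout: unhealthy host in stage {stage}")

# Example wiring with stub hooks:
if __name__ == "__main__":
    servers = [f"web{i:03d}" for i in range(200)]
    one_few_many(servers,
                 deploy=lambda h: print(f"deploying to {h}"),
                 healthy=lambda h: True)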

Dana


On Mon, May 19, 2014 at 8:20 AM, Ski Kacoroski <kacoro...@gmail.com> wrote:

    Paul & David,

    So very true.  I learned that the hard way when we had a bug in a
    configuration one-liner that renamed /etc to /somethingelse across 20
    different kinds of unix (I was working for a software development house
    and we shipped on all of them).  It took over a day to break into each
    one and rename it back.  The hardest was DEC Tru64, which required
    pressing a special key combination at just the right time in the boot
    sequence.

    ski

    On Mon, 19 May 2014 06:41:19 -0700
    Paul Graydon <p...@paulgraydon.co.uk> wrote:

    > At a previous job we managed to push out a passwd file to several
    > hundred servers without a root account in it. (We'd forgotten to make
    > root a protected account that could never expire in the generating
    > script we used with cfengine.) That was fun. All sorts of stuff broke
    > in some very interesting ways. That led to a fun day of running
    > around servers with recovery disks and replacing the passwd and
    > shadow files.
    >
    > David Lang <da...@lang.hm> wrote:
    >
    > >to err is human, to really foul things up requires a computer
    > >
    > >...and when you automate changes to computers....
    > >
    > >I've done similar things, not reformatting everything, but I managed
    > >to use an automation tool to break all 250 firewalls at $prior_job
    > >in a way that disabled the automation at the same time, requiring
    > >booting from recovery media and manual changes to each box to
    > >recover. To complicate things, the firewalls mostly continued to
    > >work, so we had to juggle the fixes to avoid breaking things even
    > >worse.
    > >
    > >The good news was that the automation was good enough that I was
    > >able to give a couple people instructions on how to recover and we
    > >got everything fixed in a few hours, but it was an interesting
    > >afternoon.
    > >
    > >David Lang
    > >
    > >On Sun, 18 May 2014, Nick Webb wrote:
    > >
    > >> On Sun, May 18, 2014 at 9:38 PM, David Lang <da...@lang.hm> wrote:
    > >>
    > >>> wayback to the rescue
    > >>>
    > >>> http://web.archive.org/web/20140516225155/http://it.emory.edu/windows7-incident/
    > >>>
    > >>>
    > >> I hang my head in shame for not checking there!
    > >>
    > >> Wow, this is/was a nightmare. For those of us working on automation
    > >> initiatives, this is one downside to be careful of... when it's so
    > >> easy to make a mass change, we must take extra care...
    > >>



    --
    "When we try to pick out anything by itself, we find it
      connected to the entire universe"            John Muir

    Chris "Ski" Kacoroski, kacoro...@gmail.com
    <mailto:kacoro...@gmail.com>, 206-501-9803 <tel:206-501-9803>
    or ski98033 on most IM services





--
Dana Quinn
da...@pobox.com



