On 18/02/2014 14:16, J. Roeleveld wrote:
> On Tue, February 18, 2014 12:17, Alan McKinnon wrote:
>> On 18/02/2014 11:52, J. Roeleveld wrote:
>>> On Tue, February 18, 2014 10:47, Alan McKinnon wrote:
>>>> What I do run into is daemons that drop privs on start up, like
>>>> tac_plus. Unwary new sysadmins always try to start/stop it as root,
>>>> causing an unholy mess. Root then owns the log and pid files; when
>>>> tac_plus drops privs it can't record its state, so it continues to
>>>> service requests but fails to log any of them. For an auth daemon,
>>>> that's a serious issue.
>>>
>>> Shouldn't sysadmins use the init-scripts for that?
>>> If done correctly, permissions should not be an issue.
>>
>> It's a little more complex than just that. It's an auth service and users
>> are frequently added, removed and modified. The daemon does syntax
>> checking on its config file at startup or after being HUP'ed but that
>> only finds static errors. It catches things like adding people to a grop
>> instead of to a group, but misses dynamic mistakes like adding users to
>> groups that don't exist.
> 
> The auth-service gets the current state from a static file that is only
> read upon service-start?

Yes.

It's a good design for reasonably static userbases. The user details,
privilege definitions, password hashes and such are stored in a single
flat file readable only by root and protected by file permissions.
Overall protection is provided by restricted shell access to the host.
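
A wrapper script could verify that protection before it starts the daemon.
What follows is only a minimal sketch: the function name and the
owner_uid parameter are illustrative assumptions, not anything tac_plus
provides.

```python
import os
import stat


def is_private_to(path, owner_uid=0):
    """Return True if `path` is a regular file owned by `owner_uid`
    with no permission bits set for group or others."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return False
    if st.st_uid != owner_uid:
        return False
    # S_IRWXG / S_IRWXO cover read, write and execute for group / others.
    return (st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

With the default owner_uid=0 this only passes for a file owned by root
with a mode such as 0600 or 0400.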

We're not talking about AT&T's radius servers for dsl users here who
sign up on a web form - for that you would use a database backend - this
is for the company's network support personnel who log into the backbone
and configure the network itself. There's no rush to add new (and
unproven...) users so this scheme suits me just fine. Yes, it has quirks,
but these no longer bother me. We get caught out by new sysadmins
who have not felt that pain yet.
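
One of those quirks, users added to groups that don't exist, is the sort
of thing a wrapper could check for before HUPing the daemon. This is a
rough sketch under a loud assumption: it treats the config as plain text
where "group = NAME" defines a group and "member = NAME" refers to one.
The regexes and the function name are made up for illustration and do
not parse the real config grammar.

```python
import re

# Assumed, simplified syntax: "group = NAME {" defines a group and
# "member = NAME" refers to one. Commented-out lines do not match
# because each pattern must start the line (after optional whitespace).
GROUP_DEF = re.compile(r"^\s*group\s*=\s*([\w.-]+)", re.MULTILINE)
MEMBER_REF = re.compile(r"^\s*member\s*=\s*([\w.-]+)", re.MULTILINE)


def undefined_groups(config_text):
    """Return the sorted group names that are referenced but never defined."""
    defined = set(GROUP_DEF.findall(config_text))
    referenced = set(MEMBER_REF.findall(config_text))
    return sorted(referenced - defined)
```

A cron wrapper could then refuse to signal the daemon whenever this
returns a non-empty list.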



> 
>> It's exactly analogous to compile-time vs runtime errors, compilers
>> can't catch the latter.
>>
>> Despite this all being run out of cron with wrapper scripts to check
>> validity, automated additions and safety checks between all three
>> daemons, plus being fully documented on the internal wiki and in bold
>> blinking red caps in the login motd, people still find ways to do
>> things in an attempt to fix it.
> 
> (OT: Do the bold blinking red caps work on all terminals? :) )


Um, OK, you got me there. I was exaggerating!

> 
>> The daemon also tries to log these errors, by writing to a log file it
>> has no write permissions on.
> 
> "setuid" on the group with group-write in the umask not an option?


Hmmm, that's worth investigating. I hadn't really considered that as I
have an aversion to trying to use umask as a control for anything.
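
Whichever way the permissions end up being arranged, a pre-start check
could at least confirm that the unprivileged user is able to write the
log and pid files. A minimal sketch, assuming plain Unix mode bits only
(no ACLs, no root override, no read-only filesystems); can_write and its
parameters are hypothetical names:

```python
import os
import stat


def can_write(path, uid, gids):
    """Approximate the kernel's check for an unprivileged user:
    exactly one of the owner, group or other write bits applies."""
    st = os.stat(path)
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IWUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IWGRP)
    return bool(st.st_mode & stat.S_IWOTH)
```

Here uid and gids would be the IDs the daemon runs as after dropping
privs. A False for either file is the "root owns the log" mess caught
before it starts.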

> 
>> There is nothing I can do about the quality of sysadmins, I have no
>> input into the HR process and damagement think cheaper is always better,
>> including skills. What I can do, is find ways to make the software more
>> resistant to errors than it already is.
> 
> And only grant access permissions to these rookies once they have proven
> they understand rule #1: If In Doubt, Call Someone Who Knows!

Hah! I fought that good fight for years and fought it well. They don't
call me the sysadmin from hell around here without good reason. And I
did manage to get a cowboy network under control and instill respect for
how much breakage Cisco's products can cause.

It's getting harder to grant access based purely on expertise,
especially when someone crunched the numbers. It turns out that the cost
of fixing mistakes is far less than the cost of leaving new untrained
people unutilized and letting support tickets pile up...

> 
> But yes, I fully understand the methods of HR and Damagement.
> It is a financial mistake and risk not to include technical expertise
> checks in the recruitment phase for technical positions.

Interesting story:

I once had a good shouting match with a support manager about the
quality of his recruits. I demanded to know why he hired so many
clueless idiots (my exact words). This manager knows me well so he just
smiled and said "Alan, you didn't get to see the applicants we rejected.
These are the best in the market who applied".

*That* was a wake-up call of note :-)


> 
> How much does it cost the company each time this goes wrong and someone
> like you has to come online to fix the issue?
> That is what Damagement needs to understand.

Surprisingly, it's not too expensive. There's always one of us on duty
or standby and outages don't continue unnoticed for long. The longest
that I recall is 3 minutes, then the phone starts ringing non-stop.
Remember, this system is internal; it does not service customers.



-- 
Alan McKinnon
alan.mckin...@gmail.com

