On Wed, Sep 4, 2013 at 9:36 AM, janI <j...@apache.org> wrote:

> Hi.
>
> We have had some longer discussions on different ML/IRC about how a
> vm-admin should behave and which level of service we expect for our
> servers.
>
> We need new admins, so this is also a request for anyone interested to chip
> in.
>
> We have had some unfortunate incidents on all 3 vms, of different kinds,
> which made me question whether we as a community:
> a) want servers that are cared for professionally or by happenstance.
> b) want to (and are able to) maintain the servers ourselves.
> c) are prepared to support a change that makes a) and b) possible.
>
> I have formulated some thoughts on how admins could work, but in general I
> believe we should convince infra to take over the vm responsibility and
> keep our well functioning forum/wiki admins.
>
> We have a vm-team in place, that was created with the purpose of not having
> a single person as admin. In my opinion the team has not lived up to that
> purpose but I am still thankful for the help I have received.
>
> Note: the ideas below are my personal thoughts, which I have applied during
> the time I maintained the servers:
>
> ===========
> The server should at all times be maintained with the following priority:
> 1) security (the downside of being popular is to have the attention of
> people who want to gain merit by breaking our servers)
> 2) stability (we have limited cpu/ram/disk so we must optimize)
> 3) add user wishes (we already have stable systems; 1 and 2 are far more
> important than enhancing the systems).
>
> Being an admin on a vm is a job that does not take so much time, but
> requires a lot of monitoring and communication (especially with infra).
>
> A good setup would be, 3 types of admin:
> Each server will have an appointed "owner" (anchor-admin)
> A number of persons have full sudo on a server (admin)
> A number of persons can reboot/restart/work on po files (help-admin)
>
> === Anchor-admin responsibilities ===
> Anchor-admin is the "owner" of the vm and the prime contact to infra.
>
> Anchor-admin has the overall responsibility of the vm.
> 1) help when receiving alerts
> 2) keep informed on available patches, especially security-related patches
> 3) create/keep a maintenance plan
> 4) coordinate changes external to vm (like dns) with infra
> 5) participate in infra discussions relevant for the vm (e.g. certificates)
> 6) monitor the vm regularly for resource usage
> 7) ensure that application changes are implemented with relevant consensus
> 8) discuss work with admin, with the goal that they should be able to take
> over one day.
>
> These activities are expected to take 3-4 hours per week, more in the
> beginning and less later. The hours needed depend heavily on the number
> and level of admins.
>
> === Admin responsibilities ===
> Admins help the anchor admin with ongoing maintenance and have full sudo.
>
> All changes must be discussed and agreed with the anchor admin; no change
> is so important that it cannot wait until it has been discussed!
>
> Admins are expected to:
> 1) help when receiving alerts
> 2) stay informed about the vm configuration
> including but not limited to:
> - where each configuration change is made and stored (svn/backup)
> - how the apps are configured
> - read and update the runbook, if something is unclear
> 3) participate in the regular maintenance
> 4) coordinate all non-scheduled work with anchor-admin
>
> These activities are expected to take 1-2 hours per week, more in the
> beginning and less later.
>
> Admins do not need to be specialists; we all learn, but it is important
> that an admin has the motivation and time to learn.
>
>
> === Help-admin responsibilities ===
> Help-admins are located in different timezones, so that we have 24/7
> coverage, and they have limited sudo (only restart/reboot/handle po files).
>
> When a help-admin receives an alert mail, the following actions should be
> taken:
> 1) check whether the vm is reachable via ssh; if so, log in, otherwise
> escalate to admin/infra
> 2) check whether the vm is overloaded, or apache/mysql is not running
> 3) restart the needed processes
> 4) mail at least the anchor-admin with observations and what was done.
>
>
> ===
> Note: the above are just my thoughts; there are a lot of other
> possibilities.
>
> Let's hear your opinions.
>
> rgds
> jan I.
>

I would like to discuss this topic further, much further as a matter of
fact, but right now I don't really have enough information.

Can you provide details on the following (or point to a document that
describes this):

* to aid our memories, who are the current vm-team
* what are the three servers now under the vm-team
* what vm-OS does each use
* for each server, what are the specific applications a vm-sysadmin would
need to know/become familiar with to be an effective sysadmin
* how are alerts on system failure currently handled
* what resources would a vm-admin need to respond to a system failure


Your role outline is good, but I think before we discuss future strategy,
we need a better idea about what's involved.







-- 
-------------------------------------------------------------------------------------------------
MzK

"Truth is stranger than fiction, but it is because Fiction is obliged
 to stick to possibilities. Truth isn't."
                             -- "Following the Equator", Mark Twain
