Ian Kelling wrote:
> Ineiev writes:
> > Yes; our Git server already can't cope with its current load,
> > https://savannah.gnu.org/support/?110712
>
> Afaik, that isn't true. I've heard Bob say that he keeps track of load
> related problems and they are relatively rare. That bug report you link
> to does not even say otherwise, it is one person who had a problem and
> did not stick around and follow the debugging advice they were given.

The git server is not overloaded at this time.  It is handling the
steady state of normal usage acceptably well.  I measure this by
looking at RAM needs, CPU utilization, storage I/O utilization, and
network bandwidth.  I am reviewing the system while typing this.  It
looks good.
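
For the curious, the kind of thing I am looking at is nothing exotic.
Standard tools give the picture.  These are illustrative commands,
not a dump of our exact monitoring setup:

    free -h          # RAM and swap in use
    vmstat 5         # CPU utilization and run queue, sampled every 5s
    iostat -x 5      # per-device storage I/O utilization (sysstat)
    sar -n DEV 5     # network interface throughput (sysstat)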

The load on the service does grow over time.  People are doing more
with it.  Generally a good thing.  Automated CI/CD build use is
ramping up.  That's going to need some guidelines to help people help
us, guidelines that still need to be written.  But it will get there.
Recently we increased the resources dedicated to the git system,
scaling up the size of the server so it can deal with the increasing
load.  I expect that over time we will need to do that again.  But
for the moment the engines are looking good.

I monitor the server system resource metrics in detail.  I
investigate any anomalies that are found.  I have processes that
screen the system logs for unusual things.  Unfortunately the
Internet is a hostile place and there is a continuous level of
Internet Background Radiation of probes and abuse and attacks.  ALL
of our servers are constantly under this continuing level of abuse.
It requires attention and often a response to block that abuse.
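
To give a flavor of that screening, it is the usual sort of thing,
something along these lines.  This is an illustration only, not our
actual process, and real blocking is done with the usual firewall
style tooling:

    # Illustration: count failed ssh logins today by source address.
    # (The journal unit name varies by distribution; sshd on some.)
    journalctl -u ssh --since today | grep 'Failed password' \
        | awk '{print $(NF-3)}' | sort | uniq -c | sort -rn | head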

This is especially true of the http/https protocol ports, which have
become the general-purpose transport for everything and therefore
suffer the most abuse.  We use HTTP for many different things: web
browsing of documentation, web browsing of source code through
CGIT/gitweb, ViewVC, and Loggerhead, download of release tar.gz
bundles, and everything else.  That makes the abuse there harder to
deal with.  Member access through ssh is much narrower and therefore
its abuse is easier to deal with.

This is what it takes to operate a well known and highly utilized
server on the hostile Internet.  It's going to be abused.  It's going
to be attacked.  It is going to need love and attention.

> > however, the hard part is making the server work adequately.

In the above I have written my opinion of the problem.  It's not the
server.

We have recently had rounds of significant downtime on the underlying
servers which host the virtual machines that run these services.  If
the underlying virtualization server is not working then of course
the virtual servers hosted upon it can't run either.

We on the SysOps team have regular meetings and discussions about
what is the highest priority to do next with the resources available.
Scope.  Schedule.  Cost.  Pick any two to control and the third is
what it is.  Another way to say this is "Good.  Fast.  Cheap.  Pick
any two."

It won't be a surprise that we as a team are not in agreement over
what should happen.  Shock!  I know, right.  Life is a compromise.
Personally my goal is that we are always improving.  Even if things
are not happening exactly as I personally would want them to happen,
as long as we are making progress in the forward direction I see that
as a good result.

At this moment I think the hosting infrastructure is still behind.
It's not where it should be for a high reliability system.  We are
trying to catch up.  But "stuff is happening".  That's one of the
reasons the power became a problem!  Because more systems were being
added to the rack.  Corwin also noted some problems with the FSF's
choice of RYF hardware, which creates challenges that I will
reinforce: it's old and sometimes flakes out.

Also remember that Ian and Michael are tasked with doing other jobs
too.  It's not only the hosting cluster.  And they like to have the
weekend off every now and again.

> No, I'm pretty sure that is wrong and not an issue at all. Lots of
> cloning is the thing which causes load, and web pages repos are very
> rarely cloned except by bots, which we already have the job of
> limiting. People almost always access the web pages as web pages and
> that does not put any load on the git server.

Mostly agree.  There is a lot to unpack here in the details.  And I
think most people reading this would have their eyes glaze over at
the details.  But unpacking it, there are 1) git clones for automated
CI/CD builds, 2) people web browsing dynamic web pages of version
control history, and 3) abuse from the hostile Internet.

But yes, git introduced the concept of full history clones and too
much full history cloning has caused problems.  I am sure it will
continue to cause problems.  Unfortunately I am unaware of any way to
force the Right Thing to happen.  It's an education issue.

For the most part we have gotten the word out to use shallow clones.
I have this idea that I will write up Bob's Guide to CI/CD building
which we can point people to in order to help them.  Because
automated CI/CD builds are a GOOD THING.  They improve software
quality.  I want to encourage and facilitate CI/CD builds.
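
To give a flavor of it before that guide exists, here is the general
shape of what I mean.  PROJECT and BRANCHNAME are placeholders, not
real repository names:

    # A full history clone pulls everything and is heavy on the server:
    git clone https://git.savannah.gnu.org/git/PROJECT.git

    # A shallow, single-branch clone is usually all a CI/CD build needs:
    git clone --depth 1 --single-branch \
        https://git.savannah.gnu.org/git/PROJECT.git

    # Building a particular branch or tag?  Name it explicitly:
    git clone --depth 1 --branch BRANCHNAME \
        https://git.savannah.gnu.org/git/PROJECT.git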

Bob
