Ian Kelling wrote:
> Ineiev writes:
> > Yes; our Git server already can't cope with its current load,
> > https://savannah.gnu.org/support/?110712
>
> Afaik, that isn't true. I've heard Bob say that he keeps track of load
> related problems and they are relatively rare. That bug report you link
> to does not even say otherwise, it is one person who had a problem and
> did not stick around and follow the debugging advice they were given.

The git server is not overloaded at this time. It is handling the
steady state of normal usage acceptably well. I measure this by looking
at RAM needs, CPU utilization, storage I/O utilization, and network
bandwidth. I am reviewing the system while typing this. Looks good.

The load on the service does grow over time. People are doing more with
it. Generally a good thing. Automated CI/CD build use is ramping up.
That's going to need some guidelines to help people help us. Guidelines
that need to be written. But it will get there.

Recently we increased the resources dedicated to the git system,
scaling up the size of the server so it can deal with the increasing
load. And I expect that over time we will need to do that again. But
for the moment the engines are looking good.

I monitor the server system resource metrics in detail. I investigate
anomalies found. I have processes that screen the system logs for
unusual things.

Unfortunately the Internet is a hostile place and there is a continuous
level of Internet Background Radiation of probes and abuse and attacks.
ALL of our servers are constantly under this continuing abuse level.
This requires attention and often a response to block that abuse.

This applies especially to the https/http protocol ports, which have
become general use for everything and therefore suffer the most from
abuse. That we use HTTP for many different things, for web browsing of
documentation, for web browsing of source code through CGIT/gitweb,
ViewVC, Loggerhead, for release tar.gz bundle download access, and for
everything else, makes it more difficult. Member access through ssh is
narrower and therefore abuse there is easier to deal with.

This is what it takes to operate a well known and highly utilized
server on the hostile Internet. It's going to be abused. It's going to
be attacked. It is going to need love and attention.

> > however, the hard part is making the server work adequately.

In the above I write my opinion of the problem.
It's not the server. We have recently had rounds of significant
downtime of the underlying servers which host the virtual machine
systems which host these services. If the underlying virtualization
server is not working then of course the virtual servers hosted upon it
can't run either.

We among the SysOps team have regular meetings and discussions about
what is the highest priority to do next with the resources available.
Scope. Schedule. Cost. Pick any two to control and the 3rd is what it
is. Another way to say this is "Good. Fast. Cheap. Pick any two."

It won't be a surprise that we as a team are not in agreement over what
should happen. Shock! I know, right. Life is a compromise. Personally
my goal is that we are always improving. Even if things are not
happening as I personally would want them to happen, as long as we are
making progress in the forward direction then I see that as a good
result.

At this moment I think the hosting infrastructure is still behind. It's
not where it should be for a high reliability system. We are trying to
catch up. But "stuff is happening". That's one of the reasons the power
became a problem! Because more systems were being added to the rack.

Corwin also noted some problems with the FSF choice of RYF hardware,
which creates challenges. I will reinforce that. It's old and sometimes
flakes out.

Also remember that Ian and Michael are tasked with doing other jobs
too. It's not only the hosting cluster. And they like to have the
weekend off every now and again.

> No, I'm pretty sure that is wrong and not an issue at all. Lots of
> cloning is the thing which causes load, and it web pages repos are very
> rarely cloned except by bots, which we already have the job of
> limiting. People almost always access the web pages as web pages and
> that does not put any load on the git server.

Mostly agree. There is a lot to unpack here in the details. And I think
most people reading this would have their eyes glaze over with the
details.
But unpacking it, there are 1) git clones for automated CI/CD builds,
2) people web browsing dynamic web pages of version control history,
and 3) abuse from the hostile Internet.

But yes, git introduced the concept of full history clones, and too
much full history cloning has caused problems. I am sure it will
continue to cause problems. Unfortunately I am unaware of any way to
force the Right Thing to happen. It's an education issue. For the most
part we have gotten the word out to use shallow clones.

I have this idea that I will write up Bob's Guide to CI/CD building
which we can point people to in order to help them. Because automated
CI/CD builds are a GOOD THING. They improve software quality. I want to
encourage and facilitate CI/CD builds.

Bob
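
For readers unfamiliar with the shallow clones mentioned above, here is a minimal sketch of how a CI job might request one. It is an illustration only, not official Savannah guidance: the helper names `shallow_clone_command` and `shallow_clone`, and the example URL in the note below, are hypothetical. The only git features relied on are the standard `git clone` options `--depth` and `--branch`.

```python
import subprocess
from typing import List, Optional


def shallow_clone_command(url: str, dest: str,
                          branch: Optional[str] = None,
                          depth: int = 1) -> List[str]:
    """Build a `git clone` command that fetches only recent history.

    Hypothetical helper for illustration; not part of any Savannah tooling.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # --depth truncates history to the given number of commits, so the
    # server does not have to send the full history on every CI run.
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch is not None:
        # --depth already implies --single-branch; --branch only selects
        # which branch is fetched instead of the remote's default.
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


def shallow_clone(url: str, dest: str,
                  branch: Optional[str] = None,
                  depth: int = 1) -> None:
    """Run the shallow clone, raising CalledProcessError if git fails."""
    subprocess.run(shallow_clone_command(url, dest, branch, depth), check=True)
```

A CI job would call something like `shallow_clone("https://git.example.org/project.git", "build")` (a placeholder URL), which runs `git clone --depth 1` against that repository rather than downloading every commit ever made.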