The main issue is that we need to get clone and diff+render operations back into normal time frames. The salsa workers (e.g. those rendering a diff) time out after 60s, and similar time constraints apply to other rendering front-ends. You can actually get Apache to segfault quite easily if you do not time-constrain cgi/fcgi-type processes, but that's out of scope here.
Back on topic: just splitting the file will not do. We unfortunately need to somehow "get rid" of the history (delta-resolution) walks in git:

# test setup limits: network bandwidth: 200 MBit/s, client system: 4 cores
$ time git clone https://.../debian_security_security-tracker
Cloning into 'debian_security_security-tracker' ...
remote: Counting objects: 334274, done.
remote: Compressing objects: 100% (67288/67288), done.
remote: Total 334274 (delta 211939), reused 329399 (delta 208905)
Receiving objects: 100% (334274/334274), 165.46 MiB | 21.93 MiB/s, done.
Resolving deltas: 100% (211939/211939), done.

real    14m13.159s
user    27m23.980s
sys     0m17.068s

# run the tool already available to split the main CVE/list
# file into annual files. Thanks Raphael Geissert!
$ bin/split-by-year

# remove the old big CVE/list file
$ git rm data/CVE/list

# get the new files into git
$ git add data/CVE/list.*
$ git commit --all
[master a06d3446ca] Remove list and commit bin/split-by-year results
 21 files changed, 342414 insertions(+), 342414 deletions(-)
 delete mode 100644 data/CVE/list
 create mode 100644 data/CVE/list.1999
 create mode 100644 data/CVE/list.2000
 create mode 100644 data/CVE/list.2001
 create mode 100644 data/CVE/list.2002
 create mode 100644 data/CVE/list.2003
 create mode 100644 data/CVE/list.2004
 create mode 100644 data/CVE/list.2005
 create mode 100644 data/CVE/list.2006
 create mode 100644 data/CVE/list.2007
 create mode 100644 data/CVE/list.2008
 create mode 100644 data/CVE/list.2009
 create mode 100644 data/CVE/list.2010
 create mode 100644 data/CVE/list.2011
 create mode 100644 data/CVE/list.2012
 create mode 100644 data/CVE/list.2013
 create mode 100644 data/CVE/list.2014
 create mode 100644 data/CVE/list.2015
 create mode 100644 data/CVE/list.2016
 create mode 100644 data/CVE/list.2017
 create mode 100644 data/CVE/list.2018

# this one is fast:
$ git push

# create a new clone
$ time git clone https://.../debian_security_security-tracker_split_files test-clone
Cloning into 'test-clone' ...
remote: Counting objects: 334298, done.
remote: Compressing objects: 100% (67312/67312), done.
remote: Total 334298 (delta 211943), reused 329399 (delta 208905)
Receiving objects: 100% (334298/334298), 168.91 MiB | 21.28 MiB/s, done.
Resolving deltas: 100% (211943/211943), done.

real    14m35.444s
user    27m45.500s
sys     0m21.100s

--> So splitting alone doesn't help: git is not clever enough to skip the deltas of files that will not be checked out.

Git 2.18's v2 wire protocol could be used with server-side filtering, but that's an awful hack. Telling people to git clone --depth 1 (a shallow clone), as Guido advises, is easier and more reliable for the clone use case. For the original repo that will take ~1.5s, for a split-by-year repo ~0.2s.

There are tools to split files in git while keeping their history, e.g. https://github.com/potherca-bash/git-split-file, but we'd need (to create) one that also zaps the old deltas, i.e. really "rewrite history", as the git folks tend to call this. git filter-branch can do that (a rough sketch follows below), but it would get somewhat complex and murky with commits that span CVE/list.<year> and list.<year+1>: there are at least 21 such commits for 2018+2017, 19 for 2017+2016, and roughly 10 for each earlier pair of years. So I wouldn't put too much effort into that path.

In any case, a repo with just the split files but no maintained history clones in ~12s in the above test setup. It also brings the (bare) repo down from 3.3 GB to 189 MB. So the issue is really the data/CVE/list file.
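To make the "rewrite history" option concrete, here is a minimal, untested sketch of the filter-branch route. Note that it only purges the old monolithic data/CVE/list from every commit (assuming the split files are already committed); it does not re-distribute the old history onto the per-year files, which is exactly the complex part mentioned above:

$ git filter-branch --prune-empty --index-filter \
      'git rm --cached --ignore-unmatch data/CVE/list' \
      -- --all

# filter-branch keeps refs/original/* backups; delete them and
# expire the reflogs so gc can actually discard the old deltas
$ git for-each-ref --format='%(refname)' refs/original/ |
      xargs -n 1 git update-ref -d
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive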
That said, data/DSA/list is 14575 lines, which does not seem to bother git too much yet. Still, if things get restructured, that file may be worth a look, too.

To me the most reasonable path forward unfortunately looks like starting a new repository for 2019+ and "just" importing the split files (or single-record files, as mentioned by pabs) but not the git/svn/cvs history; a rough sketch of such a bootstrap is appended at the end of this mail. The old repository would, of course, stay around, but frozen at a deadline.

Corsac also mentioned on IRC that the repository could be hosted outside of GitLab. That would reduce the pressure for some time. But cgit and the other git front-ends (as well as back-ends) we tested struggle with the repository as well (which is why my company, Faster IT GmbH, used the security-tracker repo as a very welcome test case in the first place). So that would buy time but not be a longer-term solution.

Thanks for reading this far!
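P.S.: The bootstrap of a new 2019+ repository mentioned above could look roughly like this. It is only a sketch: the directory names and the commit message are made up, and it assumes we seed the new repository from a checkout that already contains the split files:

# start an empty repository for 2019+
$ mkdir security-tracker-2019 && cd security-tracker-2019
$ git init

# import only the current split files, not their history
$ mkdir -p data/CVE
$ cp ../security-tracker/data/CVE/list.* data/CVE/
$ git add data/CVE
$ git commit -m "Start 2019+ repo: import split CVE files without old history"

A clone of such a repository only has to transfer and delta-resolve the current files, which is what gets us back within the time limits mentioned at the start.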