Hi all,

> On 6. Jun 2022, at 16:09, Finan, Sean
> <sean.fi...@childrens.harvard.edu.INVALID> wrote:
>
> Hi Kean,
>
> Thank you for the suggestion and the link. I am really glad that people are
> interested in this github topic and taking it seriously. It would be great
> if we could make it happen.
>
> While definitely a possibility, the git LFS paradigm is something that I
> would like to avoid.
>
> Like keeping our models on SVN, it would also require separating models from
> code into two different repos, e.g. github and bitbucket. As opposed to
> bitbucket, the apache svn repos are long established, familiar to and
> supported by the apache infrastructure team. The same goes for the apache
> foundation use of github. I like being able to lean on the apache infra team
> for help.

So GitHub seems to have support for LFS [1]. What I do not know is whether the
ASF's GitHub plan allows us to use it and, if so, whether there is a volume
limit. We would have to ask INFRA about that.

The use of Git and GitHub is well supported by the INFRA team. For example,
there is self-service for creating and managing repos [2]. There is also the
`.asf.yaml` mechanism for configuring GitHub repos and hooking them up with
the ASF infrastructure, including mailing lists, website publishing, etc. [3]

> The apache Jenkins servers are linked to the svn repos, making continuous
> integration easy - on the rare occasion when somebody does change something
> in a model repo. While I expect anybody savvy enough to work on models to
> also have the knowhow and wherewithal to work with a separate svn repo, I
> don't want them to need to get out to jenkins and manually kick off snapshot
> builds.

Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop
a `Jenkinsfile` configuration file [5,6] into each repo; Jenkins picks it up
and even gives us support for pull requests [7]. I'm happy to help you set
that up for cTAKES as well.

> Probably most important is the requirement of the client user to have the LFS
> command line client. I think that there are enough hoops stuck in front of
> getting ctakes installed/checked out/cloned/etc. and it seems to me that one
> of the biggest reasons to use github is to make things easier for absolute
> newbies to just pull down code and experiment.

It is an additional hoop to jump through indeed, but installing LFS is a
one-time action. Chances are that people already have it set up because they
use it in other repos.

> Keeping the models on a separate svn repo would mean that they aren't checked
> out as code, but would be put in the .m2 maven area when a user runs maven
> compile.
> While the total footprint of full ctakes would still be the same
> size, it would essentially make the code directory smaller and initial
> downloads/checkouts would be faster. Plus, if done properly maybe it could
> "clean up" all of those nearly identically named modules in my intellij
> project window and I'd stop clicking on the wrong one when I've had too much
> coffee.

Nowadays, I fear that people may not have svn installed anymore ;) So
requiring svn to download models and drop them into m2 might be an
inconvenience. If the models live in a Maven Repository and can be dragged in
as a normal dependency, that would seem most convenient.

Cheers,

-- Richard

[1] https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage
[2] https://gitbox.apache.org
[3] https://s.apache.org/asfyaml
[4] https://builds.apache.org/job/UIMA/
[5] https://github.com/apache/uima-uimaj/blob/main/Jenkinsfile
[6] https://github.com/apache/uima-build-jenkins-shared-library
[7] https://builds.apache.org/job/UIMA/job/uima-uimaj/view/change-requests/