Hi all,

> On 6. Jun 2022, at 16:09, Finan, Sean 
> <sean.fi...@childrens.harvard.edu.INVALID> wrote:
> 
> Hi Kean,
> 
> Thank you for the suggestion and the link. I am really glad that people are 
> interested in this GitHub topic and taking it seriously. It would be great 
> if we could make it happen.
> 
> While definitely a possibility, the Git LFS paradigm is something that I 
> would like to avoid.
> 
> Like keeping our models on SVN, it would also require separating models from 
> code into two different repos, e.g. GitHub and Bitbucket. As opposed to 
> Bitbucket, the Apache SVN repos are long established, familiar to and 
> supported by the Apache infrastructure team. The same goes for the Apache 
> foundation's use of GitHub. I like being able to lean on the Apache infra team 
> for help.

GitHub does support LFS [1]. What I do not know is whether the ASF's GitHub 
plan allows us to use it and, if so, whether there is a volume limit. We would 
have to ask INFRA about that.

The use of Git and GitHub is well supported by the INFRA team. For example, 
there is a self-service tool for creating and managing repos [2].

There is also the `.asf.yaml` mechanism for configuring GitHub repos and 
hooking them up to the ASF infrastructure, including mailing lists, website 
publishing, etc. [3]

> The Apache Jenkins servers are linked to the SVN repos, making continuous 
> integration easy, at least on the rare occasion when somebody does change 
> something in a model repo. While I expect anybody savvy enough to work on 
> models to also have the know-how and wherewithal to work with a separate SVN 
> repo, I don't want them to have to go out to Jenkins and manually kick off 
> snapshot builds.

Jenkins also supports GitHub very well [4]. For example, in UIMA, we just drop 
a `Jenkinsfile` [5,6] configuration file into each repo; Jenkins picks it up 
automatically and even gives us pull request support [7].
I'm happy to help you set that up for cTAKES as well.

> Probably most important is the requirement for the client user to have the 
> LFS command-line client installed. I think that there are enough hoops stuck 
> in front of getting cTAKES installed/checked out/cloned/etc. and it seems to 
> me that one of the biggest reasons to use GitHub is to make things easier for 
> absolute newbies to just pull down code and experiment.

It is indeed an additional hoop to jump through, but installing LFS is a 
one-time action. Chances are that people already have it set up because they 
use it in other repos.

> Keeping the models on a separate SVN repo would mean that they aren't checked 
> out as code, but would be put in the .m2 Maven area when a user runs Maven 
> compile. While the total footprint of full cTAKES would still be the same 
> size, it would essentially make the code directory smaller and initial 
> downloads/checkouts would be faster. Plus, if done properly, maybe it could 
> "clean up" all of those nearly identically named modules in my IntelliJ 
> project window and I'd stop clicking on the wrong one when I've had too much 
> coffee.

Nowadays, I fear that people may not have SVN installed anymore ;) So requiring 
SVN to download models and drop them into .m2 might be an inconvenience. If the 
models lived in a Maven repository and could be pulled in as a normal 
dependency, that would seem most convenient.
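To make "a normal dependency" a bit more concrete: a Maven repository is a plain directory tree keyed by artifact coordinates, so once the models were published there, Maven would fetch them by groupId, artifactId and version, with no SVN or LFS client involved. Below is a minimal Python sketch of the standard Maven 2 repository layout; the function name and the coordinates in it are hypothetical placeholders, not existing cTAKES artifacts.

```python
from typing import Optional

# Maven Central base URL; any other Maven 2 layout repository works the same way.
MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def artifact_url(group_id: str, artifact_id: str, version: str,
                 extension: str = "jar", classifier: Optional[str] = None,
                 repo: str = MAVEN_CENTRAL) -> str:
    """Return the URL of an artifact under the standard Maven 2 repository layout.

    artifact_url is a hypothetical helper for illustration only.
    """
    # groupId dots become directory separators: org.apache.ctakes -> org/apache/ctakes
    group_path = group_id.replace(".", "/")
    # An optional classifier is appended with a dash before the extension
    suffix = f"-{classifier}" if classifier else ""
    file_name = f"{artifact_id}-{version}{suffix}.{extension}"
    return f"{repo}/{group_path}/{artifact_id}/{version}/{file_name}"


if __name__ == "__main__":
    # Hypothetical model artifact coordinates, not a real cTAKES artifact
    print(artifact_url("org.apache.ctakes", "ctakes-models-example", "1.0.0"))
```

The local cache under `~/.m2/repository` mirrors the same layout, which is where such a model dependency would end up after the first build.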

Cheers,

-- Richard

[1] https://docs.github.com/en/repositories/working-with-files/managing-large-files/configuring-git-large-file-storage
[2] https://gitbox.apache.org
[3] https://s.apache.org/asfyaml
[4] https://builds.apache.org/job/UIMA/
[5] https://github.com/apache/uima-uimaj/blob/main/Jenkinsfile
[6] https://github.com/apache/uima-build-jenkins-shared-library
[7] https://builds.apache.org/job/UIMA/job/uima-uimaj/view/change-requests/
