Best practices for documenting NLP versions

2022-10-21 Thread Miller, Timothy
We’ve recently been using cTAKES for some internal projects where we make modifications, often through the REST server, combined with an open-source Python client that makes the REST server's output easy to post-process: https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py

Re: Best practices for documenting NLP versions

2022-10-21 Thread Peter Abramowitsch
Well, obviously, the full range of permutations of all source files, all annotators, and pre- and post-cTAKES code would require a huge amount of commit information on thousands of files... and not only cTAKES files... recently I made some pretty significant changes to the ZonerCli library which i

Re: Best practices for documenting NLP versions

2022-10-21 Thread Greg Silverman
Why not use Docker and versioning by tags? See C. Boettiger, "An introduction to Docker for reproducible research," SIGOPS Oper. Syst. Rev. 49 (2015) 71–79. doi:10.1145/2723872.2723882.

Re: Best practices for documenting NLP versions

2022-10-21 Thread Peter Abramowitsch
Interesting, but it would depend on how the Docker image is set up. Our image, for instance, encapsulates all the code and imported jars, as you imply, but the piper and other runtime configuration such as section regex, negex, bsvs, etc. are imported from a mounted FS during the container's runtime. Havin
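
One way to cover the gap described above, where an image tag pins the code and jars but not the configuration mounted at runtime, might be to record a fingerprint of the mounted files next to the tag. The following is only a sketch under assumptions: the function names, the manifest layout, and the idea of hashing every file under a single config directory are illustrative and not taken from cTAKES or from anyone's setup in this thread.

```python
import hashlib
import json
from pathlib import Path


def hash_file(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(image_tag: str, config_dir: Path) -> dict:
    """Combine an image tag with hashes of the runtime config files.

    image_tag is whatever tag the container was started from (hypothetical
    here); config_dir is the directory mounted into the container.
    """
    files = {
        p.relative_to(config_dir).as_posix(): hash_file(p)
        for p in sorted(config_dir.rglob("*"))
        if p.is_file()
    }
    # A single fingerprint over all per-file hashes, so that any change
    # to any mounted file changes the recorded version.
    fingerprint = hashlib.sha256(
        json.dumps(files, sort_keys=True).encode("utf-8")
    ).hexdigest()
    return {
        "image_tag": image_tag,
        "config_files": files,
        "config_fingerprint": fingerprint,
    }
```

Calling `build_manifest("my-ctakes:1.0", Path("/mnt/config"))` (both values are placeholders) and storing the result as JSON alongside each batch of NLP output would let a later reader tell which combination of image and mounted configuration produced it.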

Re: Best practices for documenting NLP versions

2022-10-21 Thread Greg Silverman
It was an off-the-cuff suggestion. Devil is obviously in the details.