Re: Best practices for documenting NLP versions

Greg Silverman Fri, 21 Oct 2022 13:18:07 -0700

Why not use Docker and versioning by tags? See C. Boettiger, "An
introduction to Docker for reproducible research," SIGOPS Oper. Syst. Rev. 49
(2015) 71–79. doi:10.1145/2723872.2723882.
<https://www.zotero.org/google-docs/?Xd3H9e>
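
One way tag-based versioning might show up in the NLP output itself is
sketched below in Python. This is only an illustration under stated
assumptions: the container is assumed to be started with its own image
reference exported in an environment variable, and the variable name
NLP_IMAGE_REF, the example image name, and the "nlp_image" field are all
hypothetical, not part of cTAKES or Docker.

```python
import os


def parse_image_ref(ref: str) -> dict:
    """Split an image reference of the form repo[:tag][@digest] into parts."""
    name, _, digest = ref.partition("@")
    last_slash = name.rfind("/")
    last_colon = name.rfind(":")
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry host:port prefix.
    if last_colon > last_slash:
        repository, tag = name[:last_colon], name[last_colon + 1:]
    else:
        repository, tag = name, None
    return {"repository": repository, "tag": tag, "digest": digest or None}


def stamp_with_image(result: dict, env_var: str = "NLP_IMAGE_REF") -> dict:
    """Return a copy of an NLP result with the image reference attached.

    env_var is a hypothetical variable name that the deployment would set,
    e.g. `docker run -e NLP_IMAGE_REF=example/ctakes-rest:1.2.0 ...`.
    """
    ref = os.environ.get(env_var)
    stamped = dict(result)
    stamped["nlp_image"] = parse_image_ref(ref) if ref else None
    return stamped


if __name__ == "__main__":
    os.environ["NLP_IMAGE_REF"] = "example/ctakes-rest:1.2.0"
    print(stamp_with_image({"note_id": "n1", "concepts": []}))
```

The tag then acts as a single handle for everything baked into the image
(cTAKES build, piper files, dictionaries), which avoids tracking commit
information file by file.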




On Fri, Oct 21, 2022 at 3:15 PM Peter Abramowitsch <pabramowit...@gmail.com>
wrote:

> Well, obviously, the full range of permutations of all source files, all
> annotators, and pre- and post-cTAKES code would require a huge amount of
> commit information on thousands of files, and not only cTAKES files.
> Recently I made some pretty significant changes to the ZonerCli library,
> which is only a dependency of the cTAKES distribution. How would all the
> commit info be used to tag the end results? I think the answer is that
> it's simply not feasible or useful, so we haven't gone to those lengths.
>
> The furthest we go at the UCs is to version the piper file and then write
> the versioned_name of the piper back into the JSON object returned for
> each note. We have our own REST service and our own Java and Python
> clients, but they don't touch the internals of the message in a way that
> interferes with the clinical informatics. The note concept collection
> object, with its piper version, is then persisted in our data store. The
> server jar also has a version, which is written to a log and is updated
> whenever any significant framework changes are implemented. But the
> server version is not written into the data store.
>
> Not sure if any of this was helpful
>
> On Fri, Oct 21, 2022 at 8:03 PM Miller, Timothy
> <timothy.mil...@childrens.harvard.edu.invalid> wrote:
>
> > We’ve recently been using cTAKES for some internal projects where we make
> > modifications, often using the REST server, combined with an open-source
> > python client that makes the output of the REST server easy to
> > post-process:
> > https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
> > written by my colleagues Andy McMurry and Mike Terry, and pip
> > installable. The output is then either converted to FHIR or written to
> > whatever convenient format we need.
> >
> > But it’s useful to know for a given run on a given project, what was the
> > NLP configuration that produced this output? Obviously, there are things
> > like version numbers, but since cTAKES is highly configurable, and our
> > post-processing libraries have versions, and we may use trunk or a
> > previous commit instead of releases, things get complicated quickly. Does
> > anyone have an existing solution they are willing to share? Or does
> > anyone have any thoughts on this topic? This question goes slightly
> > beyond cTAKES, but cTAKES is responsible for a lot of the complexity in
> > figuring this out since it’s the most configurable component.
> >
> > Thanks
> > Tim
> >
> >
>
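
For comparison, the piper-versioning approach described in the quoted
message (write the versioned piper name into the JSON returned for each
note) could look roughly like the Python sketch below. It is not the UC
implementation: the function names, the "nlp_provenance" field, and the
example piper name are hypothetical, and the SHA-256 of the piper file and
the installed-package lookup are additions of mine to cover the
post-processing library versions mentioned in the original question.

```python
import hashlib
from importlib import metadata
from pathlib import Path


def piper_info(piper_path: str, versioned_name: str) -> dict:
    """Describe a piper file by its declared name and a hash of its content."""
    digest = hashlib.sha256(Path(piper_path).read_bytes()).hexdigest()
    return {"piper": versioned_name, "piper_sha256": digest}


def installed_versions(package_names) -> dict:
    """Look up installed versions of post-processing packages by name."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def stamp_note_result(result: dict, piper: dict, libraries: dict) -> dict:
    """Return a copy of one note's result with provenance attached."""
    stamped = dict(result)
    stamped["nlp_provenance"] = {**piper, "libraries": dict(libraries)}
    return stamped


if __name__ == "__main__":
    # "Default_v3.piper" is a placeholder path for illustration only.
    info = piper_info("Default_v3.piper", "Default_v3")
    libs = installed_versions(["ctakes-client"])
    print(stamp_note_result({"note_id": "n1", "concepts": []}, info, libs))
```

The content hash catches the case where a piper file is edited without its
versioned name being bumped; the name alone is still what a reader of the
data store would search on.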


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu
