Interesting, but it would depend on how the Docker image is set up. Ours, for instance, encapsulates all the code and imported jars, as you imply, but the piper file and other runtime configuration (section regexes, NegEx rules, BSV dictionaries, etc.) are read from a mounted filesystem at container runtime. Freezing them into the Docker images would proliferate vast numbers of image tars with 99% redundant data. Or do you have a cleverer solution?
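A minimal sketch of the mount arrangement described above, with hypothetical paths and names (the host config directory, image tag, and piper file name are illustrative, not the actual UC setup):

```shell
# Code and jars are baked into the image; piper files and dictionaries
# live on the host and are bind-mounted read-only at container runtime,
# so one image serves many configuration versions.
docker run --rm \
  -v /opt/ctakes-config:/ctakes/resources/config:ro \
  my-ctakes-image:1.2.0 \
  bin/runPiperFile.sh -p /ctakes/resources/config/DefaultFastPipeline.piper
```

With this layout the image tag versions the code, while the mounted config directory can be versioned separately (e.g., in its own git repository), avoiding redundant image tars.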
Peter

On Fri, Oct 21, 2022 at 10:18 PM Greg Silverman <g...@umn.edu.invalid> wrote:

> Why not use Docker and versioning by tags? See C. Boettiger, "An
> introduction to Docker for reproducible research," SIGOPS Oper. Syst.
> Rev. 49 (2015) 71-79. doi:10.1145/2723872.2723882.
>
> On Fri, Oct 21, 2022 at 3:15 PM Peter Abramowitsch
> <pabramowit...@gmail.com> wrote:
>
> > Well, obviously, the full range of permutations of all source files, all
> > annotators, and pre- and post-cTAKES code would require a huge amount of
> > commit information on thousands of files, and not only cTAKES files.
> > Recently I made some pretty significant changes to the ZonerCli library,
> > which is only a dependency of the cTAKES distribution. How would all the
> > commit info be used to tag the end results? I think the answer is that
> > it's simply not feasible or useful, so we haven't gone to those lengths.
> > As far as we go at the UCs is to version the piper file and then write
> > the versioned name of the piper back into the JSON object returned for
> > each note. We have our own REST service and our own Java and Python
> > clients, but they don't touch the internals of the message in a way that
> > interferes with the clinical informatics. The note concept collection
> > object, with its piper version, is then persisted in our data store. The
> > server jar also has a version, which is written to a log and updated
> > whenever any significant framework changes are implemented. But the
> > server version is not written into the data store.
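The piper-version stamping Peter describes could look roughly like the following sketch; the field names and helper function are hypothetical, since the actual UC service code isn't shown:

```python
import json

# Illustrative versioned piper file name; in practice this would come
# from the deployed pipeline configuration.
PIPER_VERSION = "DefaultFastPipeline_v2.3.piper"

def stamp_piper_version(note_result: dict, piper_version: str = PIPER_VERSION) -> dict:
    """Return a copy of a note's concept-collection JSON object with the
    piper version embedded, leaving the clinical annotations untouched.
    The record persisted to the data store then carries its own
    NLP-configuration provenance."""
    stamped = dict(note_result)
    stamped["piper_version"] = piper_version
    return stamped

# Hypothetical note result as returned by the REST service:
result = {"note_id": "n-001", "concepts": [{"cui": "C0011849", "text": "diabetes"}]}
record = stamp_piper_version(result)
print(json.dumps(record, indent=2))
```

The design choice here matches the thread: only the wrapper object gains a provenance field; the annotations themselves are never modified.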
> > Not sure if any of this was helpful
> >
> > On Fri, Oct 21, 2022 at 8:03 PM Miller, Timothy
> > <timothy.mil...@childrens.harvard.edu.invalid> wrote:
> >
> > > We've recently been using cTAKES for some internal projects where we
> > > make modifications, often using the REST server, combined with an
> > > open-source Python client that makes the output of the REST server
> > > easy to post-process:
> > > https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
> > > written by my colleagues Andy McMurry and Mike Terry, and pip
> > > installable. The output is then either converted to FHIR or written to
> > > whatever convenient format we need.
> > >
> > > But it's useful to know, for a given run on a given project, what NLP
> > > configuration produced this output. Obviously, there are things like
> > > version numbers, but since cTAKES is highly configurable, our
> > > post-processing libraries have versions, and we may use trunk or a
> > > previous commit instead of releases, things get complicated quickly.
> > > Does anyone have an existing solution they are willing to share? Or
> > > does anyone have any thoughts on this topic? This question goes
> > > slightly beyond cTAKES, but cTAKES is responsible for a lot of the
> > > complexity in figuring this out, since it's the most configurable
> > > component.
> > >
> > > Thanks
> > > Tim

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
g...@umn.edu