Best practices for documenting NLP versions

Miller, Timothy Fri, 21 Oct 2022 11:03:18 -0700

We’ve recently been using cTAKES for some internal projects where we make 
modifications, often via the REST server combined with an open-source Python 
client that makes the server’s output easy to post-process:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py
The client was written by my colleagues Andy McMurry and Mike Terry and is pip 
installable. The output is then either converted to FHIR or written to 
whatever format is convenient.


But for a given run on a given project, it’s useful to know which NLP 
configuration produced the output. Version numbers are the obvious starting 
point, but cTAKES is highly configurable, our post-processing libraries have 
their own versions, and we may use trunk or an earlier commit instead of a 
release, so things get complicated quickly. Does anyone have an existing 
solution they are willing to share, or any thoughts on this topic? The 
question goes slightly beyond cTAKES, but cTAKES is responsible for much of 
the complexity since it’s the most configurable component.
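
One way to make the question concrete is a per-run manifest written next to 
the output. The following is a minimal Python sketch, not an existing cTAKES 
or ctakes-client-py feature. The function names, the manifest fields, and the 
idea of hashing configuration files are all assumptions made for illustration; 
the package names and file paths passed in are up to the caller.

```python
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from importlib import metadata
from pathlib import Path


def package_version(name):
    """Return the installed version of a pip package, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir):
    """Return the checked-out commit hash of repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is missing, or repo_dir is not a usable checkout
        return None
    return result.stdout.strip()


def file_sha256(path):
    """Hash a configuration file so that later edits to it are detectable."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(packages, config_files, ctakes_repo=None):
    """Collect versions, the source commit, and config hashes for one run.

    All field names here are hypothetical, chosen for this sketch.
    """
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "packages": {name: package_version(name) for name in packages},
        "ctakes_commit": git_commit(ctakes_repo) if ctakes_repo else None,
        "config_sha256": {str(p): file_sha256(p) for p in config_files},
    }


def write_manifest(manifest, out_path):
    """Write the manifest as JSON alongside the NLP output."""
    Path(out_path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

A run would call build_manifest with the pip packages involved in 
post-processing, the configuration files the cTAKES instance was started with, 
and the path to the cTAKES checkout when running from trunk or a specific 
commit, then store the result with write_manifest. Hashing the configuration 
files only shows that something changed, not what changed, so archiving the 
files themselves may still be necessary.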

Thanks
Tim
