We’ve recently been using cTAKES for some internal projects where we make modifications, often through the REST server combined with an open-source Python client that makes the REST server’s output easy to post-process: https://github.com/Machine-Learning-for-Medical-Language/ctakes-client-py (written by my colleagues Andy McMurry and Mike Terry, and pip-installable). The output is then either converted to FHIR or written to whatever format is convenient.
But for a given run on a given project, it’s useful to know which NLP configuration produced the output. Obviously there are things like version numbers, but cTAKES is highly configurable, our post-processing libraries have their own versions, and we may use trunk or a previous commit instead of a release, so things get complicated quickly. Does anyone have an existing solution they are willing to share? Or does anyone have thoughts on this topic? The question goes slightly beyond cTAKES, but cTAKES is responsible for much of the complexity, since it’s the most configurable component.
Thanks
Tim
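For illustration only, and not an existing cTAKES or ctakes-client-py feature: a minimal sketch of the kind of per-run record the question is about. It assumes the configuration lives in files that can be hashed, that the post-processing libraries are pip-installed, and that any trunk or older-commit build comes from a local git checkout. All function and key names here (`build_manifest`, `config_sha256`, and so on) are hypothetical.

```python
import hashlib
import json
import subprocess
from importlib import metadata
from pathlib import Path


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def package_version(name):
    """Return the installed version of a pip package, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def git_commit(repo_dir):
    """Return the checked-out commit hash of repo_dir, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def build_manifest(config_files, packages, repo_dir=None):
    """Collect config hashes, package versions, and a source commit.

    The layout is an assumption made for this sketch, not a standard.
    """
    manifest = {
        "config_sha256": {str(p): file_sha256(p) for p in config_files},
        "package_versions": {n: package_version(n) for n in packages},
        "source_commit": git_commit(repo_dir) if repo_dir else None,
    }
    # One fingerprint over the sorted record, so two runs can be compared
    # with a single string. Paths are part of the record, so the same
    # config under a different path yields a different fingerprint.
    canonical = json.dumps(manifest, sort_keys=True)
    manifest["fingerprint"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return manifest
```

A call might pass the pipeline and dictionary files used for the run, the names of the post-processing packages, and the path of the cTAKES checkout, then store the resulting JSON next to the NLP output. This covers only what is visible from files and installed packages; settings passed at runtime would need to be recorded separately.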