…or should it be spacyTAKES? Horrible name, but potentially useful tool. Thanks again to Tim for the inspiration.
Proof-of-principle FastAPI tool here: https://github.com/twloehfelm/ctakescy

This stands up a FastAPI endpoint to which you send a cTAKES CAS XMI file (output from the clinical pipeline, for example), a Typesystem XML file, and a list of Types to modify ([…DiseaseDisorderMention, …SignSymptomMention], for example). It returns an updated CAS with polarity (and optionally uncertainty, historyOf, subject, conditional) set by either the negex (negspaCy; polarity only) or ConText (medspacy context: polarity + other attributes) spaCy components.

This is proof-of-principle, not production ready – it works from the FastAPI test panel at least. In a production environment I’d make some obvious changes to make it more efficient and snappier (no need to send the typesystem with each request, or to initialize a new Language model with each request), but those changes are trivial to make.

I was looking for a way to leverage Python-based NLP tools like spaCy while preserving the core features and rich annotations of cTAKES. I’m not very facile with Java or the cTAKES dev process, so this is my way of moving things over to Python, where I can iterate and test faster than I can in Java.

The solution I came up with is to piece together tools that allow:

1. manipulating CAS in Python (dkpro-cassis library)
2. accessing the rich spaCy ecosystem
   * build a spaCy Doc from existing cTAKES-assigned CAS attributes
      i. super useful content for understanding the spaCy framework here: https://applied-language-technology.mooc.fi/html/about.html
      ii. For now I am just adding cTAKES sentences and entities (as spaCy Spans) – as far as I can tell these are the only required upstream pipeline outputs for the negspaCy and medspacy ConText algorithms.
   * It would not be unreasonable to also add POS tags, chunks, CoNLL dependencies, etc. to the spaCy Doc object, so if you want to use a spaCy pipe component that requires those, your options are to map them from the cTAKES CAS or use an existing spaCy model that provides them.
3. returning the updated but still valid CAS
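For anyone who wants the gist of the data flow without reading the repo, here is a minimal, stdlib-only sketch. It is not the code from the repo and it does not call dkpro-cassis, spaCy, negspaCy or medspacy. The `Annotation` dataclass is a hypothetical stand-in for a CAS annotation (begin/end character offsets plus a polarity feature), the whitespace tokenizer stands in for a spaCy Doc, and the three-word trigger list is a toy replacement for the real negex/ConText rules. I am also assuming the cTAKES convention of polarity 1 for affirmed and -1 for negated. The point is only the shape of the steps above: take sentence and entity offsets, decide negation within the entity's sentence, and write the result back onto the same annotation objects.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical stand-in for a CAS annotation: offsets plus polarity."""
    begin: int
    end: int
    polarity: int = 1  # assumed convention: 1 = affirmed, -1 = negated


# Toy trigger list; the real negex / ConText rule sets are far richer.
NEGATION_TRIGGERS = ("no", "denies", "without")


def tokenize(text):
    """Whitespace tokenizer returning (token, begin, end) triples."""
    tokens, pos = [], 0
    for word in text.split():
        begin = text.index(word, pos)
        end = begin + len(word)
        tokens.append((word, begin, end))
        pos = end
    return tokens


def update_polarity(text, sentences, entities):
    """Set polarity on each entity from triggers earlier in its sentence."""
    tokens = tokenize(text)
    for ent in entities:
        # Find the sentence annotation that contains this entity.
        sent = next(
            (s for s in sentences if s.begin <= ent.begin and ent.end <= s.end),
            None,
        )
        if sent is None:
            continue  # entity outside any sentence: leave it untouched
        # Tokens in the same sentence that end before the entity starts.
        preceding = [
            tok.lower().strip(".,;:")
            for tok, b, e in tokens
            if sent.begin <= b and e <= ent.begin
        ]
        negated = any(tok in NEGATION_TRIGGERS for tok in preceding)
        ent.polarity = -1 if negated else 1
    return entities


text = "Patient denies chest pain. She reports nausea."
sentences = [Annotation(0, 26), Annotation(27, len(text))]

cp = text.index("chest pain")
na = text.index("nausea")
entities = [
    Annotation(cp, cp + len("chest pain")),
    Annotation(na, na + len("nausea")),
]

update_polarity(text, sentences, entities)
print([e.polarity for e in entities])  # [-1, 1]
```

Because the entities are modified in place, the same objects that came in carry the new polarity, which mirrors the idea of returning an updated but still valid CAS. With spaCy itself, the character-offset-to-token alignment done by hand here is what `Doc.char_span` handles.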