Hi Paul

1. The cTakes ecosystem is Java with some optional Python code. I have little experience running it in a Windows environment, so perhaps someone else in the group can give you pointers. My instinct would be to run it in a Linux-based Docker instance, which I do anyway for some clients. You can package it yourself as a standalone application talking to a database, or you can use the web service wrapper around it that exists in the codebase (it is either Dockerized or packaged as a WAR, or both). Then you can implement a REST client in a pure Windows environment if that is easier for you.
2. cTakes is an open source project going back to 2012 and, as such, uses many different technical approaches in its various components: pattern recognition, state machines, POS and treebank extractors, and some ML techniques. It does not have a user-friendly training mechanism for those components, although there are some examples. The best way to understand it is to download it and get started.

Peter

On Fri, Aug 11, 2023 at 6:45 AM Paul Stearns <pa...@compuace.com.invalid> wrote:

> Peter:
>
> Thanks for the detailed and thoughtful explanation.
>
> The easiest part for me to understand and work through would be #6. My MO
> for this sort of thing with both currently used in the existing target
> system are Windows services with associated DB queues and DLLs called from
> the application. The former for items which are not needed as part of the
> "real time" application and the latter for those which are.
>
> I currently have a homegrown application which looks for keywords and
> negation modifiers within a certain distance from the keywords which works
> moderately well.
>
> My ignorance regarding NLP algorithms like CTakes is whether it is keyword
> driven, or it is self learning. If it is the latter, I have a fairly large
> collection of human curated data which I could feed a training module.
>
> Where can I find an "executive overview" (30,000 foot view) of how the
> CTakes works?
>
> Paul R. Stearns
> Advanced Consulting Enterprises, Inc.
> 15150 NW 79th Court,
> Suite: 206
> Miami Lakes Fl, 33016
>
> Voice: (305)623-0360 x107
> Fax: (305)623-4588
>
> ----------------------------------------
> From: "Peter Abramowitsch" <pabramowit...@gmail.com>
> Sent: 8/10/23 11:59 PM
> To: dev@ctakes.apache.org
> Subject: Junk E-Mail Fwd: Initial CTakes analysis
>
> Hi Paul
> Out of the box, cTakes would get you part of the way there, but would
> require several types of customization to meet your requirements.
> All of these are the kind of customizations that most of us have had to do, so
> there's nothing new here, but they are not trivial. As I see it they fall
> into these categories.
>
> 1. getting familiar with the cTakes Application, pipeline, annotator and
> vocabulary ecosystem
> 2. choosing a vocabulary subset that gives the best coverage of the terms
> you are looking for
> 3. adding one or more custom dictionaries to add terms & synonyms that are
> not present -
> 4. maybe employing the anatomical site annotator in your pipeline
> 5. deciding how to harvest and structure the data you extract from the CAS
> object which all the annotators target
> 6. decide how to deploy the application (standalone?, webservices host?
> multi-instance? ). Many considerations go into this and greatly affect
> ability to scale. There is more than one architectural solution that will
> work and allow you to get to your "fully automated" goal, but you will need
> to implement that yourself.
>
> A hint about highlighting the text - all annotations carry text offsets so
> with these you can write code (usually JS and CSS) to do your
> highlighting. native cTakes does not have any graphical display
> functionality.
>
> Another hint learned from experience. If you have many large texts (say,
> 20kb and above with lots of potential terms to discover), you can achieve
> much better throughput by breaking these into smaller chunks at sentence
> boundaries and tweaking offsets accordingly as you reassemble the chunks.
> The memory requirements grow rapidly with the size of the note.
>
> In summary, a strong developer background is a good starting point. To
> that you'd want to add medical informatics, and experience with scalable
> architectures. cTakes is a great kernel to your system but be prepared to
> dive deep.
>
> Peter
>
> On Thu, Aug 10, 2023 at 10:06 AM Paul Stearns <pa...@compuace.com.invalid>
> wrote:
>
> > I am looking for a NLP to read pathology reports and extract cancer
> > related site, histology, stage and any other DX/RX data available. In
> > looking at CTakes, I have a few questions;
> >
> > - Is CTakes an appropriate tool to automate this task?
> > - The end goal would be a fully automated tool where text was presented to
> > an API and data was returned.
> > - An added bonus, would be for the tool to annotate the text, so that a
> > reviewer can more easily find the relevant data.
> > - For someone with a strong IT/software development background, but no NLP
> > background what is the level of difficulty in getting started with this
> > product?
> >
> > Paul R. Stearns
> > Advanced Consulting Enterprises, Inc.
> > 15150 NW 79th Court,
> > Suite: 206
> > Miami Lakes Fl, 33016
> >
> > Voice: (305)623-0360 x107
> > Fax: (305)623-4588
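
The two hints in Peter's first message (split large notes into chunks at sentence boundaries and re-base the offsets on reassembly; use annotation offsets to drive highlighting) could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `annotate` is a hypothetical callable standing in for whatever invokes your cTakes pipeline on a piece of text, the sentence splitter is a naive regex rather than cTakes' own sentence detector, and none of the names below are part of cTakes.

```python
import html
import re
from typing import Callable, List, Tuple

# (begin, end, label) with offsets relative to the text that was annotated.
Annotation = Tuple[int, int, str]

# Assumption: a sentence ends at ., ! or ? followed by whitespace.
# Abbreviations such as "Dr." will be split too; good enough for a sketch.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_spans(text: str) -> List[Tuple[int, int]]:
    """Return contiguous (begin, end) spans, one per sentence."""
    spans, start = [], 0
    for match in _SENTENCE_END.finditer(text):
        spans.append((start, match.end()))
        start = match.end()
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def chunk_spans(text: str, max_chars: int) -> List[Tuple[int, int]]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    A single sentence longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[Tuple[int, int]] = []
    current = None
    for begin, end in sentence_spans(text):
        if current is None:
            current = (begin, end)
        elif end - current[0] <= max_chars:
            current = (current[0], end)
        else:
            chunks.append(current)
            current = (begin, end)
    if current is not None:
        chunks.append(current)
    return chunks


def annotate_in_chunks(
    text: str,
    annotate: Callable[[str], List[Annotation]],  # hypothetical pipeline call
    max_chars: int = 2000,
) -> List[Annotation]:
    """Annotate chunk by chunk, shifting offsets back to whole-note positions."""
    results: List[Annotation] = []
    for begin, end in chunk_spans(text, max_chars):
        for a_begin, a_end, label in annotate(text[begin:end]):
            results.append((a_begin + begin, a_end + begin, label))
    return results


def highlight(text: str, annotations: List[Annotation]) -> str:
    """Wrap each annotated span in <mark>; assumes spans do not overlap."""
    out, pos = [], 0
    for begin, end, label in sorted(annotations):
        out.append(html.escape(text[pos:begin]))
        out.append(
            f'<mark title="{html.escape(label)}">'
            f"{html.escape(text[begin:end])}</mark>"
        )
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)


def toy_annotator(chunk: str) -> List[Annotation]:
    """Stand-in for a real pipeline: tags the literal word 'carcinoma'."""
    return [(m.start(), m.end(), "TERM") for m in re.finditer("carcinoma", chunk)]
```

For example, `annotate_in_chunks(note, toy_annotator, max_chars=40)` returns offsets that index into the full `note`, so the result can be passed straight to `highlight(note, ...)`. The chunk size that works best in practice depends on your notes and dictionaries and would need to be found by experiment.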