On a tangential note, we do have an example of running cTAKES on a massively parallel system like Spark/Hadoop:
https://svn.apache.org/repos/asf/ctakes/sandbox/ctakes-spark-streaming-twitter/

If your problem is embarrassingly parallelizable, you can use MapReduce/Spark to distribute your app, using that as a template (spark streaming can )

On Fri, Dec 5, 2014 at 1:29 PM, Geise, Brandon D. <bdge...@geisinger.edu> wrote:

> Thanks Sean. I'll take a look and see if this speeds the pipeline up.
>
> Thanks,
> Brandon
>
> -----Original Message-----
> From: Finan, Sean [mailto:sean.fi...@childrens.harvard.edu]
> Sent: Friday, December 05, 2014 1:14 PM
> To: dev@ctakes.apache.org
> Subject: RE: Scaling cTakes
>
> Hi Brandon,
>
> It sounds like you've got a decent pipeline set up. To increase the
> speed you could try swapping out use of ctakes-dictionary-lookup with
> ctakes-dictionary-lookup-fast in the AE. Check
> ctakes-clinical-pipeline/desc/[ae]/AggregatePlaintextFastUMLSProcessor.xml
> for an example. As for the CASPool, I don't think that it will make any
> difference for cTakes.
>
> Sean
> ________________________________________
> From: Geise, Brandon D. [bdge...@geisinger.edu]
> Sent: Friday, December 05, 2014 12:40 PM
> To: dev@ctakes.apache.org
> Subject: Scaling cTakes
>
> Hi,
>
> I'm new to cTakes and the UIMA framework. I've read most of the UIMA
> documentation and was able to take the BagofCUIGenerator example and modify
> to read notes from a DB, process using the UMLS AE in the clinical-pipeline
> using a local DB version of UMLS, and output the CUIs to a DB. However,
> the problem I'm having is it's extremely slow; ~3.5-4 notes a minute. I
> was hoping I could get some hints or advice on speeding the process up. I
> read there's a patch for LVG, but wasn't quite sure how to implement. Also
> from testing using the CPE GUI, I don't notice any different in processing
> time by adjusting the CASPool setting. Some advice on the CASPool would be
> appreciated also.
>
> Thanks,
> Brandon

-- 
jay vyas
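
To illustrate the embarrassingly parallel case mentioned at the top of this message, here is a rough sketch of the "split the notes into batches, build the pipeline once per batch, process batches independently" pattern. Everything in it is an assumption for illustration: `build_pipeline` is a hypothetical stand-in for loading the cTAKES analysis engine, the toy "annotation" just picks out CUI-looking tokens, and a local thread pool takes the place of a Spark or Hadoop cluster. It is not taken from the sandbox project linked above.

```python
from concurrent.futures import ThreadPoolExecutor


def build_pipeline():
    """Hypothetical stand-in for loading the (expensive) analysis engine."""
    def annotate(note_text):
        # Toy "annotation": keep tokens that look like 'C' followed by digits.
        return [tok for tok in note_text.split()
                if tok.startswith("C") and tok[1:].isdigit()]
    return annotate


def process_partition(notes):
    """Process one batch of (note_id, text) pairs with a single pipeline instance."""
    annotate = build_pipeline()  # built once per batch, not once per note
    return [(note_id, annotate(text)) for note_id, text in notes]


def partition(items, n):
    """Split items into n interleaved batches (some may be empty)."""
    return [items[i::n] for i in range(n)]


def run(notes, workers=4):
    """Fan batches out to workers and collect (note_id, annotations) pairs."""
    batches = [b for b in partition(notes, workers) if b]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_partition, batches)
        # Flatten the per-batch results and restore order by note id.
        return sorted(pair for batch in results for pair in batch)


if __name__ == "__main__":
    sample = [(1, "pt has C0011849 and C0020538"), (2, "no codes here")]
    for note_id, cuis in run(sample, workers=2):
        print(note_id, cuis)
```

The point of the sketch is the shape, not the speed: because each batch is independent, the per-batch function is the unit you would hand to a cluster framework (in PySpark, something along the lines of `rdd.mapPartitions`), and building the pipeline once per batch avoids paying the startup cost for every note.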