Yes, the current release of cTAKES has a module for temporal expressions, which include dates. The temporal expressions are normalized with Steven Bethard's timenorm code.
However, if you de-identify dates/temporal expressions, you run the risk of creating incorrect timelines, because many relative temporal expressions (e.g. spring of this year, x-mas time, etc.) are unlikely to be shifted correctly by any de-identification tool. One de-identification tool is MIST -- http://mist-deid.sourceforge.net/ .

Hope this helps with the de-identification items.

--Guergana

Guergana Savova, PhD, FACMI
Associate Professor
PI Natural Language Processing Lab
Boston Children's Hospital and Harvard Medical School
300 Longwood Avenue
Mailstop: BCH3092
Enders 144.1
Boston, MA 02115
Tel: (617) 919-2972
Fax: (617) 730-0817
Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv

-----Original Message-----
From: Azad Dehghan [mailto:azad.dehg...@gmail.com]
Sent: Thursday, March 10, 2016 3:42 PM
To: dev@ctakes.apache.org
Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

> This means both training data folders? I have access to the data but
> not to the challenge description.

Yes. Is there any specific information that you are missing?

>
>> It would be good to incorporate/refactor (basically, GATE API needs
>> to be replaced with UIMA API to generate annotation) the two-pass
>> recognition method for cTAKES - which has a wider application on
>> longitudinal data.
>> This method is used on top of a number of NERs.
>
> I'll take a look.
>
> I do not know how much time I can invest this month. Let's see how
> many phases I can translate.
>
> I added the rules for age. Are there jape rules for creating date annotations?
>

No. I believe cTAKES has existing component(s) to capture dates?

> After all rules are translated, they need some major refactoring. Jape and Ruta are quite different in some aspects.
>

Ok.

>
>> Please let me know where I can help. I will be available again in April.
>>
>> Cheers,
>> Azad
>>
>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
>>
>>> Hi,
>>>
>>> sorry, I was quite busy last month.
>>>
>>> I added a new patch, which needs to be applied.
>>>
>>> No new rules, but it's possible now to evaluate everything against
>>> the labelled data of the challenge.
>>>
>>> @Azad:
>>> Which documents exactly did you use to develop the rules?
>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>
>>>> Hi,
>>>>
>>>> the last patch fixed almost all problems.
>>>>
>>>> I added another one that adds the csv file for the unit test and extends
>>>> svn-ignore.
>>>>
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I added another patch. I forgot to manually add one test file to version
>>>>> control, and there are still duplicate lines.
>>>>> I hope this patch fixes the remaining problems.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the
>>>>>> trouble, I should have looked more closely at the complete patch.
>>>>>>
>>>>>> I attached a new patch created with command-line tools which looks correct
>>>>>> now.
>>>>>>
>>>>>> Pei, can you apply the new patch?
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>
>>>>>>> Thanks Pei.
>>>>>>>
>>>>>>> I fear there was again a problem with the patch. All new files
>>>>>>> are missing (and also the svn-ignore settings).
>>>>>>>
>>>>>>> Can you take a look?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>
>>>>>>>> patch applied.
>>>>>>>> Thanks,
>>>>>>>> Pei
>>>>>>>>
>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Pei,
>>>>>>>>>
>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>
>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>> But yeah, we can even create an extended type system to store these items temporarily and add them into the main/core type system afterwards.
>>>>>>>>>>
>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed, it will require much more testing. If it works, we can upgrade it in our sandbox area or create a branch if necessary.
>>>>>>>>>>
>>>>>>>>>> --Pei
>>>>>>>>>>
>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>
>>>>>>>>>>> @Pei:
>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some project in cTAKES uses something like OntologyMatch... I map it to IdentifiedAnnotation right now, but there are many empty features...
>>>>>>>>>>>
>>>>>>>>>>> @Azad:
>>>>>>>>>>> I changed the rules a bit, especially the capitalization like I use it in ruta normally. The wordlists are compiled to a trie by the maven plugin. I also added the two regexes for url and email. I extended the regex for the url. I also changed the evaluation order of some rules (with @). Feel free to add simple examples to examples.csv for the unit tests.
>>>>>>>>>>>
>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>
>>>>>>>>>>> Do you want help with the other rule sets? Or should we split them up?
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>
>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers
>>>>>>>>>>>>>> for translating JAPE to RUTA, please get in touch.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it. Unfortunately,
>>>>>>>>>>>>>>> there is just no spare time right now. I hope I will be able to provide
>>>>>>>>>>>>>>> the patches in December.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>>>>>>>>>>>>> in terms of maven modules, etc.
>>>>>>>>>>>>>>>> I think it would be good if we use
>>>>>>>>>>>>>>>> uimaFIT style as the primary approach to wiring components together and
>>>>>>>>>>>>>>>> generate desc's as secondary...
>>>>>>>>>>>>>>>> I think the actual components that would be required are probably best
>>>>>>>>>>>>>>>> left up to what is actually required for the best-performing c-deid. The
>>>>>>>>>>>>>>>> output would be interesting; I'm not sure if we should treat this as
>>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that discussion to
>>>>>>>>>>>>>>>> the dev group as you see fit.)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES
>>>>>>>>>>>>>>>>> community develops or what a project should look like?
>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different
>>>>>>>>>>>>>>>>> ways and I do not want to get inspired by "some sort of out-dated"
>>>>>>>>>>>>>>>>> approach in the cTAKES repo.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components
>>>>>>>>>>>>>>>>> that should be used and the kind of "output" of the project?
>>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>>>>>>>>>>>> parser, ... dict lookup?
>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work
>>>>>>>>>>>>>>>>>>> and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>>>>>>>>> wait until I set up the project with ruta integration.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
>>>>>>>>>>>>>>>>>> sets 12 months after a given challenge; this is done on an individual basis
>>>>>>>>>>>>>>>>>> and involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>>>>>>>>>   replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>>>>>>>>>>>> were included to ensure compatibility. I'll be sure to take a look at
>>>>>>>>>>>>>>>>>>> that over the next few weeks.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --Pei
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to
>>>>>>>>>>>>>>>>>> worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component for
>>>>>>>>>>>>>>>>>> the Two Pass recognition (TwoPass.java), as this method has proven useful
>>>>>>>>>>>>>>>>>> for general NER on longitudinal data and is surely useful independent of the
>>>>>>>>>>>>>>>>>> deid component.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>> Azad