Thanks Guergana.

> Yes, the current release of cTAKES has a module for the temporal expressions which includes dates. The normalizer for the temporal expressions is Steven Bethard's timenorm code.

Great.

> However, if you do de-identification of dates/temporal expressions, you run the risk of creating incorrect timelines, as many of the relative temporal expressions (e.g. spring of this year, x-mas time, etc.) are unlikely to be correctly shifted by any de-identification tool.

Indeed, that is one reason I have not included the dates component.

> One de-identification tool is MIST -- http://mist-deid.sourceforge.net/ .

I don't remember them doing well in the community held evaluation in 2014. Hence, cDeid :)

>
> Guergana Savova, PhD, FACMI
> Associate Professor
> PI Natural Language Processing Lab
> Boston Children's Hospital and Harvard Medical School
> 300 Longwood Avenue
> Mailstop: BCH3092
> Enders 144.1
> Boston, MA 02115
> Tel: (617) 919-2972
> Fax: (617) 730-0817
> Harvard Scholar: http://scholar.harvard.edu/guergana_k_savova/biocv
>
> -----Original Message-----
> From: Azad Dehghan [mailto:azad.dehg...@gmail.com]
> Sent: Thursday, March 10, 2016 3:42 PM
> To: dev@ctakes.apache.org
> Subject: Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives
>
> > This means both training data folders? I have access to the data but not to the challenge description.
>
> Yes. Is there any specific information that you are missing?
>
> >> It would be good to incorporate/refactor (basically, GATE API needs to be replaced with UIMA API to generate annotation) the two-pass recognition method for cTAKES - which has a wider application on longitudinal data.
> >> This method is used on top of a number of NERs.
> >
> > I'll take a look.
> >
> > I do not know how much time I can invest this month. Let's see how many phases I can translate.
> >
> > I added the rules for age. Are there jape rules for creating date annotations?
>
> No. I believe cTAKES has existing component(s) to capture dates?
>
> > After all rules are translated, they need some major refactoring.
> > Jape and Ruta are quite different in some aspects.
>
> Ok.
>
> >> Please let me know where I can help. I will be available again in April.
> >>
> >> Cheers,
> >> Azad
> >>
> >> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> sorry, I was quite busy last month.
> >>>
> >>> I added a new patch, which needs to be applied.
> >>>
> >>> No new rules, but it's possible now to evaluate everything against the labelled data of the challenge.
> >>>
> >>> @Azad:
> >>> Which documents exactly did you use to develop the rules?
> >>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
> >>>
> >>> Best,
> >>>
> >>> Peter
> >>>
> >>> On 3 Feb 2016 at 09:05, Peter Klügl wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> the last patch fixed almost all problems.
> >>>>
> >>>> I added another one that adds the csv file for the unit test and extends svn-ignore.
> >>>>
> >>>> Best,
> >>>>
> >>>> Peter
> >>>>
> >>>> On 2 Feb 2016 at 09:16, Peter Klügl wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I added another patch. I forgot to manually add one test file to version control, and there are still duplicate lines.
> >>>>> I hope this patch fixes the remaining problems.
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 29 Jan 2016 at 10:34, Peter Klügl wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the trouble, I should have looked more closely at the complete patch.
> >>>>>>
> >>>>>> I attached a new patch created with command-line tools, which looks correct now.
> >>>>>>
> >>>>>> Pei, can you apply the new patch?
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>> On 28 Jan 2016 at 15:57, Peter Klügl wrote:
> >>>>>>>
> >>>>>>> Thanks Pei.
> >>>>>>>
> >>>>>>> I fear there was again a problem with the patch.
> >>>>>>> All new files are missing (and also the svn-ignore settings).
> >>>>>>>
> >>>>>>> Can you take a look?
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On 28 Jan 2016 at 14:43, Pei Chen wrote:
> >>>>>>>>
> >>>>>>>> patch applied.
> >>>>>>>> Thanks,
> >>>>>>>> Pei
> >>>>>>>>
> >>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Pei,
> >>>>>>>>>
> >>>>>>>>> can you commit the recent patch for us?
> >>>>>>>>>
> >>>>>>>>> CTAKES-384-20160120.patch
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 20 Jan 2016 at 19:35, Pei Chen wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi,
> >>>>>>>>>> Sorry I was swamped recently.
> >>>>>>>>>> But yeah, we can even create an extended type system to store these items temporarily and add them into the main/core type system afterwards.
> >>>>>>>>>>
> >>>>>>>>>> There was an existing item to upgrade UIMA, but agreed - it will require much more testing. If it works, we can upgrade it in our sandbox area or create a branch if necessary.
> >>>>>>>>>>
> >>>>>>>>>> --Pei
> >>>>>>>>>>
> >>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> a new patch is attached.
> >>>>>>>>>>>
> >>>>>>>>>>> @Pei:
> >>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some project in cTAKES uses something like OntologyMatch... I map it to IdentifiedAnnotation right now, but there are many empty features...
> >>>>>>>>>>>
> >>>>>>>>>>> @Azad:
> >>>>>>>>>>> I changed the rules a bit, especially the capitalization like I use it in ruta normally. The wordlists are compiled to a trie by the maven plugin.
> >>>>>>>>>>> I also added the two regexes for url and email. I extended the regex for the url. I also changed the evaluation order of some rules (with @). Feel free to add simple examples to examples.csv for the unit tests.
> >>>>>>>>>>>
> >>>>>>>>>>> Let me know if you need more information about the changes.
> >>>>>>>>>>>
> >>>>>>>>>>> Do you wanna have help with the other rule sets? Or should we split them up?
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On 18 Jan 2016 at 11:04, Peter Klügl wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Peter
> >>>>>>>>>>>>
> >>>>>>>>>>>> On 18 Jan 2016 at 00:58, Azad Dehghan wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Three NERs translated and uploaded.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more volunteers for translating JAPE to RUTA, please get in touch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
> >>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope I will be able to provide the patches in December.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 6 Nov 2015 at 16:40, Pei Chen wrote:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point at least in terms of maven modules, etc. I think it would be good if we use uimaFIT style as primary approach to wiring components together and generate desc's as secondary...
> >>>>>>>>>>>>>>>> I think the actual components that would be required are probably best left up to what is actually required for best performing c-deid. The output would be interesting, I'm not sure if we should treat this as an independent preprocessing component or part of a pipeline (in which case, we may need to propose a change to the type system or perhaps an alternative JCas view. You can probably open up that discussion to the dev group as you see fit.)
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> My 2 cents...
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the cTAKES community develops or how a project should look?
> >>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite different manners, and I do not want to get inspired by "some sort of out-dated" approach in the cTAKES repo.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing components that should be used and the kind of "output" of the project?
> >>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ... parser, ... dict lookup?
> >>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> More comments below.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 3 Nov 2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate work and to coordinate the efforts ...
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or wait until I set up the project with ruta integration.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial development, and if yes, is it possible to contribute it too?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2 <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data sets 12 months after a given challenge; this is done on an individual basis and involves a Data Use Agreement.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components replacing the previous ANNIE preprocessing)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that were included to ensure compatibility. I'll be sure to take a look at that over the next few weeks.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --Pei
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need to worry about the 3rd party libs.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component for the Two Pass recognition (TwoPass.java), as this method has proven useful for general NER on longitudinal data and is surely useful independent of the deid component.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>> Azad
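
For readers who do not know the two-pass idea mentioned above, here is a rough sketch of how such a step could sit on top of several NERs. It is only an illustration under assumptions: it supposes the second pass re-tags every occurrence of a string that any recognizer found anywhere in the same patient's notes. It is not the logic of TwoPass.java, and all names (`two_pass`, `doctor_title_recognizer`, the span tuples) are made up for the example.

```python
import re


def doctor_title_recognizer(text):
    """Toy base NER (hypothetical): tags the name following 'Dr.' as DOCTOR."""
    return [(m.start(1), m.end(1), "DOCTOR")
            for m in re.finditer(r"Dr\. ([A-Z][a-z]+)", text)]


def two_pass(docs_by_patient, recognizers):
    """Return {patient_id: [set of (start, end, label) per document]}.

    Assumed behaviour, not taken from TwoPass.java:
    pass 1 runs the base recognizers on each document,
    pass 2 propagates every recognized surface string to all
    documents of the same patient.
    """
    result = {}
    for patient, docs in docs_by_patient.items():
        # Pass 1: per-document recognition with the base recognizers.
        per_doc = []
        for text in docs:
            spans = set()
            for recognize in recognizers:
                spans.update(recognize(text))
            per_doc.append(spans)

        # Collect the surface strings found anywhere in this patient's notes.
        # On a label conflict the first label seen (in sorted order) wins.
        seen = {}
        for text, spans in zip(docs, per_doc):
            for start, end, label in sorted(spans):
                seen.setdefault(text[start:end], label)

        # Pass 2: tag every whole-word occurrence of those strings.
        for text, spans in zip(docs, per_doc):
            for surface, label in seen.items():
                pattern = r"\b" + re.escape(surface) + r"\b"
                for m in re.finditer(pattern, text):
                    spans.add((m.start(), m.end(), label))

        result[patient] = per_doc
    return result


notes = {"p1": ["Seen by Dr. Smith today.", "Smith recommends follow-up."]}
tagged = two_pass(notes, [doctor_title_recognizer])
# The second note has no "Dr." cue, so only the second pass can tag "Smith".
```

Grouping by patient is what would make this useful on longitudinal data: a name that is recognized with strong context in one note is carried over to notes where the context is missing. The word-boundary match is a simplification and would need care for surface strings that start or end with punctuation.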