Hi Azad,
the basic rules are now translated. Do you wanna take a look at it? Many issues still remain, and the F score is quite low on the dev set. I will continue improving the rules when I find the time.

Best,

Peter

On 15.03.2016 at 09:49, Peter Klügl wrote:
> Hi,
>
> this is essentially just a design decision. For a single longitudinal
> record, there is no problem at all. We can solve this even with some
> simple Ruta rules, or with some custom analysis engine. If we want to
> process a set of records of the same patient jointly, then we cannot
> apply a single pipeline. I propose to postpone the decision and implement
> it only for single documents for now.
>
> Best,
>
> Peter
>
>
> On 11.03.2016 at 20:03, Azad Dehghan wrote:
>>> I had a quick look at PassTwo. This is not directly translatable into
>>> UIMA if the functionality is based on analysis engines. Normally,
>>> analysis engines process one document at a time in a pipeline. My first
>>> quick guess is that we either need two pipelines (the result is a program,
>>> not a component) or we need a different definition of a CAS (joining all
>>> documents of a patient). Overall, it depends on the targeted use case of
>>> the project. Should it be usable in a cTAKES/uimaFIT pipeline?
>>>
>> The two-pass method will have a broader applicability for NER on
>> longitudinal records...
>>
>>
>>> btw, the CRF models are not part of the contribution, right?
>>>
>>>
>> The CRF (UK,US) models will be released but this will be together with a
>> more mature software planned for August 2016.
>>
>> Best,
>>> Peter
>>>
>>> On 10.03.2016 at 20:29, Azad Dehghan wrote:
>>>> Thanks Peter,
>>>>
>>>> The rules were modeled using the training data.
>>>>
>>>> It would be good to incorporate/refactor (basically, GATE API needs to be
>>>> replaced with UIMA API to generate annotation) the two-pass recognition
>>>> method for cTAKES - which has a wider application on longitudinal data.
>>>> This method is used on top of a number of NERs.
>>>>
>>>> Please let me know where I can help. I will be available again in April.
>>>>
>>>> Cheers,
>>>> Azad
>>>>
>>>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> sorry, I was quite busy last month.
>>>>>
>>>>> I added a new patch, which needs to be applied.
>>>>>
>>>>> No new rules, but it's possible now to evaluate everything against the
>>>>> labelled data of the challenge.
>>>>>
>>>>> @Azad:
>>>>> Which documents exactly did you use to develop the rules?
>>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>> the last patch fixed almost all problems.
>>>>>>
>>>>>> I added another one that adds the csv file for the unit test and extends
>>>>>> svn-ignore.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I added another patch. I forgot to manually add one test file to version
>>>>>>> control, and there are still duplicate lines.
>>>>>>> I hope this patch fixes the remaining problems.
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for the
>>>>>>>> trouble, I should have looked more closely at the complete patch.
>>>>>>>>
>>>>>>>> I attached a new patch created with command-line tools, which looks
>>>>>>>> correct now.
>>>>>>>>
>>>>>>>> Pei, can you apply the new patch?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>>> Thanks Pei.
>>>>>>>>>
>>>>>>>>> I fear there was again a problem with the patch. All new files are
>>>>>>>>> missing (and also the svn-ignore settings).
>>>>>>>>>
>>>>>>>>> Can you take a look?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>>> patch applied.
>>>>>>>>>> Thanks,
>>>>>>>>>> Pei
>>>>>>>>>>
>>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>> Hi Pei,
>>>>>>>>>>>
>>>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>>>
>>>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>>>> But yeah, we can even create an extended type system to store
>>>>>>>>>>>> these items temporarily and add them into the main/core type system
>>>>>>>>>>>> afterwards.
>>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed- it will
>>>>>>>>>>>> require much more testing. If it works, we can upgrade it in our
>>>>>>>>>>>> sandbox area or create a branch if necessary.
>>>>>>>>>>>> --Pei
>>>>>>>>>>>>
>>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>>>
>>>>>>>>>>>>> @Pei:
>>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
>>>>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
>>>>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
>>>>>>>>>>>>> @Azad:
>>>>>>>>>>>>> I changed the rules a bit, especially the capitalization like I use it
>>>>>>>>>>>>> in ruta normally. The wordlists are compiled to a trie by the maven
>>>>>>>>>>>>> plugin. I also added the two regexes for url and email. I extended the
>>>>>>>>>>>>> regex for the url. I also changed the evaluation order of some rules
>>>>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv for the unit
>>>>>>>>>>>>> tests.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do you wanna have help with the other rule sets? Or should we
>>>>>>>>>>>>> split them up?
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> great. I will integrate them in the project and in the next patch.
>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all completed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
>>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any more
>>>>>>>>>>>>>>>> volunteers for translating JAPE to RUTA, please get in touch.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
>>>>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope I will
>>>>>>>>>>>>>>>>> be able to provide the patches in December.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>>> I think the ctakes-examples is probably a good starting point at least
>>>>>>>>>>>>>>>>>> in terms of maven modules, etc.
>>>>>>>>>>>>>>>>>> I think it would be good if we use
>>>>>>>>>>>>>>>>>> uimaFIT style as primary approach to wiring components together and
>>>>>>>>>>>>>>>>>> generate desc's as secondary...
>>>>>>>>>>>>>>>>>> I think the actual components that would be required is probably best
>>>>>>>>>>>>>>>>>> left up to what is actually required for best performing c-deid. The
>>>>>>>>>>>>>>>>>> output would be interesting, I'm not sure if we should treat this as
>>>>>>>>>>>>>>>>>> an independent preprocessing component or part of a pipeline (in which
>>>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or perhaps an
>>>>>>>>>>>>>>>>>> alternative JCas view. You can probably open up that discussion to
>>>>>>>>>>>>>>>>>> the dev group as you see fit.)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the
>>>>>>>>>>>>>>>>>>> cTAKES community develops or what a project should look like?
>>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in quite
>>>>>>>>>>>>>>>>>>> different manners and I do not want to get inspired by "some sort of
>>>>>>>>>>>>>>>>>>> out-dated" approach in the cTAKES repo.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the preprocessing
>>>>>>>>>>>>>>>>>>> components that should be used and the kind of "output" of the project?
>>>>>>>>>>>>>>>>>>> Components: On which components may the components rely: tokenizer, ...
>>>>>>>>>>>>>>>>>>> parser, ... dict lookup?
>>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single AE?
>>>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate
>>>>>>>>>>>>>>>>>>>>> work and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want, or
>>>>>>>>>>>>>>>>>>> wait until I set up the project with ruta integration.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for the initial
>>>>>>>>>>>>>>>>>>>>> development, and if yes, is it possible to contribute it too?
>>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the
>>>>>>>>>>>>>>>>>>>> data sets 12 months after a given challenge; this is done on an
>>>>>>>>>>>>>>>>>>>> individual basis and involves a Data Use Agreement.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
>>>>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to the dataset here.
>>>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
>>>>>>>>>>>>>>>>>>>>> replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party libs jars that
>>>>>>>>>>>>>>>>>>>>> were included to ensure compatibility. I’ll be sure to take a look
>>>>>>>>>>>>>>>>>>>>> at that over the next few weeks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --Pei
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there should not be a need
>>>>>>>>>>>>>>>>>>>> to worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component
>>>>>>>>>>>>>>>>>>>> for the Two Pass recognition (TwoPass.java) as this method has been
>>>>>>>>>>>>>>>>>>>> shown useful for general NER on longitudinal data and is surely useful
>>>>>>>>>>>>>>>>>>>> independent of the deid component.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>>>>>>