Hi,

this is essentially just a design decision. For a single longitudinal record, there is no problem at all: we can solve this even with some simple Ruta rules or with a custom analysis engine. If we want to process a set of records of the same patient jointly, then we cannot apply a single pipeline. I propose to postpone the decision and implement it only for single documents for now.
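To make the design question concrete, here is a minimal sketch in plain Java. It is not taken from the existing two-pass code and uses no UIMA API. It assumes that "two-pass" means: pass one runs a recognizer on each document separately, and pass two looks up every string found anywhere in the patient's record in all of that patient's documents. The class name, the method names and the toy "title plus capitalized word" recognizer are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassSketch {

    // Hypothetical stand-in for a per-document recognizer:
    // a title followed by a capitalized word, e.g. "Dr. Smith".
    private static final Pattern TITLED_NAME =
            Pattern.compile("\\b(?:Dr|Mr|Mrs)\\. ([A-Z][a-z]+)");

    // Pass one: needs only a single document, so it fits a normal pipeline.
    public static Set<String> firstPass(String document) {
        Set<String> names = new LinkedHashSet<>();
        Matcher m = TITLED_NAME.matcher(document);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    // Pass two: needs ALL documents of one patient at once, which is why a
    // one-document-at-a-time pipeline is not sufficient on its own.
    public static Map<Integer, List<String>> secondPass(List<String> documents) {
        Set<String> patientLevel = new LinkedHashSet<>();
        for (String doc : documents) {
            patientLevel.addAll(firstPass(doc));
        }
        Map<Integer, List<String>> hits = new LinkedHashMap<>();
        for (int i = 0; i < documents.size(); i++) {
            List<String> found = new ArrayList<>();
            for (String name : patientLevel) {
                Pattern whole = Pattern.compile("\\b" + Pattern.quote(name) + "\\b");
                if (whole.matcher(documents.get(i)).find()) {
                    found.add(name);
                }
            }
            hits.put(i, found);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> record = Arrays.asList(
                "Seen by Dr. Smith today.",
                "Smith recommends follow-up.");
        // The second document has no title cue, so only pass two finds "Smith".
        System.out.println(secondPass(record)); // prints {0=[Smith], 1=[Smith]}
    }
}
```

In this sketch, firstPass corresponds to what a single analysis engine could do, while secondPass needs either an outer program driving two runs or a single CAS holding all documents of a patient, which are the two options under discussion.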
Best,

Peter

On 11.03.2016 at 20:03, Azad Dehghan wrote:
>> I had a quick look at PassTwo. This is not directly translatable into
>> UIMA if the functionality is based on analysis engines. Normally,
>> analysis engines process one document at a time in a pipeline. My first
>> quick guess is that we either need two pipelines (the result is a
>> program, not a component) or we need a different definition of a CAS
>> (joining all documents of a patient). Overall, it depends on the
>> targeted use case of the project. Should it be usable in a
>> cTAKES/uimaFIT pipeline?
>
> The two-pass method will have broader applicability for NER on
> longitudinal records...
>
>> btw, the CRF models are not part of the contribution, right?
>
> The CRF (UK, US) models will be released, but together with more
> mature software planned for August 2016.
>
>> Best,
>> Peter
>>
>> On 10.03.2016 at 20:29, Azad Dehghan wrote:
>>> Thanks Peter,
>>>
>>> The rules were modeled using the training data.
>>>
>>> It would be good to incorporate/refactor the two-pass recognition
>>> method for cTAKES (basically, the GATE API needs to be replaced with
>>> the UIMA API to generate annotations); the method has a wider
>>> application on longitudinal data. It is used on top of a number of NERs.
>>>
>>> Please let me know where I can help. I will be available again in April.
>>>
>>> Cheers,
>>> Azad
>>>
>>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>> Hi,
>>>>
>>>> sorry, I was quite busy last month.
>>>>
>>>> I added a new patch, which needs to be applied.
>>>>
>>>> No new rules, but it's possible now to evaluate everything against the
>>>> labelled data of the challenge.
>>>>
>>>> @Azad:
>>>> Which documents exactly did you use to develop the rules?
>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or testing-PHI-Gold-fixed?
>>>> Best,
>>>>
>>>> Peter
>>>>
>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
>>>>> Hi,
>>>>>
>>>>> the last patch fixed almost all problems.
>>>>>
>>>>> I added another one that adds the csv file for the unit test and
>>>>> extends svn-ignore.
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I added another patch. I forgot to manually add one test file to
>>>>>> version control, and there are still duplicate lines.
>>>>>> I hope this patch fixes the remaining problems.
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for
>>>>>>> the trouble, I should have looked more closely at the complete patch.
>>>>>>>
>>>>>>> I attached a new patch created with command-line tools, which looks
>>>>>>> correct now.
>>>>>>>
>>>>>>> Pei, can you apply the new patch?
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
>>>>>>>> Thanks Pei.
>>>>>>>>
>>>>>>>> I fear there was again a problem with the patch. All new files are
>>>>>>>> missing (and also the svn-ignore settings).
>>>>>>>>
>>>>>>>> Can you take a look?
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Peter
>>>>>>>>
>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
>>>>>>>>> patch applied.
>>>>>>>>> Thanks,
>>>>>>>>> Pei
>>>>>>>>>
>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>> Hi Pei,
>>>>>>>>>>
>>>>>>>>>> can you commit the recent patch for us?
>>>>>>>>>>
>>>>>>>>>> CTAKES-384-20160120.patch
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>>
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>> Sorry I was swamped recently.
>>>>>>>>>>> But yeah, we can even create an extended type system to store
>>>>>>>>>>> these items temporarily and add them into the main/core type
>>>>>>>>>>> system afterwards.
>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed, it will
>>>>>>>>>>> require much more testing. If it works, we can upgrade it in our
>>>>>>>>>>> sandbox area or create a branch if necessary.
>>>>>>>>>>> --Pei
>>>>>>>>>>>
>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> a new patch is attached.
>>>>>>>>>>>>
>>>>>>>>>>>> @Pei:
>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system?
>>>>>>>>>>>> Some project in cTAKES uses something like OntologyMatch... I map
>>>>>>>>>>>> it to IdentifiedAnnotation right now, but there are many empty
>>>>>>>>>>>> features...
>>>>>>>>>>>>
>>>>>>>>>>>> @Azad:
>>>>>>>>>>>> I changed the rules a bit, especially the capitalization, to match
>>>>>>>>>>>> how I normally use it in Ruta. The wordlists are compiled to a trie
>>>>>>>>>>>> by the maven plugin. I also added the two regexes for url and
>>>>>>>>>>>> email, and extended the regex for the url. I also changed the
>>>>>>>>>>>> evaluation order of some rules (with @). Feel free to add simple
>>>>>>>>>>>> examples to examples.csv for the unit tests.
>>>>>>>>>>>>
>>>>>>>>>>>> Let me know if you need more information about the changes.
>>>>>>>>>>>>
>>>>>>>>>>>> Do you want help with the other rule sets? Or should we split
>>>>>>>>>>>> them up?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>>
>>>>>>>>>>>> Peter
>>>>>>>>>>>>
>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> great. I will integrate them in the project and in the next
>>>>>>>>>>>>> patch.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
>>>>>>>>>>>>>> Three NERs translated and uploaded.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PS.
>>>>>>>>>>>>>> I will validate all NERs once we have them all completed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are any
>>>>>>>>>>>>>>> more volunteers for translating JAPE to RUTA, please get in
>>>>>>>>>>>>>>> touch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Azad
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
>>>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope
>>>>>>>>>>>>>>>> I will be able to provide the patches in December.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
>>>>>>>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting point, at
>>>>>>>>>>>>>>>>> least in terms of maven modules, etc. I think it would be
>>>>>>>>>>>>>>>>> good if we use the uimaFIT style as the primary approach to
>>>>>>>>>>>>>>>>> wiring components together and generate descriptors as
>>>>>>>>>>>>>>>>> secondary...
>>>>>>>>>>>>>>>>> I think the actual components that would be required are
>>>>>>>>>>>>>>>>> probably best left up to what is actually required for the
>>>>>>>>>>>>>>>>> best performing c-deid. The output would be interesting. I'm
>>>>>>>>>>>>>>>>> not sure if we should treat this as an independent
>>>>>>>>>>>>>>>>> preprocessing component or as part of a pipeline (in which
>>>>>>>>>>>>>>>>> case, we may need to propose a change to the type system or
>>>>>>>>>>>>>>>>> perhaps an alternative JCas view. You can probably open up
>>>>>>>>>>>>>>>>> that discussion to the dev group as you see fit.)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> My 2 cents...
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of
>>>>>>>>>>>>>>>>>> how the cTAKES community develops or what a project should
>>>>>>>>>>>>>>>>>> look like?
>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects in
>>>>>>>>>>>>>>>>>> quite different ways, and I do not want to get inspired by
>>>>>>>>>>>>>>>>>> "some sort of out-dated" approach in the cTAKES repo.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
>>>>>>>>>>>>>>>>>> preprocessing components that should be used and the kind
>>>>>>>>>>>>>>>>>> of "output" of the project?
>>>>>>>>>>>>>>>>>> Components: On which components may the components rely:
>>>>>>>>>>>>>>>>>> tokenizer, ... parser, ... dict lookup?
>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a single
>>>>>>>>>>>>>>>>>> AE?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> More comments below.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid
>>>>>>>>>>>>>>>>>>>> duplicate work and to coordinate the efforts ...
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if
>>>>>>>>>>>>>>>>>> you want, or wait until I set up the project with Ruta
>>>>>>>>>>>>>>>>>> integration.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized for
>>>>>>>>>>>>>>>>>>>> the initial development, and if yes, is it possible to
>>>>>>>>>>>>>>>>>>>> contribute it too?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
>>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically
>>>>>>>>>>>>>>>>>>> releases the data sets 12 months after a given challenge;
>>>>>>>>>>>>>>>>>>> this is done on an individual basis and involves a Data Use
>>>>>>>>>>>>>>>>>>> Agreement.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the
>>>>>>>>>>>>>>>>>>> validation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the
>>>>>>>>>>>>>>>>>> dataset here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> My first step would be:
>>>>>>>>>>>>>>>>>>>> - set up a maven project
>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES
>>>>>>>>>>>>>>>>>>>>   components replacing the previous ANNIE preprocessing)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd party
>>>>>>>>>>>>>>>>>>>> libs jars that were included to ensure compatibility.
>>>>>>>>>>>>>>>>>>>> I’ll be sure to take a look at that over the next few
>>>>>>>>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --Pei
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> @Pei - once the ANNIE components are replaced, there should
>>>>>>>>>>>>>>>>>>> not be a need to worry about the 3rd party libs.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent
>>>>>>>>>>>>>>>>>>> component for the Two Pass recognition (TwoPass.java), as
>>>>>>>>>>>>>>>>>>> this method has proven useful for general NER on
>>>>>>>>>>>>>>>>>>> longitudinal data and is surely useful independent of the
>>>>>>>>>>>>>>>>>>> deid component.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>> Azad