Hi Peter,

I will pick this up soon after the summer, I hope.

Cheers,
Azad

2016-06-07 8:57 GMT+01:00 Peter Klügl <peter.klu...@averbis.com>:

> Hi Azad,
>
> the basic rules are now translated. Do you wanna take a look at it?
>
> There are still many issues remaining and the F-score is quite low on
> the dev set. I will continue improving the rules when I find the time.
>
> Best,
>
> Peter
>
> On 15.03.2016 at 09:49, Peter Klügl wrote:
> > Hi,
> >
> > this is essentially just a design decision. For a single longitudinal
> > record, there is no problem at all. We can solve this even with some
> > simple Ruta rules, or with some custom analysis engine. If we want to
> > process a set of records of the same patient jointly, then we cannot
> > apply a single pipeline. I propose to postpone the decision and
> > implement it only for single documents for now.
> >
> > Best,
> >
> > Peter
> >
> > On 11.03.2016 at 20:03, Azad Dehghan wrote:
> >>> I had a quick look at PassTwo. This is not directly translatable into
> >>> UIMA if the functionality is based on analysis engines. Normally,
> >>> analysis engines process one document at a time in a pipeline. My
> >>> first quick guess is that we either need two pipelines (result is a
> >>> program, not a component) or we need a different definition of a CAS
> >>> (joining all documents of a patient). Overall, it depends on the
> >>> targeted use case of the project. Should it be usable in a
> >>> cTAKES/uimaFIT pipeline?
> >>>
> >> The two-pass method will have a broader applicability for NER on
> >> longitudinal records...
> >>
> >>> btw, the CRF models are not part of the contribution, right?
> >>>
> >> The CRF (UK, US) models will be released, but this will be together
> >> with a more mature software planned for August 2016.
> >>
> >>> Best,
> >>> Peter
> >>>
> >>> On 10.03.2016 at 20:29, Azad Dehghan wrote:
> >>>> Thanks Peter,
> >>>>
> >>>> The rules were modeled using the training data.
> >>>>
> >>>> It would be good to incorporate/refactor (basically, the GATE API
> >>>> needs to be replaced with the UIMA API to generate annotations) the
> >>>> two-pass recognition method for cTAKES - which has a wider
> >>>> application on longitudinal data. This method is used on top of a
> >>>> number of NERs.
> >>>>
> >>>> Please let me know where I can help. I will be available again in
> >>>> April.
> >>>>
> >>>> Cheers,
> >>>> Azad
> >>>>
> >>>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com>
> >>>> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> sorry, I was quite busy last month.
> >>>>>
> >>>>> I added a new patch, which needs to be applied.
> >>>>>
> >>>>> No new rules, but it's possible now to evaluate everything against
> >>>>> the labelled data of the challenge.
> >>>>>
> >>>>> @Azad:
> >>>>> Which documents exactly did you use to develop the rules?
> >>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2 or
> >>>>> testing-PHI-Gold-fixed?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> the last patch fixed almost all problems.
> >>>>>>
> >>>>>> I added another one that adds the csv file for the unit test and
> >>>>>> extends svn-ignore.
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I added another patch. I forgot to manually add one test file to
> >>>>>>> version control, and there are still duplicate lines.
> >>>>>>> I hope this patch fixes the remaining problems.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry
> >>>>>>>> for the trouble, I should have looked more closely at the
> >>>>>>>> complete patch.
> >>>>>>>>
> >>>>>>>> I attached a new patch created with command-line tools, which
> >>>>>>>> looks correct now.
> >>>>>>>>
> >>>>>>>> Pei, can you apply the new patch?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Peter
> >>>>>>>>
> >>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> >>>>>>>>> Thanks Pei.
> >>>>>>>>>
> >>>>>>>>> I fear there was again a problem with the patch. All new files
> >>>>>>>>> are missing (and also the svn-ignore settings).
> >>>>>>>>>
> >>>>>>>>> Can you take a look?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> >>>>>>>>>> patch applied.
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Pei
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl
> >>>>>>>>>> <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>> Hi Pei,
> >>>>>>>>>>>
> >>>>>>>>>>> can you commit the recent patch for us?
> >>>>>>>>>>>
> >>>>>>>>>>> CTAKES-384-20160120.patch
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> Sorry I was swamped recently.
> >>>>>>>>>>>> But yeah, we can even create an extended type system to
> >>>>>>>>>>>> store these items temporarily and add them into the
> >>>>>>>>>>>> main/core type system afterwards.
> >>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed - it
> >>>>>>>>>>>> will require much more testing. If it works, we can upgrade
> >>>>>>>>>>>> it in our sandbox area or create a branch if necessary.
> >>>>>>>>>>>> -Pei
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl
> >>>>>>>>>>>> <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a new patch is attached.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Pei:
> >>>>>>>>>>>>> are there suitable annotation types in the cTAKES type
> >>>>>>>>>>>>> system? Some project in cTAKES uses something like
> >>>>>>>>>>>>> OntologyMatch...
> >>>>>>>>>>>>> I map it to IdentifiedAnnotation right now, but there are
> >>>>>>>>>>>>> many empty features...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Azad:
> >>>>>>>>>>>>> I changed the rules a bit, especially the capitalization
> >>>>>>>>>>>>> like I use it in Ruta normally. The wordlists are compiled
> >>>>>>>>>>>>> to a trie by the maven plugin. I also added the two
> >>>>>>>>>>>>> regexes for url and email. I extended the regex for the
> >>>>>>>>>>>>> url. I also changed the evaluation order of some rules
> >>>>>>>>>>>>> (with @). Feel free to add simple examples to examples.csv
> >>>>>>>>>>>>> for the unit tests.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me know if you need more information about the changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Do you wanna have help with the other rule sets? Or should
> >>>>>>>>>>>>> we split them up?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> great. I will integrate them in the project and in the
> >>>>>>>>>>>>>> next patch.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >>>>>>>>>>>>>>> Three NERs translated and uploaded.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS. I will validate all NERs once we have them all
> >>>>>>>>>>>>>>> completed.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan
> >>>>>>>>>>>>>>> <azad.dehg...@gmail.com> wrote:
> >>>>>>>>>>>>>>>> This is on my todo list for Dec. as well. If there are
> >>>>>>>>>>>>>>>> any more volunteers for translating JAPE to RUTA,
> >>>>>>>>>>>>>>>> please get in touch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl"
> >>>>>>>>>>>>>>>> <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten
> >>>>>>>>>>>>>>>>> about it. Unfortunately, there is just no spare time
> >>>>>>>>>>>>>>>>> right now. I hope I will be able to provide the
> >>>>>>>>>>>>>>>>> patches in December.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>>>>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting
> >>>>>>>>>>>>>>>>>> point, at least in terms of maven modules, etc. I
> >>>>>>>>>>>>>>>>>> think it would be good if we use uimaFIT style as the
> >>>>>>>>>>>>>>>>>> primary approach to wiring components together and
> >>>>>>>>>>>>>>>>>> generate desc's as secondary...
> >>>>>>>>>>>>>>>>>> I think the actual components that would be required
> >>>>>>>>>>>>>>>>>> are probably best left up to what is actually
> >>>>>>>>>>>>>>>>>> required for best performing c-deid. The output would
> >>>>>>>>>>>>>>>>>> be interesting, I'm not sure if we should treat this
> >>>>>>>>>>>>>>>>>> as an independent preprocessing component or part of
> >>>>>>>>>>>>>>>>>> a pipeline (in which case, we may need to propose a
> >>>>>>>>>>>>>>>>>> change to the type system or perhaps an alternative
> >>>>>>>>>>>>>>>>>> JCas view. You can probably open up that discussion
> >>>>>>>>>>>>>>>>>> to the dev group as you see fit.)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> My 2 cents...
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl
> >>>>>>>>>>>>>>>>>> <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an
> >>>>>>>>>>>>>>>>>>> example of how the cTAKES community develops or what
> >>>>>>>>>>>>>>>>>>> a project should look like?
> >>>>>>>>>>>>>>>>>>> I learned that different people set up UIMA projects
> >>>>>>>>>>>>>>>>>>> in quite different manners and I do not want to get
> >>>>>>>>>>>>>>>>>>> inspired by "some sort of out-dated" approach in the
> >>>>>>>>>>>>>>>>>>> cTAKES repo.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Are there restrictions or preferences about the
> >>>>>>>>>>>>>>>>>>> preprocessing components that should be used and the
> >>>>>>>>>>>>>>>>>>> kind of "output" of the project?
> >>>>>>>>>>>>>>>>>>> Components: On which components may the components
> >>>>>>>>>>>>>>>>>>> rely: tokenizer, ... parser, ... dict lookup?
> >>>>>>>>>>>>>>>>>>> "output": Should the project provide a pipeline or a
> >>>>>>>>>>>>>>>>>>> single AE?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> More comments below.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to
> >>>>>>>>>>>>>>>>>>>>> avoid duplicate work and to coordinate the
> >>>>>>>>>>>>>>>>>>>>> efforts ...
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta
> >>>>>>>>>>>>>>>>>>> Workbench if you want, or wait until I set up the
> >>>>>>>>>>>>>>>>>>> project with Ruta integration.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Is there a development dataset which was utilized
> >>>>>>>>>>>>>>>>>>>>> for the initial development, and if yes, is it
> >>>>>>>>>>>>>>>>>>>>> possible to contribute it too?
> >>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly
> >>>>>>>>>>>>>>>>>>>> available; i2b2
> >>>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php>
> >>>>>>>>>>>>>>>>>>>> typically releases the data sets 12 months after a
> >>>>>>>>>>>>>>>>>>>> given challenge; this is done on an individual
> >>>>>>>>>>>>>>>>>>>> basis and involves a Data Use Agreement.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate
> >>>>>>>>>>>>>>>>>>>> the validation.
> >>>>>>>>>>>>>>>>>>> Ok, I'll investigate if we already have access to
> >>>>>>>>>>>>>>>>>>> the dataset here.
> >>>>>>>>>>>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with
> >>>>>>>>>>>>>>>>>>>>>   cTAKES components replacing the previous ANNIE
> >>>>>>>>>>>>>>>>>>>>>   preprocessing)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd
> >>>>>>>>>>>>>>>>>>>>> party libs jars that were included to ensure
> >>>>>>>>>>>>>>>>>>>>> compatibility. I’ll be sure to take a look at that
> >>>>>>>>>>>>>>>>>>>>> over the next few weeks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> -Pei
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> @Pei - once ANNIE components are replaced there
> >>>>>>>>>>>>>>>>>>>> should not be a need to worry about the 3rd party
> >>>>>>>>>>>>>>>>>>>> libs.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an
> >>>>>>>>>>>>>>>>>>>> independent component for the Two Pass recognition
> >>>>>>>>>>>>>>>>>>>> (TwoPass.java) as this method has proven useful for
> >>>>>>>>>>>>>>>>>>>> general NER on longitudinal data and is surely
> >>>>>>>>>>>>>>>>>>>> useful independent of the deid component.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Azad