Re: Combining Knowledge- and Data-driven Methods for De-identification of Clinical Narratives

Azad Dehghan Sun, 17 Jul 2016 02:42:07 -0700

Hi Peter,

I hope to pick this up soon after the summer.



Cheers,
Azad



2016-06-07 8:57 GMT+01:00 Peter Klügl <peter.klu...@averbis.com>:

> Hi Azad,
>
>
> the basic rules are now translated. Do you wanna take a look at it?
>
>
> Many issues still remain, and the F-score is quite low on the dev
> set. I will continue improving the rules when I find the time.
>
>
> Best,
>
> Peter
>
>
> On 15.03.2016 at 09:49, Peter Klügl wrote:
> > Hi,
> >
> > this is essentially just a design decision. For a single longitudinal
> > record, there is no problem at all: we can solve it with a few simple
> > Ruta rules or with a custom analysis engine. If, however, we want to
> > process a set of records of the same patient jointly, then we cannot
> > apply a single pipeline. I propose that we postpone the decision and
> > implement it only for single documents for now.
> >
> > Best,
> >
> > Peter
> >
> >
> > On 11.03.2016 at 20:03, Azad Dehghan wrote:
> >>> I had a quick look at PassTwo. This is not directly translatable into
> >>> UIMA if the functionality is based on analysis engines. Normally,
> >>> analysis engines process one document at a time in a pipeline. My first
> >>> quick guess is that we either need two pipelines (the result is then a
> >>> program, not a component) or a different definition of a CAS (joining
> >>> all documents of a patient). Overall, it depends on the targeted use case
> >>> of the project. Should it be usable in a cTAKES/uimaFIT pipeline?
> >>>
> >> The two-pass method will have broader applicability for NER on
> >> longitudinal records...
> >>
> >>
> >>> btw, the CRF models are not part of the contribution, right?
> >>>
> >>>
> >> The CRF (UK, US) models will be released, but together with a more
> >> mature version of the software planned for August 2016.
> >>
> >>> Best,
> >>> Peter
> >>>
> >>> On 10.03.2016 at 20:29, Azad Dehghan wrote:
> >>>> Thanks Peter,
> >>>>
> >>>> The rules were modeled using the training data.
> >>>>
> >>>> It would be good to incorporate/refactor the two-pass recognition
> >>>> method for cTAKES (basically, the GATE API needs to be replaced with the
> >>>> UIMA API to generate annotations), since it has wider application on
> >>>> longitudinal data. This method is used on top of a number of NERs.
> >>>>
> >>>> Please let me know where I can help. I will be available again in April.
> >>>>
> >>>> Cheers,
> >>>> Azad
> >>>>
> >>>> On 10 March 2016 at 13:13, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> sorry, I was quite busy last month.
> >>>>>
> >>>>> I added a new patch, which needs to be applied.
> >>>>>
> >>>>> No new rules, but it is now possible to evaluate everything against
> >>>>> the labelled data of the challenge.
> >>>>>
> >>>>> @Azad:
> >>>>> Which documents exactly did you use to develop the rules?
> >>>>> training-PHI-Gold-Set1, training-PHI-Gold-Set2, or testing-PHI-Gold-fixed?
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Peter
> >>>>>
> >>>>> On 03.02.2016 at 09:05, Peter Klügl wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> the last patch fixed almost all problems.
> >>>>>>
> >>>>>> I added another one that adds the CSV file for the unit test and
> >>>>>> extends the svn-ignore settings.
> >>>>>>
> >>>>>> Best,
> >>>>>>
> >>>>>> Peter
> >>>>>>
> >>>>>> On 02.02.2016 at 09:16, Peter Klügl wrote:
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> I added another patch. I forgot to manually add one test file to
> >>>>>>> version control, and there were still duplicate lines.
> >>>>>>> I hope this patch fixes the remaining problems.
> >>>>>>>
> >>>>>>> Best,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>>
> >>>>>>> On 29.01.2016 at 10:34, Peter Klügl wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> the problems were caused by the svn client in my Eclipse. Sorry for
> >>>>>>>> the trouble; I should have looked more closely at the complete patch.
> >>>>>>>>
> >>>>>>>> I attached a new patch created with the command-line tools, which
> >>>>>>>> looks correct now.
> >>>>>>>>
> >>>>>>>> Pei, can you apply the new patch?
> >>>>>>>>
> >>>>>>>> Best,
> >>>>>>>>
> >>>>>>>> Peter
> >>>>>>>>
> >>>>>>>> On 28.01.2016 at 15:57, Peter Klügl wrote:
> >>>>>>>>> Thanks Pei.
> >>>>>>>>>
> >>>>>>>>> I'm afraid there was again a problem with the patch. All new files
> >>>>>>>>> are missing (and so are the svn-ignore settings).
> >>>>>>>>>
> >>>>>>>>> Can you take a look?
> >>>>>>>>>
> >>>>>>>>> Best,
> >>>>>>>>>
> >>>>>>>>> Peter
> >>>>>>>>>
> >>>>>>>>> On 28.01.2016 at 14:43, Pei Chen wrote:
> >>>>>>>>>> patch applied.
> >>>>>>>>>> Thanks,
> >>>>>>>>>> Pei
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jan 28, 2016 at 4:14 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>> Hi Pei,
> >>>>>>>>>>>
> >>>>>>>>>>> can you commit the recent patch for us?
> >>>>>>>>>>>
> >>>>>>>>>>> CTAKES-384-20160120.patch
> >>>>>>>>>>>
> >>>>>>>>>>> Best,
> >>>>>>>>>>>
> >>>>>>>>>>> Peter
> >>>>>>>>>>>
> >>>>>>>>>>> On 20.01.2016 at 19:35, Pei Chen wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>> Sorry, I was swamped recently.
> >>>>>>>>>>>> But yeah, we can even create an extended type system to store these
> >>>>>>>>>>>> items temporarily and add them to the main/core type system afterwards.
> >>>>>>>>>>>> There was an existing item to upgrade UIMA, but agreed, it will require
> >>>>>>>>>>>> much more testing. If it works, we can upgrade it in our sandbox area
> >>>>>>>>>>>> or create a branch if necessary.
> >>>>>>>>>>>> —Pei
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Jan 18, 2016, at 9:06 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a new patch is attached.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Pei:
> >>>>>>>>>>>>> are there suitable annotation types in the cTAKES type system? Some
> >>>>>>>>>>>>> project in cTAKES uses something like OntologyMatch... I map it to
> >>>>>>>>>>>>> IdentifiedAnnotation right now, but there are many empty features...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> @Azad:
> >>>>>>>>>>>>> I changed the rules a bit, especially the capitalization, to the way I
> >>>>>>>>>>>>> normally use it in Ruta. The wordlists are compiled to a trie by the
> >>>>>>>>>>>>> Maven plugin. I also added the two regexes for URL and email, and
> >>>>>>>>>>>>> extended the regex for URLs. I also changed the evaluation order of
> >>>>>>>>>>>>> some rules (with @). Feel free to add simple examples to examples.csv
> >>>>>>>>>>>>> for the unit tests.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Let me know if you need more information about the changes.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Do you want help with the other rule sets, or should we split them up?
> >>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 18.01.2016 at 11:04, Peter Klügl wrote:
> >>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> great. I will integrate them into the project and the next patch.
> >>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> On 18.01.2016 at 00:58, Azad Dehghan wrote:
> >>>>>>>>>>>>>>> Three NERs translated and uploaded.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> PS: I will validate all NERs once we have them all completed.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On 24 November 2015 at 10:37, Azad Dehghan <azad.dehg...@gmail.com> wrote:
> >>>>>>>>>>>>>>>> This is on my to-do list for Dec. as well. If there are any more
> >>>>>>>>>>>>>>>> volunteers for translating JAPE to RUTA, please get in touch.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> On 24 Nov 2015 09:55, "Peter Klügl" <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> I just wanted to mention that I haven't forgotten about it.
> >>>>>>>>>>>>>>>>> Unfortunately, there is just no spare time right now. I hope I will
> >>>>>>>>>>>>>>>>> be able to provide the patches in December.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Best,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Peter
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On 06.11.2015 at 16:40, Pei Chen wrote:
> >>>>>>>>>>>>>>>>>> Hi Peter,
> >>>>>>>>>>>>>>>>>> I think ctakes-examples is probably a good starting point, at least
> >>>>>>>>>>>>>>>>>> in terms of Maven modules, etc. I think it would be good if we used
> >>>>>>>>>>>>>>>>>> the uimaFIT style as the primary approach to wiring components
> >>>>>>>>>>>>>>>>>> together and generated descriptors as a secondary one...
> >>>>>>>>>>>>>>>>>> I think the actual components that would be required are probably
> >>>>>>>>>>>>>>>>>> best left up to what is actually required for the best-performing
> >>>>>>>>>>>>>>>>>> c-deid. The output would be interesting; I'm not sure whether we
> >>>>>>>>>>>>>>>>>> should treat this as an independent preprocessing component or as
> >>>>>>>>>>>>>>>>>> part of a pipeline (in which case we may need to propose a change to
> >>>>>>>>>>>>>>>>>> the type system or perhaps an alternative JCas view. You can
> >>>>>>>>>>>>>>>>>> probably open up that discussion to the dev group as you see fit.)
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> My 2 cents...
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> On Fri, Nov 6, 2015 at 3:38 AM, Peter Klügl <peter.klu...@averbis.com> wrote:
> >>>>>>>>>>>>>>>>>>> Hi,
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Is there a cTAKES project that may serve as an example of how the
> >>>>>>>>>>>>>>>>>>> cTAKES community develops, or of what a project should look like?
> >>>>>>>>>>>>>>>>>>> I have learned that different people set up UIMA projects in quite
> >>>>>>>>>>>>>>>>>>> different ways, and I do not want to get inspired by "some sort of
> >>>>>>>>>>>>>>>>>>> outdated" approach in the cTAKES repo.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Are there restrictions or preferences regarding the preprocessing
> >>>>>>>>>>>>>>>>>>> components that should be used and the kind of "output" of the
> >>>>>>>>>>>>>>>>>>> project?
> >>>>>>>>>>>>>>>>>>> Components: Which components may the components rely on: tokenizer,
> >>>>>>>>>>>>>>>>>>> ..., parser, ..., dictionary lookup?
> >>>>>>>>>>>>>>>>>>> "Output": Should the project provide a pipeline or a single AE?
> >>>>>>>>>>>>>>>>>>> More comments below.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> On 03.11.2015 at 16:54, Azad Dehghan wrote:
> >>>>>>>>>>>>>>>>>>>>> Who else plans to provide patches for it? Just to avoid duplicate
> >>>>>>>>>>>>>>>>>>>>> work and to coordinate the efforts...
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> I would like to help with translating JAPE to RUTA.
> >>>>>>>>>>>>>>>>>>> You can already go ahead with the UIMA Ruta Workbench if you want,
> >>>>>>>>>>>>>>>>>>> or wait until I set up the project with Ruta integration.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> If any questions arise, just ask :-)
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Is there a development dataset that was used for the initial
> >>>>>>>>>>>>>>>>>>>>> development, and if so, is it possible to contribute it too?
> >>>>>>>>>>>>>>>>>>>> The data set is unfortunately not publicly available; i2b2
> >>>>>>>>>>>>>>>>>>>> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases
> >>>>>>>>>>>>>>>>>>>> the data sets 12 months after a given challenge; this is done on an
> >>>>>>>>>>>>>>>>>>>> individual basis and involves a Data Use Agreement.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> However, I will be able to conduct and coordinate the validation.
> >>>>>>>>>>>>>>>>>>> Ok, I'll investigate whether we already have access to the dataset
> >>>>>>>>>>>>>>>>>>> here.
> >>>>>>>>>>>>>>>>>>>>> My first step would be:
> >>>>>>>>>>>>>>>>>>>>> - set up a maven project
> >>>>>>>>>>>>>>>>>>>>> - set up a development pipeline in a test (with cTAKES components
> >>>>>>>>>>>>>>>>>>>>>   replacing the previous ANNIE preprocessing)
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> But one item that we need to review is the 3rd-party library jars
> >>>>>>>>>>>>>>>>>>>>> that were included, to ensure compatibility. I'll be sure to take
> >>>>>>>>>>>>>>>>>>>>> a look at that over the next few weeks.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> —Pei
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> @Pei: once the ANNIE components are replaced, there should be no
> >>>>>>>>>>>>>>>>>>>> need to worry about the 3rd-party libs.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Also, just a thought: we may want to create an independent component
> >>>>>>>>>>>>>>>>>>>> for the Two Pass recognition (TwoPass.java), as this method has
> >>>>>>>>>>>>>>>>>>>> proven useful for general NER on longitudinal data and would surely
> >>>>>>>>>>>>>>>>>>>> be useful independently of the deid component.
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> Cheers,
> >>>>>>>>>>>>>>>>>>>> Azad
> >>>>>>>>>>>>>>>>>>>>
>
>
