Hi Josh,

Thanks for pointing this out. It hadn't occurred to me that someone might post something like this to indicate they would like to receive fewer or no messages.

Paul

--- On Mon, 5/21/12, Joshua Wiley <jwiley.ps...@gmail.com> wrote:

> From: Joshua Wiley <jwiley.ps...@gmail.com>
> Subject: Re: [R] Complex text parsing task
> To: "Paul Miller" <pjmiller...@yahoo.com>
> Cc: "Nick Gayeski" <n...@wildfishconservancy.org>, r-help@r-project.org
> Received: Monday, May 21, 2012, 11:01 AM
>
> Hi Paul,
>
> I do not think that Nick's comment was really meant to be directed at
> you. He is probably just tired of getting so many emails from R-help.
>
> Nick, to stop getting emails if you no longer want them, try following
> the link at the bottom of every single email you have received from
> R-help...you can unsubscribe yourself from there if you want. If you
> like R-help but just do not like the quantity of emails, you could
> consider switching your subscription to a daily digest so you just get
> one email. Alternatively, you could create a special folder in your
> email for R-help messages, and create a filter that automatically
> sends all messages from R-help to that special folder so you still have
> them all but they do not clutter up your inbox.
>
> Cheers,
>
> Josh
>
> On Mon, May 21, 2012 at 8:53 AM, Paul Miller <pjmiller...@yahoo.com> wrote:
> > Hi Nick,
> >
> > Can you elaborate (hopefully in a constructive way) on what it is that
> > you find objectionable about my post?
> >
> > Thanks,
> >
> > Paul
> >
> > --- On Mon, 5/21/12, Nick Gayeski <n...@wildfishconservancy.org> wrote:
> >
> >> From: Nick Gayeski <n...@wildfishconservancy.org>
> >> Subject: RE: [R] Complex text parsing task
> >> To: "'Paul Miller'" <pjmiller...@yahoo.com>, r-help@r-project.org
> >> Received: Monday, May 21, 2012, 10:36 AM
> >>
> >> Please stop sending these emails!
> >>
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
> >> On Behalf Of Paul Miller
> >> Sent: Monday, May 21, 2012 8:32 AM
> >> To: r-help@r-project.org
> >> Subject: [R] Complex text parsing task
> >>
> >> Hello Everyone,
> >>
> >> I have what I think is a complex text parsing task. I've provided some
> >> sample data below. There's a relatively simple version of the coding that
> >> needs to be done and a more complex version. If someone could help me out
> >> with either version, I'd greatly appreciate it.
> >>
> >> Here are my sample data.
> >>
> >> haveData <-
> >> structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L,
> >> 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ",
> >> "001-002 ", "001-003 ", "001-004 ", "001-005 ", "001-006 ", "001-007 "
> >> ), class = "factor"), encounter_date = structure(c(9L, 10L, 11L, 12L, 13L,
> >> 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ",
> >> " 2009-03-22 ", " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ",
> >> " 2010-11-15 ", " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ",
> >> " 2011-10-24 ", " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "
> >> ), class = "factor"), raw = structure(c(9L, 12L, 16L, 13L, 10L, 7L, 6L, 3L,
> >> 2L, 4L, 14L, 15L, 1L, 5L, 8L, 11L), .Label = c(
> >> " ... If patient KRAS result is wild type, they will start Erbitux. ... (Several lines of material) ... Ordered KRAS mutation test 11/11/2011. Results are still not available. ... ",
> >> " ... KRAS (mutated). Therefore did not prescribe Erbitux. ... ",
> >> " ... KRAS (mutated). Will not prescribe Erbitux due to mutation. ... ",
> >> " ... KRAS (Wild). ...",
> >> " ... KRAS results are in. Patient has the mutation. ... ",
> >> " ... KRAS results still pending. Note that patient was negative for Lynch mutation. ...",
> >> " ... KRAS test results pending. Note that patient was negative for Lynch mutation. ...",
> >> " ... Ordered KRAS mutation testing on 02/15/2011. Results came back negative. ... (Several lines of material) ... Patient KRAS mutation test is negative. Will start Erbitux. ...",
> >> " ... Ordered KRAS testing on 10/10/2010. Results not yet available. If patient has a mutaton, will start Erbitux. ...",
> >> " ... Ordered KRAS testing. Waiting for results. ...",
> >> " ... Patient is KRAS negative. Started Erbitux on 03/01/2011. ...",
> >> " ... Received KRAS results on 10/20/2010. Test results indicate tumor is wild type. Ua Protein positve. ER/PR positive. HER2/neu positve. ...",
> >> " ... Still need to order KRAS mutation testing. ... ",
> >> " ... Tumor is negative for KRAS mutation. ...",
> >> " ... Tumor is wild type. Patient is eligible to receive Eribtux. ...",
> >> " ... Will conduct KRAS mutation testing prior to initiation of therapy with Erbitux. ..."
> >> ), class = "factor")), .Names = c("profile_key", "encounter_date", "raw"),
> >> row.names = c(NA, -16L), class = "data.frame")
> >>
> >> The following code displays the results of so-called "simple" coding.
> >>
> >> #### Simple coding ####
> >>
> >> KRASpatient <- c("001-001", "001-002", "001-003", "001-004", "001-005",
> >>                  "001-006", "001-007")
> >> KRAStested <- c(2,3,2,2,2,3,3)
> >> KRASwild <- c(1,0,2,0,3,1,3)
> >> KRASmutant <- c(4,2,2,3,1,2,2)
> >> simpleData <- data.frame(KRASpatient, KRAStested, KRASwild, KRASmutant)
> >> simpleData
> >>
> >> Here, KRAStested is calculated by summing all references to "KRAS" for each
> >> patient. Wild is calculated by summing all references to "wild type",
> >> "wild", and "negative" that come within 20 words of the closest reference to
> >> KRAS. Mutant is calculated by summing all references to "mutant", "mutated",
> >> and "positive" that occur within 20 words of the closest reference to KRAS.
> >>
> >> The second kind of coding is what I'm referring to as "complex coding". The
> >> following code displays the results of this type of coding.
> >>
> >> #### Complex coding ####
> >>
> >> KRAStested <- c(2,1,0,2,2,2,3)
> >> KRASwild <- c(1,0,0,0,3,0,3)
> >> KRASmutant <- c(0,0,0,3,0,1,0)
> >> complexData <- data.frame(KRASpatient, KRAStested, KRASwild, KRASmutant)
> >> complexData
> >>
> >> The results of "complex coding" differ substantially from those obtained
> >> under "simple coding" and I think illustrate the potential problems with
> >> that approach. With "complex coding", the goal would be to identify and sum
> >> only true references to KRAS testing and true references to the result of
> >> that testing (either wild type/negative or mutant/positive).
> >>
> >> True references to KRAS testing would be identified using a set of
> >> qualifiers that eliminate the false references. So, for example, one of the
> >> patients in my (made up) sample data has the phrase "Will conduct KRAS
> >> mutation testing prior to initiation of therapy with Erbitux" in their
> >> medical record. In this case, "Will" is a qualifier that indicates this is
> >> not a true reference to KRAS testing. For this exercise, other qualifiers
> >> related to KRAS testing would include "need", "order" (but not the past
> >> tense "ordered"), "wait", "waiting", "await", and "awaiting".
> >> To be a qualifier, these terms would need to occur within 12 words of the
> >> closest true reference to KRAS.
> >>
> >> True references to the results of testing would also be identified using a
> >> set of qualifiers that eliminate false references. Here the list of
> >> qualifiers would include "if", "lynch", "kras mutation test", "kras mutation
> >> testing" and "for kras mutation". Qualifiers would need to come within 12
> >> words of a true reference to KRAS testing.
> >>
> >> There's an additional wrinkle for identifying true references to the results
> >> of testing. One also needs to take into account the presence of what I'm
> >> calling "nullifiers". For purposes of this exercise, nullifiers include "Ua
> >> Protein", "ER/PR", and "HER2/neu". If "positive" or "negative" come closer to
> >> one of these words than to a true reference to KRAS, then they should not be
> >> used to identify the results of KRAS testing.
> >>
> >> Help with either type of coding would be greatly appreciated.
> >>
> >> Thanks,
> >>
> >> Paul
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> Programmer Analyst II, Statistical Consulting Group
> University of California, Los Angeles
> https://joshuawiley.com/
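
For reference, the "simple coding" rule described in the original post (count every "KRAS" token, then count result words that fall within 20 words of the closest "KRAS") could be sketched in base R roughly as follows. This is only an illustration under some assumptions: the helper names `tokenize` and `count_near` are made up here, the three notes are copied from the sample strings in the post rather than the full `haveData`, and the tokenization rule (split on anything that is not a letter, digit, or "/") is a guess at what counts as a "word". It is not claimed to reproduce the counts in `simpleData`.

```r
# Hypothetical helper: lower-case a note and split it into word tokens.
# Assumption: a "word" is a run of letters, digits, or "/" (so "ER/PR" stays whole).
tokenize <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z0-9/]+")))
  words[nzchar(words)]
}

# Hypothetical helper: count keyword tokens whose distance to the closest
# anchor token ("kras" by default) is at most `window` words.
count_near <- function(text, keywords, anchor = "kras", window = 20) {
  words <- tokenize(text)
  anchor_pos <- which(words == anchor)
  hit_pos <- which(words %in% keywords)
  if (length(anchor_pos) == 0 || length(hit_pos) == 0) return(0L)
  nearest <- vapply(hit_pos, function(p) min(abs(p - anchor_pos)), integer(1))
  sum(nearest <= window)
}

# Three notes taken from the sample strings in the post (not the full data set).
notes <- data.frame(
  profile_key = c("001-001", "001-001", "001-004"),
  raw = c(
    "Ordered KRAS testing on 10/10/2010. Results not yet available. If patient has a mutaton, will start Erbitux.",
    "Received KRAS results on 10/20/2010. Test results indicate tumor is wild type. Ua Protein positve. ER/PR positive. HER2/neu positve.",
    "KRAS (mutated). Will not prescribe Erbitux due to mutation."
  ),
  stringsAsFactors = FALSE
)

# Per-note counts. "wild type" is covered by the single token "wild".
notes$tested <- vapply(notes$raw, function(x) sum(tokenize(x) == "kras"),
                       integer(1), USE.NAMES = FALSE)
notes$wild <- vapply(notes$raw, count_near, integer(1),
                     keywords = c("wild", "negative"), USE.NAMES = FALSE)
notes$mutant <- vapply(notes$raw, count_near, integer(1),
                       keywords = c("mutant", "mutated", "positive"),
                       USE.NAMES = FALSE)

# Sum the per-note counts within each patient.
result <- aggregate(cbind(tested, wild, mutant) ~ profile_key,
                    data = notes, FUN = sum)
result
```

In this small example the "positive" that belongs to "ER/PR" is counted as a mutant reference for patient 001-001, which is the kind of false hit the "complex coding" qualifiers and nullifiers are meant to remove.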