Hi Paul, I do not think that Nick's comment was really meant to be directed at you. He is probably just tired of getting so many emails from R-help.
Nick, if you no longer want these emails, follow the link at the bottom of any message you have received from R-help; you can unsubscribe yourself from there. If you like R-help but not the volume of email, you could switch your subscription to a daily digest so that you get just one email a day. Alternatively, you could create a dedicated folder for R-help messages and a filter that automatically moves every message from R-help into that folder, so you still have them all but they do not clutter up your inbox.

Cheers,

Josh

On Mon, May 21, 2012 at 8:53 AM, Paul Miller <pjmiller...@yahoo.com> wrote:
> Hi Nick,
>
> Can you elaborate (hopefully in a constructive way) on what it is that you
> find objectionable about my post?
>
> Thanks,
>
> Paul
>
> --- On Mon, 5/21/12, Nick Gayeski <n...@wildfishconservancy.org> wrote:
>
>> From: Nick Gayeski <n...@wildfishconservancy.org>
>> Subject: RE: [R] Complex text parsing task
>> To: "'Paul Miller'" <pjmiller...@yahoo.com>, r-help@r-project.org
>> Received: Monday, May 21, 2012, 10:36 AM
>> Please stop sending these emails!
>>
>>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Paul Miller
>> Sent: Monday, May 21, 2012 8:32 AM
>> To: r-help@r-project.org
>> Subject: [R] Complex text parsing task
>>
>> Hello Everyone,
>>
>> I have what I think is a complex text parsing task. I've provided some
>> sample data below. There's a relatively simple version of the coding that
>> needs to be done and a more complex version. If someone could help me out
>> with either version, I'd greatly appreciate it.
>>
>> Here are my sample data.
>>
>> haveData <-
>> structure(list(profile_key = structure(c(1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L,
>> 5L, 5L, 5L, 6L, 6L, 7L, 7L), .Label = c("001-001 ", "001-002 ", "001-003 ",
>> "001-004 ", "001-005 ", "001-006 ", "001-007 "), class = "factor"),
>> encounter_date = structure(c(9L, 10L, 11L, 12L, 13L, 5L, 6L, 7L, 8L, 1L, 2L,
>> 3L, 4L, 4L, 7L, 7L), .Label = c(" 2009-03-01 ", " 2009-03-22 ",
>> " 2009-04-01 ", " 2010-03-01 ", " 2010-10-15 ", " 2010-11-15 ",
>> " 2011-03-01 ", " 2011-03-14 ", " 2011-10-10 ", " 2011-10-24 ",
>> " 2012-09-15 ", " 2012-10-05 ", " 2012-10-17 "), class = "factor"),
>> raw = structure(c(9L, 12L, 16L, 13L, 10L, 7L, 6L, 3L, 2L, 4L, 14L, 15L, 1L,
>> 5L, 8L, 11L), .Label = c(
>> " ... If patient KRAS result is wild type, they will start Erbitux. ... (Several lines of material) ... Ordered KRAS mutation test 11/11/2011. Results are still not available. ... ",
>> " ... KRAS (mutated). Therefore did not prescribe Erbitux. ... ",
>> " ... KRAS (mutated). Will not prescribe Erbitux due to mutation. ... ",
>> " ... KRAS (Wild). ...",
>> " ... KRAS results are in. Patient has the mutation. ... ",
>> " ... KRAS results still pending. Note that patient was negative for Lynch mutation. ...",
>> " ... KRAS test results pending. Note that patient was negative for Lynch mutation. ...",
>> " ... Ordered KRAS mutation testing on 02/15/2011. Results came back negative. ... (Several lines of material) ... Patient KRAS mutation test is negative. Will start Erbitux. ...",
>> " ... Ordered KRAS testing on 10/10/2010. Results not yet available. If patient has a mutaton, will start Erbitux. ...",
>> " ... Ordered KRAS testing. Waiting for results. ...",
>> " ... Patient is KRAS negative. Started Erbitux on 03/01/2011. ...",
>> " ... Received KRAS results on 10/20/2010. Test results indicate tumor is wild type. Ua Protein positve. ER/PR positive. HER2/neu positve. ...",
>> " ... Still need to order KRAS mutation testing. ... ",
>> " ... Tumor is negative for KRAS mutation. ...",
>> " ... Tumor is wild type. Patient is eligible to receive Eribtux. ...",
>> " ... Will conduct KRAS mutation testing prior to initiation of therapy with Erbitux. ..."
>> ), class = "factor")), .Names = c("profile_key", "encounter_date", "raw"),
>> row.names = c(NA, -16L), class = "data.frame")
>>
>> The following code displays the results of so-called "simple" coding.
>>
>> #### Simple coding ####
>>
>> KRASpatient <- c("001-001", "001-002", "001-003", "001-004", "001-005",
>>                  "001-006", "001-007")
>> KRAStested <- c(2,3,2,2,2,3,3)
>> KRASwild <- c(1,0,2,0,3,1,3)
>> KRASmutant <- c(4,2,2,3,1,2,2)
>> simpleData <- data.frame(KRASpatient, KRAStested, KRASwild, KRASmutant)
>> simpleData
>>
>> Here, KRAStested is calculated by summing all references to "KRAS" for each
>> patient. Wild is calculated by summing all references to "wild type",
>> "wild", and "negative" that come within 20 words of the closest reference to
>> KRAS. Mutant is calculated by summing all references to "mutant", "mutated",
>> and "positive" that occur within 20 words of the closest reference to KRAS.
>>
>>
>> The second kind of coding is what I'm referring to as "complex coding". The
>> following code displays the results of this type of coding.
>>
>> #### Complex coding ####
>>
>> KRAStested <- c(2,1,0,2,2,2,3)
>> KRASwild <- c(1,0,0,0,3,0,3)
>> KRASmutant <- c(0,0,0,3,0,1,0)
>> complexData <- data.frame(KRASpatient, KRAStested, KRASwild, KRASmutant)
>> complexData
>>
>> The results of "complex coding" differ substantially from those obtained
>> under "simple coding" and I think illustrate the potential problems with
>> that approach. With "complex coding", the goal would be to identify and sum
>> only true references to KRAS testing and true references to the result of
>> that testing (either wild type/negative or mutant/positive).
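For what it is worth, the "simple" coding rule quoted above can be roughed out in base R. This is only a sketch under stated assumptions, not a tested solution: the helper names (tokenize, count_near_kras, simple_code) are made up for illustration, words are split on anything that is not a letter or digit, and "wild type" is counted through the single token "wild" so it is not counted twice. The three notes are excerpted from the sample data, so the counts cover those rows only and are not expected to reproduce the hand-coded table.

```r
# Hypothetical helper: split a note into lowercase word tokens.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words[nzchar(words)]
}

# Count tokens from `terms` lying within `window` words of the nearest "kras".
count_near_kras <- function(words, terms, window = 20) {
  kras_pos <- which(words == "kras")
  term_pos <- which(words %in% terms)
  if (length(kras_pos) == 0 || length(term_pos) == 0) return(0)
  # Distance from each matching term to its closest KRAS mention.
  nearest <- vapply(term_pos,
                    function(p) as.numeric(min(abs(kras_pos - p))),
                    numeric(1))
  sum(nearest <= window)
}

# Assumption: "wild type" is covered by the single token "wild".
wild_terms   <- c("wild", "negative")
mutant_terms <- c("mutant", "mutated", "positive")

# Per-note counts following the "simple" rule as quoted.
simple_code <- function(text) {
  words <- tokenize(text)
  c(tested = sum(words == "kras"),
    wild   = count_near_kras(words, wild_terms),
    mutant = count_near_kras(words, mutant_terms))
}

# Three notes excerpted from the sample data (not the full data set).
notes <- data.frame(
  profile_key = c("001-004", "001-004", "001-007"),
  raw = c(" ... KRAS (mutated). Will not prescribe Erbitux due to mutation. ... ",
          " ... KRAS (mutated). Therefore did not prescribe Erbitux. ... ",
          " ... Patient is KRAS negative. Started Erbitux on 03/01/2011. ..."),
  stringsAsFactors = FALSE
)

# One row of counts per note, then summed within patient.
per_note <- do.call(rbind, lapply(notes$raw, simple_code))
result <- aggregate(as.data.frame(per_note),
                    by = list(KRASpatient = notes$profile_key),
                    FUN = sum)
result
```

To run this on haveData, the raw column would first need as.character(), since it is stored as a factor. Exact token matching means "mutation" is not counted as a mutant term, which is one reason the counts here can differ from the hand-coded ones.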
>>
>> True references to KRAS testing would be identified using a set of
>> qualifiers that eliminate the false references. So, for example, one of the
>> patients in my (made up) sample data has the phrase "Will conduct KRAS
>> mutation testing prior to initiation of therapy with Erbitux" in their
>> medical record. In this case, "Will" is a qualifier that indicates this is
>> not a true reference to KRAS testing. For this exercise, other qualifiers
>> related to KRAS testing would include "need", "order" (but not the past
>> tense "ordered"), "wait", "waiting", "await", and "awaiting".
>> To be a qualifier, these terms would need to occur within 12 words of the
>> closest true reference to KRAS.
>>
>> True references to the results of testing would also be identified using a
>> set of qualifiers that eliminate false references. Here the list of
>> qualifiers would include "if", "lynch", "kras mutation test", "kras mutation
>> testing" and "for kras mutation". Qualifiers would need to come within 12
>> words of a true reference to KRAS testing.
>>
>> There's an additional wrinkle for identifying true references to the results
>> of testing. One also needs to take into account the presence of what I'm
>> calling "nullifiers". For purposes of this exercise, nullifiers include "Ua
>> Protein", "ER/PR", and "HER2/neu". If "positive" or "negative" come closer to
>> one of these words than to a true reference to KRAS, then they should not be
>> used to identify the results of KRAS testing.
>>
>> Help with either type of coding would be greatly appreciated.
>>
>> Thanks,
>>
>> Paul
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
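The first part of the "complex" coding quoted above (dropping KRAS mentions that have a qualifier nearby) might be sketched along the following lines. Again this is a hedged illustration rather than a finished answer: tokenize and count_true_kras are hypothetical names, tokens are split on non-alphanumeric characters, and only the testing qualifiers from the post are handled. The result qualifiers and the nullifiers are left out.

```r
# Hypothetical helper: split a note into lowercase word tokens.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z0-9]+")[[1]]
  words[nzchar(words)]
}

# Testing qualifiers listed in the post. Exact token matching keeps
# "order" distinct from the past tense "ordered".
test_qualifiers <- c("will", "need", "order", "wait", "waiting",
                     "await", "awaiting")

# Count KRAS mentions that have no qualifier within `window` words.
count_true_kras <- function(text, qualifiers = test_qualifiers, window = 12) {
  words <- tokenize(text)
  kras_pos <- which(words == "kras")
  qual_pos <- which(words %in% qualifiers)
  if (length(kras_pos) == 0) return(0L)
  if (length(qual_pos) == 0) return(length(kras_pos))
  # TRUE for each KRAS mention with at least one qualifier close by.
  qualified <- vapply(kras_pos,
                      function(p) any(abs(qual_pos - p) <= window),
                      logical(1))
  sum(!qualified)
}

# Notes taken from the sample data.
future  <- " ... Will conduct KRAS mutation testing prior to initiation of therapy with Erbitux. ..."
pending <- " ... Still need to order KRAS mutation testing. ... "
done    <- " ... Patient is KRAS negative. Started Erbitux on 03/01/2011. ..."

count_true_kras(future)
count_true_kras(pending)
count_true_kras(done)
```

One caveat with the rule as literally stated: in a note such as "Patient KRAS mutation test is negative. Will start Erbitux.", the word "Will" falls within 12 words of KRAS and would suppress what looks like a true reference, so the qualifier list or the window probably needs refining (for example, by only looking at words before the KRAS mention or within the same sentence).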
--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/