Just add a word break marker (\b) before and after the alternation:

zz$v5 <- grepl( paste0( "\\b(", paste0( alarm.words, collapse="|" ), ")\\b" ), do.call( paste, zz[ , 2:3 ] ) )
---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.

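To see what the word break markers change, here is a minimal sketch. The vector x is a hypothetical test input (it is not from Chris's data) built around the "barred" case Bert raised; the v2, v3 and alarm.words values are copied from the original post, with the random v1 and v4 columns left out so the result is reproducible.

```r
# Hypothetical strings (not from the thread) covering the "barred" case
# and a newline between words.
x <- c("red sparks", "barred owl", "green\nturtle")
alarm.words <- c("red", "green", "turtle", "gandalf")

plain   <- paste0(alarm.words, collapse = "|")  # "red|green|turtle|gandalf"
bounded <- paste0("\\b(", plain, ")\\b")        # same alternation, whole words only

loose  <- grepl(plain, x)    # TRUE TRUE TRUE: "barred" matches as a substring
strict <- grepl(bounded, x)  # TRUE FALSE TRUE: "barred" no longer matches

# The same pattern applied to the v2 and v3 columns from the original post.
v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
        "big black dog", "waffle the hamster", "benny likes food a lot",
        "hello world", "yellow giraffe with a long neck", "black bear")
v3 <- c("harry potter", "hermione grainger", "ronald weasley",
        "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
        "white dress robes", "gandalf the white", "gandalf the grey")
zz <- data.frame(v2 = v2, v3 = v3, stringsAsFactors = FALSE)

# Paste the columns row by row, then test each combined string once.
zz$v5 <- grepl(bounded, do.call(paste, zz[, c("v2", "v3")]))
which(zz$v5)  # rows 3, 6, 9 and 10
```

Because \b matches at any transition between a word character and a non-word character, the space, the newline and the end of the string all count as boundaries, so no separate handling of whitespace should be needed.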
On July 9, 2015 10:12:23 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Jeff:
>
>Well, it would be much better (no loops!) except, I think, for one
>issue: "red" would match "barred" and I don't think that this is what
>is wanted: the matches should be on whole "words" not just string
>patterns.
>
>So you would need to fix up the matching pattern to make this work,
>but it may be a little tricky, as arbitrary whitespace characters,
>e.g. " " or "\n" etc. could be in the strings to be matched separating
>the words or ending the "sentence." I'm sure it can be done, but I'll
>leave it to you or others to figure it out.
>
>Of course, if my diagnosis is wrong or silly, please point this out.
>
>Cheers,
>Bert
>
>
>Bert Gunter
>
>"Data is not information. Information is not knowledge. And knowledge
>is certainly not wisdom."
>   -- Clifford Stoll
>
>
>On Thu, Jul 9, 2015 at 9:34 AM, Jeff Newmiller
><jdnew...@dcn.davis.ca.us> wrote:
>> I think grep is better suited to this:
>>
>> zz$v5 <- grepl( paste0( alarm.words, collapse="|" ), do.call( paste, zz[ , 2:3 ] ) )
>>
>> On July 9, 2015 8:51:10 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>>>Here's a way to do it that uses %in% (i.e. match() ) and uses only a
>>>single, not a double, loop. It should be more efficient.
>>>
>>>sapply(strsplit(do.call(paste, zz[, 2:3]), "[[:space:]]+"),
>>>       function(x) any(x %in% alarm.words))
>>>
>>> [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
>>>
>>>The idea is to paste the strings in each row (do.call allows an
>>>arbitrary number of columns) into a single string and then use
>>>strsplit to break the string into individual "words" on whitespace.
>>>Then the matching is vectorized with the any( %in% ... ) call.
>>>
>>>Cheers,
>>>Bert
>>>
>>>On Thu, Jul 9, 2015 at 6:05 AM, John Fox <j...@mcmaster.ca> wrote:
>>>> Dear Chris,
>>>>
>>>> If I understand correctly what you want, how about the following?
>>>>
>>>> rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
>>>> zz[rows, ]
>>>>
>>>>           v1                              v2                v3 v4
>>>> 3  -1.022329                    green turtle    ronald weasley  2
>>>> 6   0.336599              waffle the hamster        red sparks  1
>>>> 9  -1.631874 yellow giraffe with a long neck gandalf the white  1
>>>> 10  1.130622                      black bear  gandalf the grey  2
>>>>
>>>> I hope this helps,
>>>>  John
>>>>
>>>> ------------------------------------------------
>>>> John Fox, Professor
>>>> McMaster University
>>>> Hamilton, Ontario, Canada
>>>> http://socserv.mcmaster.ca/jfox/
>>>>
>>>>
>>>> On Wed, 08 Jul 2015 22:23:37 -0400
>>>> "Christopher W. Ryan" <cr...@binghamton.edu> wrote:
>>>>> Running R 3.1.1 on windows 7
>>>>>
>>>>> I want to identify as a case any record in a dataframe that contains any
>>>>> of several keywords in any of several variables.
>>>>>
>>>>> Example:
>>>>>
>>>>> # create a dataframe with 4 variables and 10 records
>>>>> v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
>>>>>         "big black dog", "waffle the hamster", "benny likes food a lot",
>>>>>         "hello world", "yellow giraffe with a long neck", "black bear")
>>>>> v3 <- c("harry potter", "hermione grainger", "ronald weasley",
>>>>>         "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
>>>>>         "white dress robes", "gandalf the white", "gandalf the grey")
>>>>> zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
>>>>>                  stringsAsFactors=FALSE)
>>>>> str(zz)
>>>>> zz
>>>>>
>>>>> # here are the keywords
>>>>> alarm.words <- c("red", "green", "turtle", "gandalf")
>>>>>
>>>>> # For each row/record, I want to test whether the string in v2 or the
>>>>> # string in v3 contains any of the strings in alarm.words. And then if so,
>>>>> # set zz$v5=TRUE for that record.
>>>>>
>>>>> # I'm thinking the str_detect function in the stringr package ought to
>>>>> # be able to help, perhaps with some use of apply over the rows, but I
>>>>> # obviously misunderstand something about how str_detect works
>>>>>
>>>>> library(stringr)
>>>>>
>>>>> str_detect(zz[,2:3], alarm.words)    # error: the target of the search
>>>>>                                      # must be a vector, not multiple
>>>>>                                      # columns
>>>>>
>>>>> str_detect(zz[1:4,2:3], alarm.words) # same error
>>>>>
>>>>> str_detect(zz[,2], alarm.words)      # error, length of alarm.words
>>>>>                                      # is less than the number of
>>>>>                                      # rows I am using for the
>>>>>                                      # comparison
>>>>>
>>>>> str_detect(zz[1:4,2], alarm.words)   # works as hoped when
>>>>> length(alarm.words)                  # confining nrows
>>>>>                                      # to the length of alarm.words
>>>>>
>>>>> str_detect(zz, alarm.words)          # obviously not right
>>>>>
>>>>> # maybe I need apply() ?
>>>>> my.f <- function(x){str_detect(x, alarm.words)}
>>>>>
>>>>> apply(zz[,2], 1, my.f)    # again, a mismatch in lengths
>>>>>                           # between alarm.words and that
>>>>>                           # in which I am searching for
>>>>>                           # matching strings
>>>>>
>>>>> apply(zz, 2, my.f)        # now I'm getting somewhere
>>>>> apply(zz[1:4,], 2, my.f)  # but still only works with 4
>>>>>                           # rows of the dataframe
>>>>>
>>>>>
>>>>> # perhaps %in% could do the job?
>>>>>
>>>>> Appreciate any advice.
>>>>>
>>>>> --Chris Ryan
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.