I think grep is better suited to this:

zz$v5 <- grepl( paste0( alarm.words, collapse="|" ), do.call( paste, zz[ , 2:3 ] ) )

---------------------------------------------------------------------------
Jeff Newmiller
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
DCN: <jdnew...@dcn.davis.ca.us>
---------------------------------------------------------------------------
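
One caveat with the alternation pattern: it matches a keyword anywhere in the pasted string, including inside a longer word. The sketch below is not from the thread's data; it uses a small made-up data frame (the row containing "hundred" is hypothetical) to show the difference between the plain pattern and one wrapped in \b word boundaries. It assumes the keywords contain no regex metacharacters.

```r
# Small deterministic stand-in for the thread's data. v1 and v4 are omitted
# because they were random and play no part in the matching. The "hundred"
# row is a made-up example, not part of the original post.
zz <- data.frame(
  v2 = c("white bird", "green turtle", "a hundred geese", "black bear"),
  v3 = c("harry potter", "ronald weasley", "dudley dursley", "gandalf the grey"),
  stringsAsFactors = FALSE
)
alarm.words <- c("red", "green", "turtle", "gandalf")

# One string per row, built from the two text columns.
row.text <- do.call(paste, zz[, c("v2", "v3")])

# Plain alternation: a keyword matches anywhere, even inside a longer word,
# so "red" is found in "hundred".
substring.hit <- grepl(paste0(alarm.words, collapse = "|"), row.text)
# substring.hit is FALSE TRUE TRUE TRUE

# Alternation wrapped in \b...\b: whole-word matches only.
word.pattern <- paste0("\\b(", paste0(alarm.words, collapse = "|"), ")\\b")
word.hit <- grepl(word.pattern, row.text)
# word.hit is FALSE TRUE FALSE TRUE

zz$v5 <- word.hit
```

If whole-word matching is what is wanted, the second form should agree with the strsplit/%in% approach quoted below for whitespace-separated words; if substrings should count, the plain alternation is enough.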
On July 9, 2015 8:51:10 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Here's a way to do it that uses %in% (i.e. match() ) and uses only a
>single, not a double, loop. It should be more efficient.
>
> > sapply(strsplit(do.call(paste,zz[,2:3]),"[[:space:]]+"),
> +        function(x)any(x %in% alarm.words))
>
> [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE  TRUE
>
>The idea is to paste the strings in each row (do.call allows an
>arbitrary number of columns) into a single string and then use
>strsplit to break the string into individual "words" on whitespace.
>Then the matching is vectorized with the any( %in% ... ) call.
>
>Cheers,
>Bert
>Bert Gunter
>
>"Data is not information. Information is not knowledge. And knowledge
>is certainly not wisdom."
>   -- Clifford Stoll
>
>
>On Thu, Jul 9, 2015 at 6:05 AM, John Fox <j...@mcmaster.ca> wrote:
>> Dear Chris,
>>
>> If I understand correctly what you want, how about the following?
>>
>> > rows <- apply(zz[, 2:3], 1, function(x) any(sapply(alarm.words, grepl, x=x)))
>> > zz[rows, ]
>>
>>           v1                              v2                v3 v4
>> 3  -1.022329                    green turtle    ronald weasley  2
>> 6   0.336599              waffle the hamster        red sparks  1
>> 9  -1.631874 yellow giraffe with a long neck gandalf the white  1
>> 10  1.130622                      black bear  gandalf the grey  2
>>
>> I hope this helps,
>> John
>>
>> ------------------------------------------------
>> John Fox, Professor
>> McMaster University
>> Hamilton, Ontario, Canada
>> http://socserv.mcmaster.ca/jfox/
>>
>>
>> On Wed, 08 Jul 2015 22:23:37 -0400
>> "Christopher W. Ryan" <cr...@binghamton.edu> wrote:
>>> Running R 3.1.1 on windows 7
>>>
>>> I want to identify as a case any record in a dataframe that contains any
>>> of several keywords in any of several variables.
>>>
>>> Example:
>>>
>>> # create a dataframe with 4 variables and 10 records
>>> v2 <- c("white bird", "blue bird", "green turtle", "quick brown fox",
>>>         "big black dog", "waffle the hamster", "benny likes food a lot",
>>>         "hello world", "yellow giraffe with a long neck", "black bear")
>>> v3 <- c("harry potter", "hermione grainger", "ronald weasley",
>>>         "ginny weasley", "dudley dursley", "red sparks", "blue sparks",
>>>         "white dress robes", "gandalf the white", "gandalf the grey")
>>> zz <- data.frame(v1=rnorm(10), v2=v2, v3=v3, v4=rpois(10, lambda=2),
>>>                  stringsAsFactors=FALSE)
>>> str(zz)
>>> zz
>>>
>>> # here are the keywords
>>> alarm.words <- c("red", "green", "turtle", "gandalf")
>>>
>>> # For each row/record, I want to test whether the string in v2 or the
>>> # string in v3 contains any of the strings in alarm.words. And then if so,
>>> # set zz$v5=TRUE for that record.
>>>
>>> # I'm thinking the str_detect function in the stringr package ought to
>>> # be able to help, perhaps with some use of apply over the rows, but I
>>> # obviously misunderstand something about how str_detect works
>>>
>>> library(stringr)
>>>
>>> str_detect(zz[,2:3], alarm.words)     # error: the target of the search
>>>                                       # must be a vector, not multiple
>>>                                       # columns
>>>
>>> str_detect(zz[1:4,2:3], alarm.words)  # same error
>>>
>>> str_detect(zz[,2], alarm.words)       # error, length of alarm.words
>>>                                       # is less than the number of
>>>                                       # rows I am using for the
>>>                                       # comparison
>>>
>>> str_detect(zz[1:4,2], alarm.words)    # works as hoped when
>>> length(alarm.words)                   # confining nrows
>>>                                       # to the length of alarm.words
>>>
>>> str_detect(zz, alarm.words)           # obviously not right
>>>
>>> # maybe I need apply() ?
>>> my.f <- function(x){str_detect(x, alarm.words)}
>>>
>>> apply(zz[,2], 1, my.f)      # again, a mismatch in lengths
>>>                             # between alarm.words and that
>>>                             # in which I am searching for
>>>                             # matching strings
>>>
>>> apply(zz, 2, my.f)          # now I'm getting somewhere
>>> apply(zz[1:4,], 2, my.f)    # but still only works with 4
>>>                             # rows of the dataframe
>>>
>>>
>>> # perhaps %in% could do the job?
>>>
>>> Appreciate any advice.
>>>
>>> --Chris Ryan
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.