While Eric's solution is correct( mod "corner" cases like all NA's in a row), it can be made considerably more efficient.
One minor improvement can be made by using the idiom any(x == "A") instead of matching via %in% for the simple case of matching just a single value. However, a considerable improvement can be made by getting fancy, taking advantage of do.call() and the pmax() function to mostly vectorize the calculation. Here are the details and timing on a large data frame. (Note: I removed the names in the %in% approach for simplicity. It has almost no effect on timings. I also moved the as.integer() call out of the function so that it is called only once at the end, which improves efficiency a bit) 1. Eric's original: fun1 <-function(df,what) { as.integer(unname(apply(df,MARGIN = 1,function(v) { what %in% v }))) } 2. Using any( x == "A") instead: fun2 <- function(df,what) { as.integer(unname(apply(df,MARGIN =1, function(x)any(x == what, na.rm=TRUE)))) } 3. Getting fancy to use pmax() fun3 <- function(df,what) { z <- lapply(df,function(x)as.integer((x==what))) do.call(pmax,c(z,na.rm=TRUE)) } Here are the timings: > bigdf <- df[rep(1:10,1e4), rep(1:5, 50)] > dim(bigdf) [1] 100000 250 > system.time(res1 <- fun1(bigdf, "A")) user system elapsed 2.204 0.432 2.637 > > system.time(res2 <- fun2(bigdf, "A")) user system elapsed 1.898 0.403 2.302 > > system.time(res3 <- fun3(bigdf, "A")) user system elapsed 0.187 0.048 0.235 ## 10 times faster! > > all.equal(res1,res2) [1] TRUE > all.equal(res1,res3) [1] TRUE NB: I freely admit that Eric's original solution may well be perfectly adequate, and the speed improvement is pointless. In that case, maybe this is at least somewhat instructive for someone. Nevertheless, I would welcome further suggestions for improvement, as I suspect my "fancy" approach is still a ways from what one can do (in R code, without resorting to C++). Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking things into it." -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) On Mon, Jul 29, 2019 at 12:38 PM Eric Berger <ericjber...@gmail.com> wrote: > > Read the help for apply and %in% > > ?apply > ?%in% > > > Sent from my iPhone > > > On 29 Jul 2019, at 22:23, Ana Marija <sokovic.anamar...@gmail.com> wrote: > > > > Thank you so much! Just to confirm here MARGIN=1 indicates that "A" should > > appear at least once per row? > > > >> On Mon, Jul 29, 2019 at 1:53 PM Eric Berger <ericjber...@gmail.com> wrote: > >> df$case <- apply(df,MARGIN = 1,function(v) { as.integer("A" %in% v) }) > >> > >> > >>> On Mon, Jul 29, 2019 at 9:02 PM Ana Marija <sokovic.anamar...@gmail.com> > >>> wrote: > >>> sorry my bad, here is the edited version: > >>> > >>> so the data frame is this: > >>> > >>> df=data.frame( > >>> > >>> eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"), > >>> eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"), > >>> > >>> eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"), > >>> > >>> eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"), > >>> eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B") > >>> > >>> and I would need to put inside the column which would be named "case" and > >>> values inside would be: 1,1,0,1,1,1,0,0,1,1 > >>> > >>> so "case" column is where value "A" can be found in any column. > >>> > >>>> On Mon, Jul 29, 2019 at 12:53 PM Eric Berger <ericjber...@gmail.com> > >>>> wrote: > >>>> You may have a typo/misstatement in your question. > >>>> You define a data frame with 5 columns, each of which has 10 elements, > >>>> so your data frame has dimensions 10 x 5. > >>>> Then you request a new COLUMN which will have only 5 elements, which is > >>>> not allowed. All columns of a data frame > >>>> must have the same length. > >>>> > >>>>> On Mon, Jul 29, 2019 at 8:42 PM Ana Marija > >>>>> <sokovic.anamar...@gmail.com> wrote: > >>>>> I have data frame which looks like this: > >>>>> > >>>>> df=data.frame( > >>>>> eye_problemsdisorders_f6148_0_1=c(A,C,D,NA,D,A,C,NA,B,A), > >>>>> eye_problemsdisorders_f6148_0_2=c(B,C,NA,A,C,B,NA,NA,A,D), > >>>>> eye_problemsdisorders_f6148_0_3=c(C,A,D,D,B,A,NA,NA,A,B), > >>>>> eye_problemsdisorders_f6148_0_4=c(D,D,NA,B,A,C,NA,C,A,B), > >>>>> eye_problemsdisorders_f6148_0_5=c(C,C,NA,D,B,C,NA,D,D,B)) > >>>>> > >>>>> In reality I have much more columns and they don't always match > >>>>> "eye_problemsdisorders_f6148" this string, and there is much more rows. > >>>>> > >>>>> What I would like to do is create a new column, say named "case" where I > >>>>> would have value "1" for every row where string "A" appears at least > >>>>> once > >>>>> in any column, if not the value would be "0". So in the above example > >>>>> column "case" would have these values: 1,1,1,1,0 > >>>>> Thanks > >>>>> Ana > >>>>> > >>>>> [[alternative HTML version deleted]] > >>>>> > >>>>> ______________________________________________ > >>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>>> PLEASE do read the posting guide > >>>>> http://www.R-project.org/posting-guide.html > >>>>> and provide commented, minimal, self-contained, reproducible code. > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.