On Mon, Nov 11, 2013 at 3:50 PM, Lopez, Dan <lopez...@llnl.gov> wrote: > Hi R Experts, > > How do I mark rows in dataframe based on a condition that's based off another > row in the same dataframe? > > I want to mark any combination of FY,ID, TT=='HC' rows that have a > FY,ID,TT=='TER' row with a 1. In my example below this is rows 4, 7 and 11. > My data looks something like this: > FY ID TT > 1 FY09 1 HC > 2 FY10 1 HC > 3 FY11 1 HC > 4 FY12 1 HC > 5 FY12 1 TER > 6 FY09 2 HC > 7 FY10 2 HC > 8 FY10 2 TER > 9 FY11 2 HC > 10 FY12 2 HC > 11 FY13 2 HC > 12 FY13 2 TER > > I know for this specific example I can use: > HTDF$EXCL3<-1*duplicated(HTDF[,1:2],fromLast=T) > > However my actual data set is NOT sorted by FY, ID and TT. TT is a binary > factor variable. I want to know if there is another way of doing the same > thing without sorting the data. > I tried the last line of code below but it gave me unexpected results. It > marks the first three rows with 0 and everything else with 1. Based on the > warning messages looks like it has something to do with longer object length > is not a multiple of shorter object length. But I am now stumped. > > #REPRODUCIBLE EXAMPLE > FY<-factor(c("FY09","FY10","FY11","FY12","FY12","FY09","FY10","FY10","FY11","FY12","FY13","FY13")) > ID<-c(rep(1,5),rep(2,7)) > TT<-factor(c(rep("HC",4),"TER","HC","HC","TER","HC","HC","HC","TER")) > HTDF<-data.frame(FY,ID,TT) > > #Summarize data and get max TT. TT is a binary factor variable > library(sqldf) > HTDF.MAX<-sqldf('SELECT ID,FY,Max(TT) "MAXTT" FROM HTDF GROUP BY ID,FY') > > # Initiate new variable and assign 0 or 1 > HTDF$EXCL<-0 > > # THIS IS WHERE I AM GETTING UNEXPECTE RESULTS > HTDF$EXCL<-ifelse(HTDF$FY==HTDF.MAX$FY&HTDF$ID==HTDF.MAX$ID&HTDF$TT==HTDF.MAX$MAXTT,0,1)
For each FY, ID group ave applies f to TT == 'TER' returning a logical vector that is TRUE for each HC if TER is in the group ad otherwise FALSE. Finally we add 0 to convert from TRUE/FALSE to 1/0. The rows of HTDF need not be in any specific order and their oreder will be preserved. > f <- function(x) any(x) & !x > transform(HTDF, EXCL = ave(TT == 'TER', FY, ID, FUN = f) + 0) FY ID TT EXCL 1 FY09 1 HC 0 2 FY10 1 HC 0 3 FY11 1 HC 0 4 FY12 1 HC 1 5 FY12 1 TER 0 6 FY09 2 HC 0 7 FY10 2 HC 1 8 FY10 2 TER 0 9 FY11 2 HC 0 10 FY12 2 HC 0 11 FY13 2 HC 1 12 FY13 2 TER 0 -- Statistics & Software Consulting GKX Group, GKX Associates Inc. tel: 1-877-GKX-GROUP email: ggrothendieck at gmail.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.