Try this:

    df[!duplicated(df[, 1:3]), ]
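
To make the suggestion concrete, here is a small sketch that applies it to the data frame printed in the quoted message. The values are typed in from that printout rather than drawn again with sample(), so the result is reproducible; the names is_dup and culled are mine, chosen only for illustration.

```r
# Data copied from the table printed in the quoted message (fixed values, no sampling).
df <- data.frame(
  id   = rep(c("amy", "bob", "joe"), each = 5),
  pet1 = c("C","B","A","B","C", "B","C","C","B","C", "C","A","C","C","A"),
  pet2 = c("B","A","A","C","B", "A","A","C","C","B", "B","B","C","A","C"),
  pet3 = c("A","A","D","A","B", "A","C","A","E","C", "A","E","B","D","C"),
  stringsAsFactors = FALSE
)

# duplicated() is TRUE for each row that repeats an earlier row,
# here judged on the first three columns only.
is_dup <- duplicated(df[, 1:3])

# Keep the first occurrence of each (id, pet1, pet2) combination,
# with every column (including pet3) retained.
culled <- df[!is_dup, ]

which(is_dup)   # 5: row 5 (amy C B) repeats row 1 in columns 1:3
nrow(culled)    # 14
```

If the last occurrence of each combination should be kept instead of the first, duplicated() accepts fromLast = TRUE.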
Jean Dgnn wrote on 12/07/2011 08:24:01 PM:

> Hello. I am trying to remove from my dataframe, those rows in which the first
> 7 columns are duplicated even if subsequent columns make those rows unique.
>
> df<-data.frame(id=rep(c('amy','bob','joe') , each=5),
>                pet1=sample(LETTERS[1:3],15, replace=T),
>                pet2=sample(LETTERS[1:3],15, replace=T),
>                pet3=sample(LETTERS[1:5],15, replace=T))
>
> >df
>
>     id pet1 pet2 pet3
> 1  amy    C    B    A
> 2  amy    B    A    A
> 3  amy    A    A    D
> 4  amy    B    C    A
> 5  amy    C    B    B
> 6  bob    B    A    A
> 7  bob    C    A    C
> 8  bob    C    C    A
> 9  bob    B    C    E
> 10 bob    C    B    C
> 11 joe    C    B    A
> 12 joe    A    B    E
> 13 joe    C    C    B
> 14 joe    C    A    D
> 15 joe    A    C    C
>
> I am trying to identify and remove the rows of df that are duplicates in
> df[,1:3].
>
> culled.df<-unique(x[,1:3])
> >culled.df
>     id pet1 pet2
> 1  amy    A    A
> 2  amy    C    C
> 3  amy    C    A
> 5  amy    A    B
> 6  bob    A    B
> 7  bob    C    C
> 8  bob    B    C
> 10 bob    B    A
> 11 joe    B    B
> 13 joe    B    C
> 14 joe    B    A
>
> This is where I'm hung up. I've been trying match() or %in% to get the rows
> of df where df[,1:3] match df.culled
>
> > df[df.culled %in% df[,1:3],]
>
> Is this a reasonable solution, or am I making it more difficult than it need
> to be?
>
> Thanks for your suggestions,
>
> Jason
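
On the match()/%in% attempt in the quoted message: %in% treats a data frame as a list of columns, so it compares whole columns and does not test rows. If a match()-based route is wanted anyway, one possible sketch is to collapse each row into a single string key and match on the keys. The helper row_key, the "\r" separator, and the four-row toy data below are assumptions made for illustration; they are not from the original post.

```r
# Small toy data (hypothetical): row 3 repeats row 1 in columns 1:3.
df <- data.frame(
  id   = c("amy", "amy", "amy", "bob"),
  pet1 = c("C", "B", "C", "B"),
  pet2 = c("B", "A", "B", "A"),
  pet3 = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)
culled.df <- unique(df[, 1:3])

# %in% sees a data frame as a list of columns, so this yields one value
# per column of culled.df (3 values), not one per row.
length(culled.df %in% df[, 1:3])

# Hypothetical helper: one string key per row. "\r" is an arbitrary
# separator assumed not to occur in the data.
row_key <- function(d) do.call(paste, c(unname(as.list(d)), sep = "\r"))

# match() gives, for each unique key, the index of its first row in df.
first_rows <- match(row_key(culled.df), row_key(df[, 1:3]))
via_match  <- df[first_rows, ]

# Same rows as the duplicated() one-liner selects.
all.equal(via_match, df[!duplicated(df[, 1:3]), ])
```

This does the same job with more steps, so the duplicated() form is probably the simpler choice here.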