Hi Val,

The by() function could be used here. With the data frame dfr:
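For a self-contained run, dfr can be built from the example data in the quoted post, where the data frame is called df. That the two names refer to the same data is an assumption here; a minimal sketch:

```r
# Assumption: dfr holds the example data from the original post (named df there)
dfr <- read.table(header = TRUE, text = 'first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
')
```

With dfr defined this way, the steps that follow can be run as written.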
# split the data by first name and check for more than one last name for each first name
res <- by(dfr, dfr['first'], function(x) length(unique(x$last)) > 1)
# make the result more easily manipulated
res <- as.table(res)
res
# first
#  Alex   Bob  Cory
#  TRUE FALSE FALSE

# then use this result to subset the data
nw.dfr <- dfr[!dfr$first %in% names(res[res]) , ]
# sort if needed
nw.dfr[order(nw.dfr$first) , ]
  first week last
2   Bob    1 John
5   Bob    2 John
6   Bob    3 John
3  Cory    1 Jack
4  Cory    2 Jack

Philip

On 12/02/2017 4:02 PM, Val wrote:
Hi all,

I have a big data set and want to remove rows conditionally. In my data file each person was recorded for several weeks. Somehow, during the recording period, some last names were misreported. For each person, the last name should be the same throughout; otherwise that person should be removed from the data.

For example, in the following data set, Alex was found to have two last names:

Alex West
Alex Joseph

Alex should be removed from the data. If this happens, I want to remove all rows with Alex.

Here is my data set:

df <- read.table(header=TRUE, text='first week last
Alex 1 West
Bob 1 John
Cory 1 Jack
Cory 2 Jack
Bob 2 John
Bob 3 John
Alex 2 Joseph
Alex 3 West
Alex 4 West
')

Desired output:

  first week last
1   Bob    1 John
2   Bob    2 John
3   Bob    3 John
4  Cory    1 Jack
5  Cory    2 Jack

Thank you in advance

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.