Dear list, I have a data frame of survey respondents, a little like this:

set.seed(20081215)
n <- 100
dat <- data.frame(id=1:n,
                  addr1=sample(LETTERS, n, replace=TRUE),
                  addr2=sample(LETTERS, n, replace=TRUE),
                  addr3=sample(LETTERS, n, replace=TRUE))
head(dat)

  id addr1 addr2 addr3
1  1     R     H     Q
2  2     H     C     K
3  3     I     P     S
4  4     A     H     L
5  5     P     Q     P

I wish to detect potential duplicates in the data frame. In my example, people can have up to three addresses. If two people share an address, there is a chance that the two entries are duplicates. For instance, persons 1, 2, and 4 in the sample data all have the entry "H", so I want to check that they aren't duplicates.

Person 5 has the same address "P" in both addr1 and addr3, but this is not a duplicate, since one person may give the same address in several fields. I'm only concerned about multiple people sharing the same address.

It's easy to find duplicates within individual columns, but I'm not sure how to do so across columns. Any advice would be more than welcome. Thanks!

Regards,

Andrew C. Ward

CAPE Centre
Department of Chemical Engineering
The University of Queensland
Brisbane Qld 4072 Australia
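
One possible way to approach the cross-column check described above is to stack the three address columns into a single long table, drop repeats within a person, and then group ids by address. The sketch below is only an illustration under stated assumptions, not part of the original post: it rebuilds the five sample rows by hand rather than relying on the seed, and the names long, ids_by_addr and shared are made up for the example.

```r
# Hand-built copy of the five sample rows shown in the post (an assumption:
# the real data frame has 100 rows generated from the seed).
dat <- data.frame(id = 1:5,
                  addr1 = c("R", "H", "I", "A", "P"),
                  addr2 = c("H", "C", "P", "H", "Q"),
                  addr3 = c("Q", "K", "S", "L", "P"),
                  stringsAsFactors = FALSE)

# Stack the address columns into long format: one row per (id, address).
long <- data.frame(id = rep(dat$id, times = 3),
                   addr = c(dat$addr1, dat$addr2, dat$addr3),
                   stringsAsFactors = FALSE)

# Sort by id, then drop repeats within a person, so that person 5's
# two "P" entries are counted only once.
long <- unique(long[order(long$id), ])

# Collect the ids that list each address, and keep only the addresses
# listed by more than one person.
ids_by_addr <- split(long$id, long$addr)
shared <- ids_by_addr[lengths(ids_by_addr) > 1]

# Ids of everyone who shares at least one address with someone else.
flagged_ids <- sort(unique(unlist(shared, use.names = FALSE)))

print(shared)
```

On these five rows, shared holds "H" for persons 1, 2 and 4, "P" for persons 3 and 5, and "Q" for persons 1 and 5; person 5's repeated "P" does not by itself create a match. The same steps should carry over to the full data frame, and dat[dat$id %in% flagged_ids, ] would pull out the rows worth inspecting by hand.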