Try this:

myDat <- read.table(textConnection("group id
1 101
1 201
1 301
2 401
2 501
2 601
3 701
3 801
3 901"), header=TRUE)
closeAllConnections()

corr_mat <- as.matrix(read.table(textConnection("1 1 .5 0 0 0 0 0 0 0
2 .5 1 0 0 0 0 0 0 0
3 0 0 1.0 0 0 0 0 0 0
4 0 0 0 1 .5 .5 0 0 0
5 0 0 0 .5 1 .5 0 0 0
6 0 0 0 .5 .5 1 0 0 0
7 0 0 0 0 0 0 1 0 0
8 0 0 0 0 0 0 0 1 .5
9 0 0 0 0 0 0 0 .5 1"), header=FALSE))
closeAllConnections()

corr_mat <- corr_mat[,-1]
colnames(corr_mat) <- myDat$id
rownames(corr_mat) <- myDat$id

# split out the groups
groups <- split(as.character(myDat$id), myDat$group)

# process each subgroup
result <- lapply(groups, function(.grp){
    subgroup <- corr_mat[.grp, .grp]
    output <- NULL
    # zero the diag
    diag(subgroup) <- 0
    # TRUE for each id that is related to at least one other id in the group
    same <- apply(subgroup, 1, function(x) any(x != 0))
    if (any(same)){  # some match, choose one
        output <- sample(same[same], 1)
    }
    if (any(!same)){  # get all that don't correlate
        output <- c(output, same[!same])
    }
    output
})

# output as matrix
do.call(rbind, lapply(names(result), function(x) cbind(x, names(result[[x]]))))
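One caveat about the code above: it keeps a single id out of all ids that have any nonzero correlation, so a group containing two separate related pairs would be reduced to one id from those four. If that case can occur, a greedy pass is one possible alternative. The sketch below is not the method from this thread; pick_unrelated is a hypothetical helper name, and the correlation matrix is rebuilt by hand from the example data so the block runs on its own.

```r
# Rebuild the example data (same ids, groups and related pairs as in the post).
ids   <- as.character(c(101, 201, 301, 401, 501, 601, 701, 801, 901))
group <- rep(1:3, each = 3)

corr <- diag(length(ids))
dimnames(corr) <- list(ids, ids)
related <- rbind(c("101", "201"),
                 c("401", "501"), c("401", "601"), c("501", "601"),
                 c("801", "901"))
corr[related] <- 0.5            # character-matrix indexing by dimnames
corr[related[, 2:1]] <- 0.5     # keep the matrix symmetric

# Hypothetical helper: visit the ids of one group in random order and keep an
# id only if its correlation with every id kept so far is exactly 0.
pick_unrelated <- function(grp_ids, corr) {
    grp_ids <- grp_ids[sample.int(length(grp_ids))]
    kept <- character(0)
    for (id in grp_ids) {
        if (all(corr[id, kept] == 0)) {
            kept <- c(kept, id)
        }
    }
    kept
}

set.seed(1)  # only to make the random choice repeatable
groups <- split(ids, group)
picked <- lapply(groups, pick_unrelated, corr = corr)

# One row per selected id, in the group/id layout asked for.
result <- data.frame(group = rep(names(picked), lengths(picked)),
                     id    = unlist(picked, use.names = FALSE))
result
```

The greedy pass yields a set to which no further id can be added without creating a related pair, but it is not guaranteed to be the largest such set; which of the related ids survives depends on the random order.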
On Mon, Dec 7, 2009 at 7:38 PM, Juliet Hannah <juliet.han...@gmail.com> wrote:

> Hi List,
>
> Here is some example data.
>
> myDat <- read.table(textConnection("group id
> 1 101
> 1 201
> 1 301
> 2 401
> 2 501
> 2 601
> 3 701
> 3 801
> 3 901"),header=TRUE)
> closeAllConnections()
>
> corr_mat <-read.table(textConnection("1 1 .5 0 0 0 0 0 0 0
> 2 .5 1 0 0 0 0 0 0 0
> 3 0 0 1.0 0 0 0 0 0 0
> 4 0 0 0 1 .5 .5 0 0 0
> 5 0 0 0 .5 1 .5 0 0 0
> 6 0 0 0 .5 .5 1 0 0 0
> 7 0 0 0 0 0 0 1 0 0
> 8 0 0 0 0 0 0 0 1 .5
> 9 0 0 0 0 0 0 0 .5 1"),header=FALSE)
> closeAllConnections()
>
> corr_mat <- corr_mat[,-1]
> colnames(corr_mat) <- myDat$id
> rownames(corr_mat) <- myDat$id
>
> I need to subset this data such that observations within a group are not
> related, which is indicated by a 0 in corr_mat.
>
> For example, within group 1, 101 and 201 are related, so one of these
> has to be selected, say 101. 301 is not related to 101 or 201, so the
> final set for group 1 consists of 101 and 301. There will always be at
> least 2 members in each group. I need to carry this task on all groups.
>
> One possible final data set looks like:
>
>   group  id
> 1     1 101
> 3     1 301
> 4     2 401
> 7     3 701
> 8     3 801
>
> Any suggestions? Thanks!
>
> Juliet
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?