Thanks very much for all the help, R-helpers. I ended up counting the categories of the matching variable in both x and y and then limiting the sample based on those counts. It's no longer truly random, but I think it's fine for my purposes.
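The thread does not include LB's count-based code. The following R sketch is one possible reading of it, not LB's actual solution. It tallies the matching column in both x and y with table(). Then, for each category present in both, it draws from y as many rows as x has in that category, capped at how many y contains. The function name match_sample_by_counts and the default col = 2 (mirroring the "col 2" in Michael's function quoted below) are hypothetical.

```r
# Hypothetical helper (not LB's actual code): sample rows of y so that the
# categories in column `col` follow their counts in x, capped by what y has.
match_sample_by_counts <- function(x, y, col = 2) {
  nx <- table(x[, col])                     # category counts in x
  ny <- table(y[, col])                     # category counts in y
  cats <- intersect(names(nx), names(ny))   # categories present in both
  # per category: as many rows as x needs, but no more than y contains
  take <- pmin(as.integer(nx[cats]), as.integer(ny[cats]))
  rows <- unlist(mapply(function(cat, n) {
    idx <- which(as.character(y[, col]) == cat)
    # index through sample.int so a single candidate is not expanded to 1:idx
    idx[sample.int(length(idx), n)]
  }, cats, take, SIMPLIFY = FALSE))
  y[sort(rows), , drop = FALSE]
}

set.seed(1)
x <- cbind(id = 1:4, grp = c(1, 1, 2, 3))
y <- cbind(id = 1:6, grp = c(1, 2, 2, 2, 1, 1))
s <- match_sample_by_counts(x, y)
table(s[, "grp"])  # grp 1 twice, grp 2 once; grp 3 is absent from y and dropped
```

Unlike the loop-based function quoted below, this version never stops with an error. When y runs short in a category, it returns fewer rows, so the sample no longer matches x exactly.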
Thanks again.
LB

On 28 September 2010 18:40, Michael Bedward <michael.bedw...@gmail.com> wrote:

> Hello LB,
>
> It's one of those problems that's basic but tricky :) I don't have an
> elegant one-liner for it, but here's a function that would do it...
>
> function(xs, y) {
>   # sample matrix y such that col 2 of the sample matches
>   # col 2 of matrix xs
>
>   used <- logical(nrow(y))   # rows of y already drawn (no replacement)
>   yi <- integer(nrow(xs))    # selected row indices of y
>
>   k <- 1
>   for (xsval in xs[,2]) {
>     # unused rows of y whose col 2 matches the current xs value
>     i <- which( !used & y[,2] == xsval )
>     if (length(i) >= 1) {
>       # index through sample.int: sample(i, 1) would draw from 1:i
>       # when only one candidate row remains
>       yi[k] <- i[sample.int(length(i), 1)]
>       used[ yi[k] ] <- TRUE
>       k <- k + 1
>     } else {
>       stop("bummer: not possible to get a matching sample")
>     }
>   }
>
>   y[yi, ]
> }
>
> Note: I've assumed here that in your real data the first col won't
> always contain the row index as it does in your example.
>
> Michael

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.