Thanks very much for all the help, R-helpers. I ended up counting the
categories of the matching variable in both x and y and then limiting the
sample to those counts. It's no longer strictly random, but I think it's
fine for my purposes.
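
For anyone finding this later, here is a rough sketch of one way to read that
approach. It is an illustration, not my exact code: the function name
limit_by_counts is made up, and it assumes x and y are matrices with the
matching variable in column 2, as in Michael's function below. For each
category present in both, it keeps the smaller of the two counts from each
matrix.

```r
# Hypothetical helper: the name and the default column are assumptions.
limit_by_counts <- function(x, y, col = 2) {
  # Only categories present in both matrices can be matched
  cats <- intersect(x[, col], y[, col])
  # Count each shared category in x and in y
  nx <- tabulate(match(x[, col], cats), nbins = length(cats))
  ny <- tabulate(match(y[, col], cats), nbins = length(cats))
  # Per category, keep the smaller of the two counts
  keep <- pmin(nx, ny)

  pick <- function(m) {
    idx <- unlist(lapply(seq_along(cats), function(j) {
      rows <- which(m[, col] == cats[j])
      # Index into rows instead of calling sample(rows, n), which
      # would draw from 1:rows when rows has length 1
      rows[sample.int(length(rows), keep[j])]
    }))
    m[idx, , drop = FALSE]
  }

  list(x = pick(x), y = pick(y))
}
```

Both returned matrices then have the same number of rows in every category,
although rows are still picked at random within a category.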

Thanks again.
LB

On 28 September 2010 18:40, Michael Bedward <michael.bedw...@gmail.com> wrote:

> Hello LB,
>
> It's one of those problems that's basic but tricky :)  I don't have an
> elegant one-liner for it but here's a function that would do it...
>
> function(xs, y) {
> # sample matrix y such that col 2 of the sample matches
> # col 2 of matrix xs
>
>  used <- logical(nrow(y))
>  yi <- integer(nrow(xs))
>
>  k <- 1
>  for (xsval in xs[,2]) {
>    i <- which( !used & y[,2] == xsval )
>    if (length(i) >= 1) {
>      # index into i: sample(i, 1) would draw from 1:i when length(i) == 1
>      yi[k] <- i[sample.int(length(i), 1)]
>      used[ yi[k] ] <- TRUE
>      k <- k + 1
>    } else {
>      stop("bummer: not possible to get a matching sample")
>    }
>  }
>
>  y[yi, ]
> }
>
> Note, I've assumed here that in your real data the first col won't
> always contain the row index as it does in your example.
>
> Michael
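
A possible variation on the function above, offered only as a sketch: instead
of drawing one row of y per row of xs, it draws all the rows needed for each
distinct value in one go. The name sample_matching and the col argument are
made up for illustration; column 2 is assumed to hold the matching variable,
as in the quoted code.

```r
# Hypothetical name; assumes the matching variable is in column 2
# of both matrices unless col says otherwise.
sample_matching <- function(xs, y, col = 2) {
  yi <- integer(nrow(xs))
  for (v in unique(xs[, col])) {
    need <- which(xs[, col] == v)  # positions in xs with this value
    pool <- which(y[, col] == v)   # candidate rows of y
    if (length(pool) < length(need)) {
      stop("not enough rows in y with value ", v)
    }
    # Draw all rows for this value at once, without replacement
    yi[need] <- pool[sample.int(length(pool), length(need))]
  }
  # Row k of the result matches row k of xs on the chosen column
  y[yi, , drop = FALSE]
}
```

As with the original, it stops with an error when y does not have enough rows
for some value, and no row of y is used twice.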

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.