Re: [R] What don't I understand about sample()?

Ebert,Timothy Aaron Thu, 13 Mar 2025 21:20:03 -0700

This is fun.
In a stats class you are trying to deal with data. Behind the data there is an 
underlying distribution; in R that role is played by the random number 
generator. I have a population that follows the underlying distribution. In 
this case my population is 10,000 individuals with a true population mean of 40 
and a standard deviation of 6.
Population1 <- round(rnorm(10000, 40, 6), 2)  ### rounds numbers to 2 decimal places
However, Population1 can be any vector of any size, and the code will work.
Population1 <- 1:10
Population1 <- c("a", "b", "C", "D", "e", "f")
Population1 <- letters[1:13]   ### note square brackets here. Round ones do not work.
Population1 <- c(letters[1:13], LETTERS[8:21])


I cannot test every individual in the population of 10,000. My experiment must 
sample the population. I hope to get away with a sample size of five, but I 
want to understand the variability in my outcomes. I will take ten sets of five 
values (just as you have in your example):

Matrix1 <- matrix(sample(Population1, size = 5 * 10, replace = TRUE),
                  nrow = 5, ncol = 10)   ### draw 5 * 10 = 50 values, one per cell
print(Matrix1)
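
To get at the variability mentioned above, one option is to summarize each column (each sample of five) by its mean and then look at how much those means differ. This is only a minimal sketch, assuming the normal population from the top of this message. The set.seed() call is added just so the run is repeatable, and the names sample_means and spread are made up for illustration.

```r
set.seed(42)                                  # assumption: fixed seed so results repeat
Population1 <- round(rnorm(10000, 40, 6), 2)  # population from the start of this message
Matrix1 <- matrix(sample(Population1, size = 5 * 10, replace = TRUE),
                  nrow = 5, ncol = 10)        # each column is one sample of five
sample_means <- colMeans(Matrix1)             # one mean per sample (hypothetical name)
spread <- sd(sample_means)                    # how much the sample means vary
print(sample_means)
print(spread)
```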

In some cases, I find it easier to understand if I use loops instead. This is 
just a different way to solve the same problem.
# Fill the matrix using for loops
Matrix1 <- matrix(0,5,10)   ### create and initialize the matrix
for (i in 1:nrow(Matrix1)) {
  for (j in 1:ncol(Matrix1)) {
    Matrix1[i, j] <- sample(Population1, 1)  # Pick a random value from Population1
  }
}
print(Matrix1)

If you want every row to contain every value in Population1 exactly once (for 
example when Population1 <- 1:10), changing replace = TRUE to replace = FALSE 
in the single call above is not enough: sample() cannot draw 50 values without 
replacement from a population of 10, so it stops with an error. Each row needs 
its own call to sample(), which returns a fresh permutation of the whole vector 
when no size is given. One way to do that is with replicate():
Matrix1 <- t(replicate(5, sample(Population1)))   ### each row is one permutation
The number of columns is then automatically the length of Population1, so this 
is generic for any Population1 with more than one value.


In your example, sample() ran only once and returned ten values (from the 
integers 1 to 10). matrix() then recycled those ten values five times to fill 
the 50 cells, which is why every row is identical. To make that approach work 
as planned, you could use a hybrid like this, where I draw a fresh sample of 
ten values for each row of the matrix.

Matrix1 <- matrix(0, 5, 10)
# Fill the matrix row-wise using a single loop
for (i in 1:nrow(Matrix1)) {
  sample1 <- sample(Population1, 10, replace = TRUE)  # Sample 10 values for the row
  Matrix1[i, ] <- sample1  # Directly assign the entire row
}
# Print the filled matrix
print(Matrix1)
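
The recycling described above can be seen directly. This sketch assumes Population1 <- 1:10, and the names ten_values and Recycled are made up for illustration.

```r
Population1 <- 1:10
ten_values <- sample(Population1, replace = TRUE)    # sample() runs once: ten values
Recycled <- matrix(ten_values, 5, 10, byrow = TRUE)  # matrix() reuses them for all 50 cells
print(length(ten_values))                            # only ten values were ever drawn
# Compare every row with the first row; all match because the values were recycled
print(all(apply(Recycled, 1, function(r) identical(r, Recycled[1, ]))))
```

Calling sample() once per row, as in the loop above, is what gives each row its own draw.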

You can also make and use your own variables.
matrix_rows <- 5
matrix_columns <- 10
values <- matrix_rows * matrix_columns
pop_min <- 1
pop_max <- 10
Population1 <- pop_min : pop_max
Matrix1 <- matrix(sample(Population1, size = values, replace = TRUE),
                  nrow = matrix_rows, ncol = matrix_columns)
print(Matrix1)

You can look at the effect of sample size by changing matrix_rows at the top of 
the program.
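
One way to look at the effect of sample size is to repeat the draw for several values of matrix_rows and compare how much the column means vary. This is a rough sketch, assuming the normal population from the top of this message. The seed, the choice of sizes, the 1000 columns, and the helper name spread_of_means are illustrative assumptions, not part of the code above.

```r
set.seed(1)                                   # assumption: fixed seed for repeatability
Population1 <- round(rnorm(10000, 40, 6), 2)
matrix_columns <- 1000                        # many samples, so the spread estimate is stable

# Hypothetical helper: standard deviation of the column means for one sample size
spread_of_means <- function(matrix_rows) {
  values <- matrix_rows * matrix_columns
  Matrix1 <- matrix(sample(Population1, size = values, replace = TRUE),
                    nrow = matrix_rows, ncol = matrix_columns)
  sd(colMeans(Matrix1))
}

sizes <- c(5, 20, 80)                         # sample sizes to compare
spreads <- sapply(sizes, spread_of_means)
print(data.frame(sample_size = sizes, sd_of_means = spreads))
```

The spread of the means should shrink roughly in proportion to 1/sqrt(sample size), so the samples of 80 give means that cluster more tightly than the samples of five.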


Tim


-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Kevin Zembower via 
R-help
Sent: Thursday, March 13, 2025 5:00 PM
To: r-help@r-project.org
Subject: [R] What don't I understand about sample()?

Hello, all,

I'm learning to do randomized distributions in my Stats 101 class*. I thought I 
could do it with a call to sample() inside a matrix(), like:

> matrix(sample(1:10, replace=TRUE), 5, 10, byrow=TRUE)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    8    2    3    1    8    2    8    8    9     8
[2,]    8    2    3    1    8    2    8    8    9     8
[3,]    8    2    3    1    8    2    8    8    9     8
[4,]    8    2    3    1    8    2    8    8    9     8
[5,]    8    2    3    1    8    2    8    8    9     8
>

Imagine my surprise to learn that all the rows were the same permutation. I 
thought each time sample() was called inside the matrix, it would generate a 
different permutation.

I modeled this after the bootstrap sample techniques in 
https://pages.stat.wisc.edu/~larget/stat302/chap3.pdf. I don't understand why 
it works in bootstrap samples (with replace=TRUE), but not in randomized 
distributions (with replace=FALSE).

Thanks for any insight you can share with me, and any suggestions for getting 
rows in a matrix with different permutations.

-Kevin

*No, this isn't a homework problem. We're using Lock5 as the text in class, 
along with its StatKey web application. I'm just trying to get more out of the 
class by also solving our problems using R, for which I'm not receiving any 
class credit.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.r-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
