Hi Bert and all, good morning. I promise this will be the last time I write about this topic.

With your help, I came up with the R function below. It works for all sample sizes. I have also included three simple examples.

With many thanks
abou

################## Here it is ###############

Random.Sample.IDs <- function(N, n, ngroups) {

  #### N = population size, n = sample size, ngroups = number of groups
  #### NOTE: the index arithmetic below handles exactly three groups
  stopifnot(ngroups == 3)

  population.IDs <- seq(1, N, by = 1)
  sample.IDs <- sample(population.IDs, n)

  ##### to print sample.IDs in a column format
  ##### --------------------------------------------------
  sample.IDs.in.column <- data.frame(sample.IDs)
  print(sample.IDs.in.column)

  remainder.n <- n %% ngroups   #### number of groups that get one extra ID
  m <- n %/% ngroups            #### base group size

  s <- sample(1:n, n)

  if (remainder.n == 0) {
    group1.IDs <- sample.IDs[s[1:m]]
    group2.IDs <- sample.IDs[s[(m+1):(2*m)]]
    group3.IDs <- sample.IDs[s[(m*2+1):(3*m)]]
  } else if (remainder.n == 1) {
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+1)]]
    group3.IDs <- sample.IDs[s[(m*2+2):(3*m+1)]]
  } else if (remainder.n == 2) {
    group1.IDs <- sample.IDs[s[1:(m+1)]]
    group2.IDs <- sample.IDs[s[(m+2):(2*m+2)]]
    group3.IDs <- sample.IDs[s[(m*2+3):(3*m+2)]]
  }

  #### pad the shorter groups with NA so that cbind() lines the columns up
  nn <- max(length(group1.IDs), length(group2.IDs), length(group3.IDs))
  length(group1.IDs) <- nn
  length(group2.IDs) <- nn
  length(group3.IDs) <- nn

  groups.IDs <- cbind(group1.IDs, group2.IDs, group3.IDs)
  groups.IDs
}

##### Examples
##### --------

Random.Sample.IDs(100, 12, 3)   #### group sizes are equal (n1 = n2 = n3 = 4)
Random.Sample.IDs(100, 13, 3)   #### group sizes are NOT equal (n1 = 5, n2 = 4, n3 = 4)
Random.Sample.IDs(100, 17, 3)   #### group sizes are NOT equal (n1 = 6, n2 = 6, n3 = 5)

______________________

*AbouEl-Makarim Aboueissa, PhD*
*Professor, Statistics and Data Science*
*Graduate Coordinator*
*Department of Mathematics and Statistics*
*University of Southern Maine*

On Sun, Sep 5, 2021 at 6:50 PM Bert Gunter <bgunter.4...@gmail.com> wrote:

> In case anyone is still interested in my query, note that if there are
> n total items to be split into g groups as evenly as possible, if we
> define this as at most two different size groups whose size differs by
> 1, then:
>
> if n = k*g + r, where 0 <= r < g,
> then n = k*(g - r) + (k + 1)*r .
> i.e. g-r groups of size k and r groups of size k+1
>
> So using R's modular arithmetic operators, which are handy to know
> about, we have:
>
> r = n %% g and k = n %/% g .
>
> (and note that you should disregard my previous stupid remark about
> numerical analysis).
>
> Cheers,
> Bert
>
> On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
> >
> > I have a more general problem for you.
> >
> > Given n items and 2 <= g << n, how do you divide the n items into g
> > groups that are as "equal as possible."
> >
> > First, operationally define "as equal as possible."
> > Second, define the algorithm to carry out the definition. Hint: Note
> > that sum{m[i]} for i <= g must sum to n, where m[i] is the number of
> > items in the ith group.
> > Third, write R code for the algorithm. Exercise for the reader.
> >
> > I may be wrong, but I think numerical analysts might also have a
> > little fun here.
> >
> > Randomization, of course, is trivial.
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> > On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa
> > <abouelmakarim1...@gmail.com> wrote:
> > >
> > > Dear Thomas:
> > >
> > > Thank you very much for your input in this matter.
> > >
> > > The core part of this R code(s) (please see below) was written by *Richard
> > > O'Keefe*. I had three examples with different sample sizes.
> > >
> > > *First sample of size n1 = 204* divided randomly into three groups of sizes
> > > 68. *No problems with this one*.
> > >
> > > *The second sample of size n2 = 112* divided randomly into three groups of
> > > sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes
> > > (37, 37, and 37). *How to fix the code to make sure that the output will be
> > > three groups of sizes 37, 37, and 38*.
> > >
> > > *The third sample of size n3 = 284* divided randomly into three groups of
> > > sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes
> > > (94, 94, and 94). *Again*, *how to fix the code to make sure that the
> > > output will be three groups of sizes 94, 95, and 95*.
> > >
> > > With many thanks
> > >
> > > abou
> > >
> > > ########### ------------------------ #############
> > >
> > > N1 <- 485
> > > population1.IDs <- seq(1, N1, by = 1)
> > > #### population1.IDs
> > >
> > > n1<-204   ##### in this case the size of each group of the three groups = 68
> > > sample1.IDs <- sample(population1.IDs,n1)
> > > #### sample1.IDs
> > >
> > > #### n1 <- length(sample1.IDs)
> > >
> > > m1 <- n1 %/% 3
> > > s1 <- sample(1:n1, n1)
> > > group1.IDs <- sample1.IDs[s1[1:m1]]
> > > group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
> > > group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > > ####### --------------------------
> > >
> > > N2 <- 266
> > > population2.IDs <- seq(1, N2, by = 1)
> > > #### population2.IDs
> > >
> > > n2<-112   ##### in this case the sizes of the three groups are(37, 37, and 38)
> > >           ##### BUT this codes generate three groups of equal sizes (37, 37, and 37)
> > > sample2.IDs <- sample(population2.IDs,n2)
> > > #### sample2.IDs
> > >
> > > #### n2 <- length(sample2.IDs)
> > >
> > > m2 <- n2 %/% 3
> > > s2 <- sample(1:n2, n2)
> > > group1.IDs <- sample2.IDs[s2[1:m2]]
> > > group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
> > > group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > > ####### --------------------------
> > >
> > > N3 <- 674
> > > population3.IDs <- seq(1, N3, by = 1)
> > > #### population3.IDs
> > >
> > > n3<-284   ##### in this case the sizes of the three groups are(94, 95, and 95)
> > >           ##### BUT this codes generate three groups of equal sizes (94, 94, and 94)
> > > sample2.IDs <- sample(population2.IDs,n2)
> > > sample3.IDs <- sample(population3.IDs,n3)
> > > #### sample3.IDs
> > >
> > > #### n3 <- length(sample2.IDs)
> > >
> > > m3 <- n3 %/% 3
> > > s3 <- sample(1:n3, n3)
> > > group1.IDs <- sample3.IDs[s3[1:m3]]
> > > group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
> > > group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
> > >
> > > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> > >
> > > groups.IDs
> > >
> > > ______________________
> > >
> > > *AbouEl-Makarim Aboueissa, PhD*
> > >
> > > *Professor, Statistics and Data Science*
> > > *Graduate Coordinator*
> > >
> > > *Department of Mathematics and Statistics*
> > > *University of Southern Maine*
> > >
> > > On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs...@yahoo.com> wrote:
> > >
> > > > Abou,
> > > >
> > > > I’ve been following your question on how to split a data column randomly
> > > > into 3 groups using R.
> > > >
> > > > My method may not be amenable for a large set of data but it surely worth
> > > > considering since it makes sense intuitively.
> > > >
> > > > mydata <- LETTERS[1:11]
> > > >
> > > > > mydata
> > > >
> > > > [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
> > > >
> > > > # Let’s choose a random sample of size 4 from mydata
> > > >
> > > > > random_grp1
> > > >
> > > > [1] "J" "H" "D" "A"
> > > >
> > > > Now my next random selection of data is defined by
> > > >
> > > > data_wo_random <- setdiff(mydata,random_grp1)
> > > >
> > > > # this makes sense because I need to choose random data from a set which
> > > > is defined by the difference of the sets mydata and random_grp1
> > > >
> > > > > data_wo_random
> > > >
> > > > [1] "B" "C" "E" "F" "G" "I" "K"
> > > >
> > > > This is great! So now I can randomly select data of any size from this set.
> > > >
> > > > Repeating this process can easily generate subgroups of your original
> > > > dataset of any size you want.
> > > >
> > > > Surely this method could be improved so that this could be done
> > > > automatically.
> > > >
> > > > Nevertheless, this is an intuitive method which I believe is easier to
> > > > understand than some of the other methods posted.
> > > >
> > > > Hope this helps!
> > > >
> > > > Thomas Subia
> > > >
> > > > Statistician
> > > >
> > >
> > > [[alternative HTML version deleted]]
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
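
Thomas's sample-then-setdiff steps quoted above can be repeated in a loop, which is one way to do it "automatically". The following is only a rough sketch under stated assumptions: the helper name split_by_setdiff and its sizes argument are made up for illustration and appear nowhere in the thread, and the items are assumed to be unique, because setdiff() drops duplicates.

```r
# Hypothetical helper (name and arguments are illustrative only):
# draw the groups one at a time, removing each drawn group from the
# remaining pool with setdiff() before drawing the next one.
split_by_setdiff <- function(items, sizes) {
  stopifnot(sum(sizes) <= length(items), !anyDuplicated(items))
  pool <- items
  groups <- vector("list", length(sizes))
  for (i in seq_along(sizes)) {
    # sample by position so that a pool of length 1 is handled correctly
    groups[[i]] <- pool[sample.int(length(pool), sizes[i])]
    pool <- setdiff(pool, groups[[i]])
  }
  groups
}

set.seed(1)                                   # only to make the illustration repeatable
grps <- split_by_setdiff(LETTERS[1:11], c(4, 4, 3))
lengths(grps)                                 # group sizes: 4 4 3
```

The group sizes still have to be supplied by hand here; the sketch only removes the manual repetition of the sample and setdiff steps.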
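
Bert's rule quoted above (k = n %/% g and r = n %% g, giving r groups of size k + 1 and g - r groups of size k) suggests how the three-group function could be extended to any number of groups. A minimal sketch, assuming a hypothetical helper name random_groups (not from the thread) and assuming it does not matter which groups receive the extra ID:

```r
# Hypothetical helper, not from the thread: draw n IDs from 1..N and split
# them into g groups whose sizes differ by at most one.
random_groups <- function(N, n, g) {
  stopifnot(n <= N, g >= 1, g <= n)
  k <- n %/% g                                 # base group size
  r <- n %% g                                  # number of groups with one extra ID
  sizes <- rep(c(k + 1, k), times = c(r, g - r))
  ids <- sample.int(N, n)                      # sampled IDs, already in random order
  # label the first sizes[1] IDs as group 1, the next sizes[2] as group 2, ...
  split(ids, rep(seq_len(g), times = sizes))
}

grp <- random_groups(100, 17, 3)
lengths(grp)                                   # group sizes 6, 6, 5
```

Under the same assumptions, random_groups(266, 112, 3) would give groups of sizes 38, 37, 37 and random_groups(674, 284, 3) groups of sizes 95, 95, 94. The result is a list rather than a matrix, so no NA padding is needed when the sizes differ.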