In case anyone is still interested in my query: suppose n items are to be split into g groups as evenly as possible, where "as evenly as possible" means there are at most two distinct group sizes and they differ by 1. Then:
if n = k*g + r, where 0 <= r < g, then n = k*(g - r) + (k + 1)*r,
i.e. g - r groups of size k and r groups of size k + 1.

So using R's modular arithmetic operators, which are handy to know
about, we have: r = n %% g and k = n %/% g.

(And note that you should disregard my previous stupid remark about
numerical analysis.)

Cheers,
Bert

On Sat, Sep 4, 2021 at 3:34 PM Bert Gunter <bgunter.4...@gmail.com> wrote:
>
> I have a more general problem for you.
>
> Given n items and 2 <= g << n, how do you divide the n items into g
> groups that are as "equal as possible."
>
> First, operationally define "as equal as possible."
> Second, define the algorithm to carry out the definition. Hint: Note
> that sum{m[i]} for i <= g must sum to n, where m[i] is the number of
> items in the ith group.
> Third, write R code for the algorithm. Exercise for the reader.
>
> I may be wrong, but I think numerical analysts might also have a
> little fun here.
>
> Randomization, of course, is trivial.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
> On Sat, Sep 4, 2021 at 2:13 PM AbouEl-Makarim Aboueissa
> <abouelmakarim1...@gmail.com> wrote:
> >
> > Dear Thomas:
> >
> > Thank you very much for your input in this matter.
> >
> > The core part of this R code(s) (please see below) was written by *Richard
> > O'Keefe*. I had three examples with different sample sizes.
> >
> > *First sample of size n1 = 204* divided randomly into three groups of sizes
> > 68. *No problems with this one*.
> >
> > *The second sample of size n2 = 112* divided randomly into three groups of
> > sizes 37, 37, and 38. BUT this R code generated three groups of equal sizes
> > (37, 37, and 37). *How to fix the code to make sure that the output will be
> > three groups of sizes 37, 37, and 38*.
> >
> > *The third sample of size n3 = 284* divided randomly into three groups of
> > sizes 94, 95, and 95. BUT this R code generated three groups of equal sizes
> > (94, 94, and 94). *Again*, *how to fix the code to make sure that the
> > output will be three groups of sizes 94, 95, and 95*.
> >
> > With many thanks
> >
> > abou
> >
> > ########### ------------------------ #############
> >
> > N1 <- 485
> > population1.IDs <- seq(1, N1, by = 1)
> > #### population1.IDs
> >
> > n1<-204     ##### in this case the size of each group of the three groups = 68
> > sample1.IDs <- sample(population1.IDs,n1)
> > #### sample1.IDs
> >
> > #### n1 <- length(sample1.IDs)
> >
> > m1 <- n1 %/% 3
> > s1 <- sample(1:n1, n1)
> > group1.IDs <- sample1.IDs[s1[1:m1]]
> > group2.IDs <- sample1.IDs[s1[(m1+1):(2*m1)]]
> > group3.IDs <- sample1.IDs[s1[(m1*2+1):(3*m1)]]
> >
> > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> >
> > groups.IDs
> >
> > ####### --------------------------
> >
> > N2 <- 266
> > population2.IDs <- seq(1, N2, by = 1)
> > #### population2.IDs
> >
> > n2<-112     ##### in this case the sizes of the three groups are (37, 37, and 38)
> >             ##### BUT this codes generate three groups of equal sizes (37, 37, and 37)
> > sample2.IDs <- sample(population2.IDs,n2)
> > #### sample2.IDs
> >
> > #### n2 <- length(sample2.IDs)
> >
> > m2 <- n2 %/% 3
> > s2 <- sample(1:n2, n2)
> > group1.IDs <- sample2.IDs[s2[1:m2]]
> > group2.IDs <- sample2.IDs[s2[(m2+1):(2*m2)]]
> > group3.IDs <- sample2.IDs[s2[(m2*2+1):(3*m2)]]
> >
> > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> >
> > groups.IDs
> >
> > ####### --------------------------
> >
> > N3 <- 674
> > population3.IDs <- seq(1, N3, by = 1)
> > #### population3.IDs
> >
> > n3<-284     ##### in this case the sizes of the three groups are (94, 95, and 95)
> >             ##### BUT this codes generate three groups of equal sizes (94, 94, and 94)
> > sample2.IDs <- sample(population2.IDs,n2)
> > sample3.IDs <- sample(population3.IDs,n3)
> > #### sample3.IDs
> >
> > #### n3 <- length(sample2.IDs)
> >
> > m3 <- n3 %/% 3
> > s3 <- sample(1:n3, n3)
> > group1.IDs <- sample3.IDs[s3[1:m3]]
> > group2.IDs <- sample3.IDs[s3[(m3+1):(2*m3)]]
> > group3.IDs <- sample3.IDs[s3[(m3*2+1):(3*m3)]]
> >
> > groups.IDs <-cbind(group1.IDs,group2.IDs,group3.IDs)
> >
> > groups.IDs
> >
> > ______________________
> >
> > *AbouEl-Makarim Aboueissa, PhD*
> >
> > *Professor, Statistics and Data Science*
> > *Graduate Coordinator*
> >
> > *Department of Mathematics and Statistics*
> > *University of Southern Maine*
> >
> > On Sat, Sep 4, 2021 at 11:54 AM Thomas Subia <tgs...@yahoo.com> wrote:
> >
> > > Abou,
> > >
> > > I’ve been following your question on how to split a data column randomly
> > > into 3 groups using R.
> > >
> > > My method may not be amenable for a large set of data but it surely worth
> > > considering since it makes sense intuitively.
> > >
> > > mydata <- LETTERS[1:11]
> > >
> > > > mydata
> > >
> > > [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
> > >
> > > # Let’s choose a random sample of size 4 from mydata
> > >
> > > > random_grp1
> > >
> > > [1] "J" "H" "D" "A"
> > >
> > > Now my next random selection of data is defined by
> > >
> > > data_wo_random <- setdiff(mydata,random_grp1)
> > >
> > > # this makes sense because I need to choose random data from a set which
> > > is defined by the difference of the sets mydata and random_grp1
> > >
> > > > data_wo_random
> > >
> > > [1] "B" "C" "E" "F" "G" "I" "K"
> > >
> > > This is great! So now I can randomly select data of any size from this
> > > set.
> > >
> > > Repeating this process can easily generate subgroups of your original
> > > dataset of any size you want.
> > >
> > > Surely this method could be improved so that this could be done
> > > automatically.
> > >
> > > Nevertheless, this is an intuitive method which I believe is easier to
> > > understand than some of the other methods posted.
> > >
> > > Hope this helps!
> > >
> > > Thomas Subia
> > >
> > > Statistician
> > >
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
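
For what it's worth, here is a minimal R sketch of the decomposition at the top of this message (r = n %% g, k = n %/% g), applied to the random three-way split asked about in the thread. The names group_sizes and random_split are hypothetical helpers of my own, not from any post in the thread or from any package. Returning a list instead of using cbind() is also an assumption on my part, since groups of unequal size do not fit into a matrix without recycling.

```r
# Hypothetical helper: sizes of g groups that are as equal as possible,
# i.e. g - r groups of size k and r groups of size k + 1.
group_sizes <- function(n, g) {
  stopifnot(n >= 0, g >= 1)
  k <- n %/% g   # base group size
  r <- n %% g    # number of groups that receive one extra item
  c(rep(k, g - r), rep(k + 1, r))
}

# Hypothetical helper: randomly split a vector of IDs into g groups
# whose sizes come from group_sizes(). Returns a list of g vectors.
random_split <- function(ids, g) {
  n <- length(ids)
  sizes <- group_sizes(n, g)
  # Shuffle by position; this avoids the sample(x) surprise when x has length 1.
  shuffled <- ids[sample.int(n)]
  # Label the first sizes[1] shuffled items 1, the next sizes[2] items 2, etc.
  labels <- factor(rep(seq_len(g), times = sizes), levels = seq_len(g))
  unname(split(shuffled, labels))
}

group_sizes(204, 3)  # 68 68 68
group_sizes(112, 3)  # 37 37 38
group_sizes(284, 3)  # 94 95 95
```

With the thread's second example, random_split(sample2.IDs, 3) would return three ID vectors of lengths 37, 37 and 38, so no sampled ID is dropped, unlike indexing up to 3*m2.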