Hi Chao Liu,

I'm having difficulty following your question and examples. I also don't see the motivation for increasing, then decreasing, the sample sizes. Intuitively, one would compute the correct sample sizes the first time round...
But I thought I'd add some comments, just in case they're useful.

If the problem relates to memberships (in clusters), then it can be simplified. All one needs is an integer vector, where each value is the index of the cluster that the observation belongs to.

To compute random memberships of 600 observations in 20 clusters, one could run:

    m <- sample (1:20, 600, TRUE)

To compute the number of observations per cluster, one could then run:

    table (m)

In the above code, each observation is equally likely to be assigned to any of the 20 clusters. Non-uniform sampling can be achieved by supplying a 4th argument (prob) to the sample function, which is a numeric vector of weights, one per cluster.

On Wed, Dec 16, 2020 at 10:08 PM Chao Liu <psychao...@gmail.com> wrote:
>
> Dear R experts,
>
> I want to simulate some unbalanced clustered data. The number of clusters
> is 20 and the average number of observations is 30. However, I would like
> to create an unbalanced clustered data per cluster where there are 10% more
> observations than specified (i.e., 33 rather than 30). I then want to
> randomly exclude an appropriate number of observations (i.e., 60) to arrive
> at the specified average number of observations per cluster (i.e., 30). The
> probability of excluding an observation within each cluster was not uniform
> (i.e., some clusters had no cases removed and others had more excluded).
> Therefore in the end I still have 600 observations in total. How to realize
> that in R? Thank you for your help!
>
> Best,
>
> Liu
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
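P.S. A minimal sketch of the non-uniform case mentioned above, in case it helps. The weights w, the seed, and the variable names are made up for illustration, so substitute whatever suits your design. The second half is only one possible reading of the "33 per cluster, then remove 60" procedure and may not be what was intended.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

k <- 20   # number of clusters
n <- 600  # total number of observations

# Hypothetical weights: higher-numbered clusters are more likely.
# sample() only needs non-negative weights; they need not sum to one.
w <- seq_len(k)

# Non-uniform memberships: prob is the 4th argument of sample()
m <- sample(seq_len(k), n, replace = TRUE, prob = w)

# Observations per cluster; factor() keeps clusters that received none
sizes <- table(factor(m, levels = seq_len(k)))

# One possible reading of the original procedure:
# start with 33 observations in each cluster (660 in total) ...
m0 <- rep(seq_len(k), each = 33)

# ... then drop 60 observations without replacement, where the chance
# of being dropped depends on the cluster (assumed weights w again)
drop <- sample(seq_along(m0), 60, prob = w[m0])
m1 <- m0[-drop]

sizes1 <- table(factor(m1, levels = seq_len(k)))
```

Printing sizes and sizes1 shows the resulting cluster sizes. Both sum to 600, so the average is 30 per cluster, but the individual clusters are unbalanced. In the second case no cluster can exceed 33 observations.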