If they want to generate directly from the empirical distribution, then sampling with replacement is the best choice (as others have already suggested). But the reference in the original post to the normal and beta distributions suggested to me that the original poster may want a smooth approximation to the empirical distribution rather than its step function, without being locked into a specific parametric family. The logspline package has functions for doing things like this. Its advantages are that it can give a smooth (non-step) plot of the estimated CDF, and that the points it generates are based on the observed data but can fall outside the original range of the data and have fewer ties.
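For concreteness, here is a minimal sketch of both approaches side by side. The vector x is simulated stand-in data (my assumption; the poster's 10,000 values would go there instead), the seed and the gamma parameters are arbitrary, and it assumes the logspline package has been installed from CRAN.

```r
library(logspline)  # assumed installed: install.packages("logspline")

set.seed(42)                                # arbitrary seed, only for reproducibility
x <- rgamma(10000, shape = 2, rate = 1)     # stand-in for the 10,000 observed values

# Option 1: sample with replacement from the empirical distribution
new1 <- sample(x, 20000, replace = TRUE)

# Option 2: fit a smooth density with logspline, then simulate from the fit
fit  <- logspline(x)
new2 <- rlogspline(20000, fit)

# Compare the step-function ECDF with the smooth estimated CDF
plot(ecdf(x), main = "Empirical vs. logspline CDF")
q <- seq(min(x), max(x), length.out = 200)
lines(q, plogspline(q, fit), col = "red")

# Option 1 can only repeat observed values; option 2 is not restricted to them
range(x)
range(new2)
length(unique(new1))
length(unique(new2))
```

If the data are known to be bounded (for example, strictly positive), logspline() also takes lbound and ubound arguments, which may give a more sensible fit near the boundary.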
Whether these "advantages" make any difference depends on what they want to do with the observations (for many applications the difference is probably negligible, and using sample() is the simplest and best choice). But there may be some uses for which these "advantages" are beneficial. (Using sample() and then adding a small random "error" to each value is another option, but I like the logspline option better.)

--
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.s...@imail.org
801.408.8111

> -----Original Message-----
> From: Frank Harrell [mailto:f.harr...@vanderbilt.edu]
> Sent: Tuesday, July 27, 2010 4:54 PM
> To: Greg Snow
> Cc: xin wei; r-help@r-project.org
> Subject: Re: [R] how to generate a random data from a empirical distribition
>
> Easiest thing is to sample with replacement from the original data.
> This is the idea behind the bootstrap, which is sampling from the
> empirical CDF.
>
> Frank E Harrell Jr   Professor and Chairman        School of Medicine
>                      Department of Biostatistics   Vanderbilt University
>
> On Tue, 27 Jul 2010, Greg Snow wrote:
>
> > Another option for fitting a smooth distribution to data (and
> > generating future observations from the smooth distribution) is to
> > use the logspline package.
> >
> >> -----Original Message-----
> >> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of xin wei
> >> Sent: Monday, July 26, 2010 12:36 PM
> >> To: r-help@r-project.org
> >> Subject: [R] how to generate a random data from a empirical distribition
> >>
> >> Hi, this is more a statistical question than an R question, but I do
> >> want to know how to implement this in R.
> >> I have 10,000 data points. Is there any way to generate an empirical
> >> probability distribution from them? (The problem is that I do not
> >> know what distribution the data follow: normal, beta?) My ultimate
> >> goal is to generate an additional 20,000 data points from this
> >> empirical distribution created from the existing 10,000 data points.
> >> Thank you all in advance.
> >>
> >> --
> >> View this message in context: http://r.789695.n4.nabble.com/how-to-generate-a-random-data-from-a-empirical-distribition-tp2302716p2302716.html
> >> Sent from the R help mailing list archive at Nabble.com.
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.