On 9/6/11 8:58 AM, Mikkel Meyer Andersen wrote:
> 2011/9/6 Phil Steitz <phil.ste...@gmail.com>:
>> On 9/6/11 12:00 AM, Mikkel Meyer Andersen wrote:
>>> 2011/9/5 Phil Steitz <phil.ste...@gmail.com>:
>>>> I have a couple of proposals for this class:
>>>>
>>>> 0) Merge the interface and impl. This is consistent with what we
>>>> are doing in some other places where we have only one implementation.
>>> Fine with me.
>>>> 1) Extend this class to actually provide a distribution - i.e.
>>>> implement the Distribution interface.
>>> Won't we have problems, e.g. with implementing cumulativeProbability?
>> The idea I had was to interpolate within bins. So to compute the
>> cdf at x you would find its bin, sum the mass (based on number of
>> original sample points contained, like the sampling does) of the
>> bins below its containing bin and then use the defined kernel within
>> bin to determine how much of its own bin's mass to include.
> Seems reasonable. But: We might want to include a user specified
> support - just simple (endpoints of an interval) - or else the highest
> and lowest value specifies the support which might not be a good idea.

By the latter, do you mean just interpolate linearly between lowest
and highest, or do you mean the lowest / highest actually observed
points in the bin? The first is like using a uniform kernel in the
bins. By "user-specified support" I guess you mean make the
interpolation strategy pluggable somehow, right?

What launched me into thinking about making the kernel used for
sampling configurable was thinking about how uniform would probably
be better / more defensible for use interpolating the cdf in some
cases. Then you have to ask is it OK to use a different kernel for
the sampling vs cdf computation. My instinct is to say no and keep
it simple - allow a uniform kernel to be chosen in place of the
hard-coded Gaussian there now and then use the configured kernel for
both sampling and cdf computation. Even with mixed kernels, you will
probably in most cases end up with decent fidelity between sampling
results and the cdf; but I can imagine scenarios where Gaussian
kernels with coarse grids could lead to funny sampling distributions
that would not follow the linearly-interpolated cdf very well near
grid points.

Phil
>>>> 2) make the kernel used within bins configurable. Currently, values
>>>> are generated (and the cdf would be computed) assuming a Gaussian
>>>> distribution within bins. I think at least a uniform option should
>>>> be provided.
>>> +1, maybe it can be generalised to providing user-defined kernels.
>> Good idea. Need to think about how to enable that.
>>
>> Thanks!
>>
>> Phil
>>>> Thanks in advance for any feedback on this or further suggestions
>>>> for improvement.
>>>>
>>>> Phil
>>>>
> Cheers, Mikkel.
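
For concreteness, the within-bin interpolation discussed above could be
sketched roughly as follows. This is only an illustration under stated
assumptions: equal-width bins spanning the lowest and highest observed
values, and a uniform kernel within each bin. The class and method names
(BinnedCdfSketch, cumulativeProbability, binIndex) are hypothetical and
are not the existing Commons Math API.

```java
// Hypothetical sketch: cdf from binned sample data with a uniform kernel.
// Assumes equal-width bins over [min, max] of the observed sample.
final class BinnedCdfSketch {
    private final double min;      // lowest observed value (lower end of support)
    private final double binWidth; // common width of all bins
    private final long[] counts;   // number of sample points per bin
    private final long total;      // total number of sample points

    BinnedCdfSketch(double[] sample, int binCount) {
        if (sample.length == 0 || binCount < 1) {
            throw new IllegalArgumentException("need a non-empty sample and at least one bin");
        }
        double lo = Double.POSITIVE_INFINITY;
        double hi = Double.NEGATIVE_INFINITY;
        for (double v : sample) {
            lo = Math.min(lo, v);
            hi = Math.max(hi, v);
        }
        if (!(hi > lo)) {
            throw new IllegalArgumentException("need at least two distinct finite values");
        }
        min = lo;
        binWidth = (hi - lo) / binCount;
        counts = new long[binCount];
        for (double v : sample) {
            counts[binIndex(v)]++;
        }
        total = sample.length;
    }

    // Index of the bin containing x; the maximum falls into the last bin.
    private int binIndex(double x) {
        int i = (int) Math.floor((x - min) / binWidth);
        return Math.min(Math.max(i, 0), counts.length - 1);
    }

    // Mass of the bins below x's bin, plus the uniform-kernel share of its own bin.
    double cumulativeProbability(double x) {
        if (x <= min) {
            return 0.0;
        }
        if (x >= min + binWidth * counts.length) {
            return 1.0;
        }
        int bin = binIndex(x);
        long below = 0;
        for (int i = 0; i < bin; i++) {
            below += counts[i];
        }
        double binLower = min + bin * binWidth;
        double fraction = (x - binLower) / binWidth; // linear within the bin
        return (below + fraction * counts[bin]) / total;
    }
}
```

Here the support is simply the observed range, which is exactly the
default Mikkel questions above; a user-specified interval or a different
kernel would change only the constructor bounds and the "fraction" step.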

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org