2011/9/7 Phil Steitz <phil.ste...@gmail.com>:
> On 9/6/11 8:58 AM, Mikkel Meyer Andersen wrote:
>> 2011/9/6 Phil Steitz <phil.ste...@gmail.com>:
>>> On 9/6/11 12:00 AM, Mikkel Meyer Andersen wrote:
>>>> 2011/9/5 Phil Steitz <phil.ste...@gmail.com>:
>>>>> I have a couple of proposals for this class:
>>>>>
>>>>> 0) Merge the interface and impl. This is consistent with what we
>>>>> are doing in some other places where we have only one implementation.
>>>> Fine with me.
>>>>> 1) Extend this class to actually provide a distribution - i.e.
>>>>> implement the Distribution interface.
>>>> Won't we have problems, e.g. with implementing cumulativeProbability?
>>> The idea I had was to interpolate within bins. So to compute the
>>> cdf at x you would find its bin, sum the mass (based on number of
>>> original sample points contained, like the sampling does) of the
>>> bins below its containing bin and then use the defined kernel within
>>> bin to determine how much of its own bin's mass to include.
>> Seems reasonable. But: We might want to include a user specified
>> support - just simple (endpoints of an interval) - or else the highest
>> and lowest value specifies the support which might not be a good idea.
>
> By the latter, do you mean just interpolate linearly between lowest
> and highest, or do you mean the lowest / highest actually observed
> points in the bin? The first is like using a uniform kernel in the
> bins. By "user-specified support" I guess you mean make the
> interpolation strategy pluggable somehow, right? What launched me
> into thinking about making the kernel used for sampling configurable
> was thinking about how uniform would probably be better / more
> defensible for use interpolating the cdf in some cases. Then you
> have to ask is it OK to use a different kernel for the sampling vs
> cdf computation.
> My instinct is to say no and keep it simple -
> allow a uniform kernel to be chosen in place of the hard-coded
> Gaussian there now and then use the configured kernel for both
> sampling and cdf computation. Even with mixed kernels, you will
> probably in most cases end up with decent fidelity between sampling
> results and the cdf; but I can imagine scenarios where Gaussian
> kernels with coarse grids could lead to funny sampling distributions
> that would not follow the linearly-interpolated cdf very well near
> grid points.
>
> Phil

"but I can imagine scenarios where Gaussian kernels with coarse grids
could lead to funny sampling distributions that would not follow the
linearly-interpolated cdf very well near grid points."

Yes, precisely. Especially if trying to distribute the probability
mass on a discrete grid :-).
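
To make the within-bin interpolation described above concrete, here is a
rough sketch assuming a uniform kernel (that is, linear interpolation
inside each bin) and equal-width bins over a caller-supplied interval.
The class name BinnedCdfSketch, its constructor arguments and the
clamping of out-of-range points are illustrative assumptions only; this
is not the existing EmpiricalDistribution code.

```java
/** Hypothetical sketch, not the Commons Math API: binned cdf with a uniform kernel. */
final class BinnedCdfSketch {
    private final double lower;     // left endpoint of the (user-specified) support
    private final double binWidth;  // common width of all bins
    private final long[] counts;    // number of sample points falling in each bin
    private final long total;       // total number of sample points

    BinnedCdfSketch(double[] sample, int binCount, double lower, double upper) {
        if (sample.length == 0 || binCount < 1 || !(upper > lower)) {
            throw new IllegalArgumentException("need data, >= 1 bin and lower < upper");
        }
        this.lower = lower;
        this.binWidth = (upper - lower) / binCount;
        this.counts = new long[binCount];
        for (double v : sample) {
            int i = (int) Math.floor((v - lower) / binWidth);
            // Assumption: points on or beyond the endpoints go into the end bins.
            counts[Math.min(Math.max(i, 0), binCount - 1)]++;
        }
        this.total = sample.length;
    }

    /** Mass of the bins below x's bin, plus a linear share of x's own bin. */
    double cumulativeProbability(double x) {
        double upper = lower + binWidth * counts.length;
        if (x <= lower) {
            return 0.0;
        }
        if (x >= upper) {
            return 1.0;
        }
        int bin = Math.min((int) ((x - lower) / binWidth), counts.length - 1);
        long below = 0;
        for (int i = 0; i < bin; i++) {
            below += counts[i];
        }
        // Uniform kernel: include the fraction of the bin lying to the left of x.
        double fraction = (x - (lower + bin * binWidth)) / binWidth;
        return (below + fraction * counts[bin]) / total;
    }
}
```

With observations 1, 3, 4 and three bins on [1, 4], this gives a cdf of
1/6 at 1.5, 1/3 at 2.5 and 2/3 at 3.5. Note that with a uniform kernel
the mass stays in the bins that contain observations, so widening the
support to [0, 5] by itself would not put any mass near 0 or 5; that
would need a kernel that spreads mass across bin boundaries.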

To clarify what I meant by user-specified support: if a user has
observations 1, 3, 4, we would probably want to allow probability
mass elsewhere than just at {1, 2, 3, 4} (2 is interpolated). What I
mean is that it might make sense to let the user specify that the
distribution is discrete with a support of {0, 1, 2, 3, 4, 5} (2 is
interpolated; 0 and 5 are extrapolated). Similarly for continuous
distributions. Or is that too ambitious?

Regarding kernels, I'm okay with only supporting uniform and Gaussian,
but we might think about it - we might come up with a clever solution
giving pluggable kernels almost for free (if we are lucky :-)).

Cheers, Mikkel.

>>>>> 2) make the kernel used within bins configurable. Currently, values
>>>>> are generated (and the cdf would be computed) assuming a Gaussian
>>>>> distribution within bins. I think at least a uniform option should
>>>>> be provided.
>>>> +1, maybe it can be generalised to providing user-defined kernels.
>>> Good idea. Need to think about how to enable that.
>>>
>>> Thanks!
>>>
>>> Phil
>>>>> Thanks in advance for any feedback on this or further suggestions
>>>>> for improvement.
>>>>>
>>>>> Phil
>>>>>
>> Cheers, Mikkel.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org