We should probably say which parts of the problem are important to us. It begins to sound like we each care about slightly different aspects of the problem.

The only points that I really care about are:

- the user should have available some obvious way to sample from a distribution as a method on the distribution itself. This need is not met by having a completely separate class in a different package that the user must somehow intuit the existence of.

- the user should have the widest possible range of distributions that have *some* kind of sampling procedure that produces accurate samples. Moreover, this wide availability should happen very soon.

Note that neither of these points really implies much about implementation, other than where the user of commons-math can find access to the implementations, and that we implement something across many distributions very soon.

These are points that I explicitly don't care about:

- should the implementation be based on inverse cumulative distributions if available? If there is another way to get lots of sampling algorithms implemented, I am all for it. Marsaglia's table method for discrete distributions is an interesting option for some cases. There may be other algorithms that could have wide applicability. Multiple approaches might be a good idea: special-purpose samplers for some cases (like normal or exponential distributions), and more general methods like Marsaglia's where they can be applied. If all of the common cases have special-purpose, high-quality generators, I don't see a problem with letting all of the other distributions that we haven't considered yet fall back to inverse cumulative methods. But all of these considerations are not what I really care about. I only care about very wide availability of *some* sampling method.

- should there be random number generators that provide more generality, flexibility, or alternative implementations for sampling from various distributions? This is an implementation question that can be answered many ways. I think that lots of alternatives are good. I even think that having pure implementations of one method or another might be an excellent way to allow us to stitch together the sampling available by default from the distribution. All of these considerations, however, are not what I really care about. What I care about is that all of these implementations should be ignorable by a less than devoted user of commons-math.

Now, it seems to me that the points that Phil cares most about fall mostly into the set of things that I care less about. Moreover, some of the opinions that Phil has expressed have been stated in ways that I may have misinterpreted.

For instance, it sounded to me like Phil was saying that we shouldn't even implement the inverse cumulative sampler. On reflection, I think that his real point is that we should not use the inverse cumulative method where there are better methods, especially if we already have implementations of the better methods.

Likewise, it sounded to me like Phil was saying that we absolutely shouldn't allow easy access to a community consensus sampling algorithm from the distribution. On further reflection, I think that his real point is that we simply should not be doing most of the implementation in the distribution function class, but should have a separate package to keep all that work out of the users' view. That sounds like a really good idea, if only to decrease the noise for the casual user of the distribution classes.

This sounds like the germ of a compromise.

On Mon, Nov 2, 2009 at 3:03 AM, Phil Steitz <phil.ste...@gmail.com> wrote:

> I just don't like your suggested implementation and package
> placement. I proposed an alternative (a generic method added
> somewhere in the random package), which you did not like. There are
> no doubt other better ways to do this. Perhaps others have ideas?
>

--
Ted Dunning, CTO
DeepDyve