On Thu, Oct 9, 2014 at 12:08 PM, Gilles <gil...@harfang.homelinux.org> wrote:
> Hi. > > On Wed, 08 Oct 2014 21:45:30 +0200, Luc Maisonobe wrote: > >> Le 08/10/2014 19:32, Gilles a écrit : >> >>> > [...] >>>> >>> >>> What do you think of the solution proposed in >>> https://issues.apache.org/jira/browse/MATH-1158 >>> ? >>> >>> [No hidden RNG, no default RNG, no hidden performance >>> cost or delayed initialization, similar API, ...] >>> >> >> I may have missed something but it does not really separate the sampling >> from the distribution: >> > > This is what Phil wrote (in the reply to same post): > > > That is basically going back to not having sampling in the > > distribution classes. > > Doesn't this show that we don't seem to talk about the same things? > How are we supposed to make progress when one summarizes a proposal > with a blunt statement that misrepresents my attempt to go in the > direction which you both advocated in an earlier post. > > [It's this kind of nonsense which I tried to avoid in comments for > MATH-1154.] > > That said, I'm more inclined to agree with you: the sampling is still > available from a distribution instance (and it _should_ be so: the > previous discussion on this subject had led everyone to recognize that > it made no sense, mathematically and programmatically, to have large > swaths of duplicated code, as was the case before "sample()" was > introduced). > > [I'm certainly not the one to want to go back to that previous state. > And this proposal is nowhere near to it.] > > the Sampler instance is an inner class that calls >> the distribution sample(Rng) method. So this method still exists in the >> distribution and can be called directly from there (apart from being >> protected). >> > > Well, that is exactly the point: the "sample(RNG)" takes that RNG > argument and that's what everyone seemed to agree on (unless I'm > mistaken). > It is protected to enforce API separation: the distribution class provides > the necessary hooks (to the method and fields) needed to define (override) > the behaviour of "sample", while the caller will access the sampling > through > a dedicated, simple, class (implementing the "Sampler" interface). > > >> The good thing is the generator is visible. I think we all agree the >> generator must be visible, and it already is in the existing constructors. >> > > That's the other point: some people have asked why there is a RNG > argument although most of their usage do not need one. > In my proposal, there is indeed no more RNG in the distribution object; > only a user that needs it (for sampling, of course) will have to provide > one to the factory method. > > >> The difference I see between this solution and clearly specifying in >> documentation what happens when the distribution is built without a >> generator is that with your solution users can first build the >> distribution and later on deciding they need to sample from it and then >> create the sampler as an afterthought. I'm not sure this is worth the >> change, but I may be mistaken here, I don't use sampling on >> distributions so may well miss a classical use case. >> > > That proposal would > 1. fix the performance issue reported in MATH-1154, > this proposal would only fix it in 4.0, as we can not remove the rng references in the distribution classes immediately. > 2. fix that issue without resorting to the so called "smelly > workaround", > there is nothing smelly about it, and if you want to fix this issue in 3.4 there is not much choice (apart from the lazy initialization proposal). > 3. simplify the "RealDistribution" interface namely by > removing the "reseedRandomGenerator" method (whose presence > has always been a flaw, IMO): the RNG is and stays under the > caller's explicit responsibility, > 4. simplify the concrete distribution classes, by removing all > the constructors that take a RNG argument, > 5. make the distribution classes thread-safe, by making the > "devilishly hard" problem (of ensuring RNG thread-safety) go > away. > Moreover: > 6. An application layer that only needs sampling could pass a > "Sampler" object: hiding the implicit distribution could make > the application cleaner (self-documented). > 7. The modification can be backwards compatible, and user can > upgrade easily. > with the other points I agree, but I would make the sample(RandomGenerator) method public, so that users have the choice to directly call it rather than create a Sampler for this distribution. Thomas