Re: [math] Thread-safety and initialization overhead in distribution sample() methods

Gilles Thu, 09 Oct 2014 03:09:08 -0700

Hi.

On Wed, 08 Oct 2014 21:45:30 +0200, Luc Maisonobe wrote:

Le 08/10/2014 19:32, Gilles a écrit :

> [...]


What do you think of the solution proposed in
  https://issues.apache.org/jira/browse/MATH-1158
?

[No hidden RNG, no default RNG, no hidden performance
cost or delayed initialization, similar API, ...]

I may have missed something but it does not really separate thesampling

from the distribution:


This is what Phil wrote (in the reply to same post):

  > That is basically going back to not having sampling in the
  > distribution classes.

Doesn't this show that we don't seem to talk about the same things?
How are we supposed to make progress when one summarizes a proposal
with a blunt statement that misrepresents my attempt to go in the
direction which you both advocated in an earlier post.

[It's this kind of nonsense which I tried to avoid in comments for
MATH-1154.]

That said, I'm more inclined to agree with you: the sampling is still
available from a distribution instance (and it _should_ be so: the
previous discussion on this subject had led everyone to recognize that
it made no sense, mathematically and programmatically, to have large
swaths of duplicated code, as was the case before "sample()" was
introduced).

[I'm certainly not the one to want to go back to that previous state.
And this proposal is nowhere near to it.]

the Sampler instance is an inner class that calls
the distribution sample(Rng) method. So this method still exists inthe
distribution and can be called directly from there (apart from being
protected).


Well, that is exactly the point: the "sample(RNG)" takes that RNG
argument and that's what everyone seemed to agree on (unless I'm
mistaken).

It is protected to enforce API separation: the distribution classprovidesthe necessary hooks (to the method and fields) needed to define(override)the behaviour of "sample", while the caller will access the samplingthrough

a dedicated, simple, class (implementing the "Sampler" interface).

The good thing is the generator is visible. I think we all agree the
generator must be visible, and it already is in the existingconstructors.


That's the other point: some people have asked why there is a RNG
argument although most of their usage do not need one.
In my proposal, there is indeed no more RNG in the distribution object;

only a user that needs it (for sampling, of course) will have toprovide

one to the factory method.


The difference I see between this solution and clearly specifying in
documentation what happens when the distribution is built without a
generator is that with your solution users can first build the

distribution and later on deciding they need to sample from it andthen

create the sampler as an afterthought. I'm not sure this is worth the
change, but I may be mistaken here, I don't use sampling on
distributions so may well miss a classical use case.


That proposal would
1. fix the performance issue reported in MATH-1154,
2. fix that issue without resorting to the so called "smelly
   workaround",
3. simplify the "RealDistribution" interface namely by
   removing the "reseedRandomGenerator" method (whose presence
   has always been a flaw, IMO): the RNG is and stays under the
   caller's explicit responsibility,
4. simplify the concrete distribution classes, by removing all
   the constructors that take a RNG argument,
5. make the distribution classes thread-safe, by making the
   "devilishly hard" problem (of ensuring RNG thread-safety) go
   away.
Moreover:
6. An application layer that only needs sampling could pass a
   "Sampler" object: hiding the implicit distribution could make
   the application cleaner (self-documented).
7. The modification can be backwards compatible, and user can
   upgrade easily.

If that list is not enough to be worth the change, I don't know
what would be.


Regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [math] Thread-safety and initialization overhead in distribution sample() methods

Reply via email to