Re: [MATH] Test failure in Continuum

Dennis Hendriks Mon, 06 Aug 2012 23:16:49 -0700

See below.

On 08/06/2012 05:29 PM, Phil Steitz wrote:





On Aug 6, 2012, at 6:06 AM, Dennis Hendriks<d.hendr...@tue.nl>  wrote:

See below.

Dennis


On 08/06/2012 02:48 PM, Phil Steitz wrote:





On Aug 5, 2012, at 11:21 PM, Dennis Hendriks<d.hendr...@tue.nl>   wrote:

See below.

On 08/06/2012 12:49 AM, Gilles Sadowski wrote:

On Sun, Aug 05, 2012 at 12:54:11PM -0700, Phil Steitz wrote:

On 8/4/12 10:57 AM, Gilles Sadowski wrote:

Hello.

Referring to this failed test (cf. messages from Continuum):
---CUT---
org.apache.commons.math3.exception.NumberIsTooLargeException: lower bound (65) 
must be strictly less than upper bound (65)
    at 
org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:73)
    at 
org.apache.commons.math3.distribution.UniformIntegerDistribution.<init>(UniformIntegerDistribution.java:53)
    at 
org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.generatePartition(AggregateSummaryStatisticsTest.java:275)
    at 
org.apache.commons.math3.stat.descriptive.AggregateSummaryStatisticsTest.testAggregationConsistency(AggregateSummaryStatisticsTest.java:89)


It is due to a precondition check while creating the
"UniformIntegerDistribution" instance:
---CUT---
if (lower>= upper) {
     throw new NumberIsTooLargeException(
                        LocalizedFormats.LOWER_BOUND_NOT_BELOW_UPPER_BOUND,
                        lower, upper, false);
}
---CUT---

The test referred to above was using this code (before I changed it use a
"UniformIntegerDistribution" instance):
---CUT---
final int next = (i == 4 || cur == length - 1) ? length - 1 : 
randomData.nextInt(cur, length - 1);
---CUT---

It is now (after the change):
---CUT---
final IntegerDistribution partitionPoint = new UniformIntegerDistribution(cur, 
length - 1);
final int next = (i == 4 || cur == length - 1) ? length - 1 : 
partitionPoint.sample();
---CUT---

Thus, AFAIK, the failure did not appear before because there was no
precondition enforcement in "nextInt".

The question is: Was the code in the test correct (in allowing the same
value for both bounds?
  * In the negative, how to change it?
  * The affirmative would mean that the precondition check in
    "UniformIntegerDistribution" should be relaxed to allow equal bounds.
    Does this make sense?
    If so, can we change it now, or is it forbidden in order to stay
    backwards compatible?


Your analysis above is correct.  The failure after the change is due
to the fact that post-change the distribution is instantiated before
the bounds check.  I changed the test to fix this.


Thanks.

  Both the
randomData nextInt and the UniformIntegerDistribution constructor
now forbid the degenerate case where there is only one point in the
domain.  In retrospect, I guess it would have probably been better
to allow this degenerate case.  Unfortunately, this would be an
incompatible change, so will have to wait until 4.0 if we want to do it.

The original code above illustrates the convenience of being able to
just make direct calls to randomData.nextXxx, which is why this
class exists ;)


As I wrote in another post, I'm not against the convenience methods. But
IMO, they should be located in a new "DistributionUtils" class.
And we should also find a way to remove the code duplication (in the
distribution's "sample()" method and in the corresponding "next..." method).


The RandomData class (or whatever it would be called) does indeed seem useful. 
If we plan to keep it, we should probably make sure that there is a 
sample/next/... method in that class for EVERY distribution, as some of them 
are missing, if I remember correctly. Perhaps this is a separate issue though?


All have the method now, but the impls delegate to RandomDataImpl.  In some 
cases, there is nothing better implemented than just inversion, provided by the 
default inversion sampler.  That is OK.  What we need to do is just move the 
implementations of the default and specialized samplers to the actual 
distribution classes.  These can't be static, as they use the RamdomData 
instance.  I will take care of this.


I'm not sure if I made myself clear. I meant to say that not all distributions 
have a corresponding nextX method in RandomData(Impl). What I propose is to 
make sure that for every distribution class, there is a corresponding method in 
RandomData(Impl) to make sure that RandomData(Impl) is actually a substitute 
for using the distributions.


I get it now, but I don't think I agree.  RandomData is meant to be a 
general-purpose  class used for generating random data with commonly desired 
characteristics, like coming from commonly used distributions such as Poisson, 
Gaussian, etc.  This class predates the sample() method that has been added to 
*all* distributions.  The default implementation of sampling for real 
distributions (inversion-based sampling)  has no dependency on RandomDataImpl, 
but the specialized implementations (impls that are better than inversion) for 
some distributions still live there.  Here is what I would like to do:

1. Move the specialized implementations from RandomDataImpl to the 
distributions that they sample from.

2.  Rename RandomDataImpl to merge the impl and the interface

I think we all agree on 1.  Regarding 2, I would personally prefer to leave the 
represented distributions as is, sticking with just the most commonly used 
distributions, having the methods accept parameters, but maintaining generators 
as instance variables so generator state can be maintained between calls (like 
sample() and the existing RandomDataImpl behave now).  If others feel strongly 
that every distribution should be represented, I am OK with that but would see 
it as clutter in the RD replacement.

I definitely do agree on 1. I also agree on 2 (the rename/merge). However,I thought the goal of having RandomData is to be convenient:

- Distributions require a separate instance per distributions. RandomDatarequires only a single instance.

- Distributions require a separate instance per combination ofparameters. RandomData allows providing the parameters when one actuallyasks for samples.

If so, then I don't see why that would be the case for only some of thedistributions, and not for others...? Am I missing something here?


Dennis

Phil


Obviously, the implementation should be only in the distributions OR in the 
RandomDataImpl. I agree that in the distributions would be better. I like the 
static methods in distributions idea.

Phil


Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [MATH] Test failure in Continuum

Reply via email to