On 12/29/2015 05:10 PM, Gilles wrote:
> On Tue, 29 Dec 2015 10:33:15 +0100, Luc Maisonobe wrote:
>> Hi all,
>>
>>> On 29/12/2015 09:21, Thomas Neidhart wrote:
>>> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>>>> The significant refactoring to eliminate the (standard) next(int)
>>>>>> included in these changes has the possibility of introducing subtle
>>>>>> bugs or performance issues.  Please run some tests to verify that
>>>>>> the same sequences are generated by the 3_X code
>>>>>
>>>>> IIUC our unit tests of the RNGs, this is covered.
>>>>
>>>> No.  Not sufficient.  What you have done is changed the internal
>>>> implementation of all of the Bitstream generators.  I am not
>>>> convinced that you have not broken anything.  I will have to do the
>>>> testing myself.  I see no point in fiddling with the internals of
>>>> this code that has had a lot of eyeballs and testing on it.  I was
>>>> not personally looking forward to researching the algorithms to make
>>>> sure no invariants are broken by these changes; but I am now
>>>> going to have to do this.  I have to ask why.  Please at some point
>>>> read [1], especially the sections on "Avoid Flexibility Syndrome" and
>>>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>>>> community energy.
>>>
>>> +1, on top of that I think we should aim to refactor the parts that
>>> really need refactoring
> 
> Though I would have liked to say as much about the parts of the library
> where my changes were much criticized because they failed to produce
> a perfect design (which the previous one wasn't either), I would have
> refrained from telling volunteers what they should or should not do.

When I saw the commit I was already thinking of writing a comment
about it, but refrained until Phil did so. Afaict, your refactoring
moved the bit-shifting logic from one method (next(int)) into others
(nextDouble(), ...) in a slightly different way. On top of that, some
classes have been removed and added, and I do not clearly see how this
has improved anything, but maybe it was a change in the right direction,
as Luc pointed out.
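
To make sure we are talking about the same code, this is roughly how I
read the change; a toy sketch with a plain LCG standing in for the real
generators (not the actual CM sources):

    /** Toy illustration only: an LCG stands in for the real generators. */
    public class NextIntSketch {

        private long state = 0x123456789abcdefL; // arbitrary fixed seed

        /** 32 fresh pseudo-random bits: the provider in the new design. */
        public int nextInt() {
            state = state * 6364136223846793005L + 1442695040888963407L;
            return (int) (state >>> 32);
        }

        /** 3.x-style "next(int bits)": same bits, truncated on return. */
        protected int next(int bits) {
            return nextInt() >>> (32 - bits);
        }

        /** Old design: every derived method goes through next(int). */
        public boolean nextBooleanOld() {
            return next(1) != 0;
        }

        /** New design: derived methods shift/mask the 32 bits themselves. */
        public boolean nextBooleanNew() {
            return (nextInt() >>> 31) != 0;
        }
    }

For bits = 1 both variants keep only the top bit of the same 32-bit
word, so the produced sequences should be identical; only the place
where the shifting happens differs. If I have misread the commit,
please correct me.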

Nobody is blaming anybody for anything, I believe; instead I have the
impression that people are focused on good technical solutions, and this
sometimes means that you get criticism (I have received a lot of it as
well, which can be quite frustrating).

Lately I have gotten the impression that some people want to completely
refactor CM for the 4.0 release, and I wonder whether this will do the
project any good, especially in areas where there is really no need for
further refactoring.

>>> and try to keep the number of incompatibilities
>>> with the 3_X branch as small as possible.
> 
> I clearly and not surprisingly do not subscribe to that goal.
> And the recent discussions about RERO and "experimental" releases
> were certainly leading to a completely different consensus.

I am not sure what this has to do with my comment. RERO just means that
we should release more often, but you can only do this if you are not
introducing incompatibilities.

The discussion about "experimental" releases was just brainstorming,
imho. There was no decision or real consensus on how to achieve this,
just some ideas that are at best very non-standard and in some cases
impractical.

I fear that Apache Commons is not the right place for doing such things.

Btw, I proposed using the scheme from Guava to explicitly mark new
features with a @Beta annotation, to allow possibly incompatible changes
later on. I find this much more practical, especially considering the
limited man-power and release cycles.
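
For reference, the Guava-style marker boils down to a trivial
annotation; a hypothetical CM variant could look like the sketch below
(name, package and retention are my guesses, not an existing CM class):

    import java.lang.annotation.Documented;
    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    /**
     * Marks a public API as new and subject to incompatible change in a
     * later release (same idea as Guava's @Beta); purely informative,
     * enforced only by convention and release notes.
     */
    @Documented
    @Retention(RetentionPolicy.CLASS)
    @Target({ElementType.TYPE, ElementType.METHOD,
             ElementType.CONSTRUCTOR, ElementType.FIELD})
    public @interface Beta {
    }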

Thomas

>>>>>> and the refactored
>>>>>> code and benchmarks to show there is no loss in performance.
>>>>>
>>>>> Given that there are exactly two operations _fewer_ (a subtraction
>>>>> and a shift), it would be surprising.
>>>>>
>>>>>> It
>>>>>> would also be good to have some additional review of this code by
>>>>>> PRNG experts.
>>>>>
>>>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>>>> the little change above (in the last line of the "nextInt/next"
>>>>> code).
>>>>>
>>>>> That change in "nextInt/next" implied similarly tiny recodings in
>>>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>>>> from that, were copied from "BitsStreamGenerator".
>>>>>
>>>>> [However tiny a change, I had made a mistake... and dozens of tests
>>>>> started to fail. Found the typo and all was quiet again...]
>>>>>
>>>>> About "next(int)" being standard, it would be interesting to know
>>>>> what that means.
>>
>> In all the papers I have read concerning pseudo-random number
>> generation, the basic model was based on small chunks of bits,
>> most of the time the size of an int, because this is what computers
>> manage directly (they have no provision for managing chunks of, say,
>> 5 or 11 bits).
>>
>> Deriving other primitive types from this (boolean, long, double) is
>> really an add-on. I even asked an expert (Pierre L'Ecuyer, if I
>> remember well) for some explanation about converting to double
>> (which is simply done by multiplying by a constant representing the
>> weight of the least significant bit, in order to constrain the range
>> to [0; 1]). His answer was that this ensures that the theoretical
>> mathematical proofs that apply to the uniform distribution still
>> apply, as only this case (uniformity over a multi-dimensional integral
>> grid) has been studied. It seems nothing has been studied about using
>> the exponent part of the floating-point representation directly for
>> double random number generation.
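
(For illustration, the scaling Luc describes amounts to something like
the sketch below, here with java.util.Random as the bit source and 53
bits, the width of a double's significand; the exact number of bits
differs between implementations:)

    // Collect 53 random bits, then multiply by 2^-53, the weight of the
    // least significant bit: the integer range [0, 2^53) maps to [0, 1).
    static double uniformDouble(java.util.Random rng) {
        long bits = ((long) (rng.nextInt() >>> 6) << 27)  // 26 high bits
                  |  (long) (rng.nextInt() >>> 5);        // 27 low bits
        return bits * 0x1.0p-53;
    }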
>>
>> Hence everybody starts from int,
> 
> [Or a "long", as I could observe in some other source codes.]
> 
> Hence, do you agree that my move to "nextInt()" was a sensible one?
> 
>> and the mathematicians have proved to us that
>> this method works and that some properties are preserved
>> (multi-dimensional independence, long period, ...) which are typically
>> essential for Monte-Carlo analyses.
>>
>> I know nothing about random number generation for secure applications
>> like cryptography, except that it requires completely different
>> properties, often completely opposite to what is needed for
>> Monte-Carlo analysis. As an example, it should be impossible to
>> reproduce a secure sequence (it cannot be deterministic), whereas in
>> Monte-Carlo we *want* it to be reproducible if we reuse the same seed.
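
(A trivial illustration of that last point, with java.util.Random
standing in for a CM generator: reusing the seed must reproduce the
sequence, which is exactly what a secure generator has to prevent:)

    public class SeedDemo {
        public static void main(String[] args) {
            java.util.Random a = new java.util.Random(42L);
            java.util.Random b = new java.util.Random(42L);
            for (int i = 0; i < 5; i++) {
                // Same seed, same deterministic sequence.
                System.out.println(a.nextInt() == b.nextInt()); // "true"
            }
        }
    }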
>>
>>>>
>>>> Have a look at the source code for the JDK generators, for example.
>>>>> As I indicated quite clearly in one of my first posts about this
>>>>> refactoring:
>>>>> 1. all the CM implementations generate random bits in batches
>>>>>    of 32 bits, and
>>>>> 2. before returning, the "next(int bits)" method was truncating
>>>>>    the generated "int":
>>>>>      return x >>> (32 - bits);
>>>>>
>>>>> In all implementations, that was the only place where the "bits"
>>>>> parameter was used, from which I concluded that the randomness
>>>>> provider does not care if the request was to create fewer than 32
>>>>> random bits.
>>>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>>>> bits (or am I missing something?).
>>>>
>>>> Quite possibly, yes, you are missing something.
>>
>> I would guess it is linked to performance considerations. Pseudo-random
>> number generation is sometimes put under very heavy stress, with
>> billions of numbers generated. It should run extremely fast, and the
>> algorithms have been designed to have tremendously long periods
>> (things like 2^19937 - 1). With such long periods, we can waste 31
>> bits each time we produce 1 bit if it saves some overhead.
> 
> Hence I did not misunderstand that bits are wasted (and I guess they
> always will be, as long as performance depends on the CPU architecture).
> 
> IIUC the CM code, discarding those bits was unnecessary since the
> discarded ones were never used (and I guess could never be, as they
> are replaced by a non-random sequence of zeroes...).
> Hence I suppose that a bug could exist if the caller assumes that the
> most significant bits of the returned "int" are all zero.
> But nothing of the kind can be inferred from the documentation of the
> "standard" method:
>  
> http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#next%28int%29
> 
> 
> 
> Thanks, Luc, for the constructive comments,
> Gilles
> 
>>
>> best regards,
>> Luc
>>
>>>>>
>>>>> Of course, if some implementation were able to store the bits not
>>>>> requested by the last call to "next(int)", then I'd understand that
>>>>> we must really provide access to a "next(int)" method.
>>>>>
>>>>> Perhaps the overhead of such bookkeeping is why the practical
>>>>> algorithms chose to store integers rather than bits (?).
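
(The bookkeeping Gilles speculates about would look something like the
sketch below, caching the unused bits of the last 32-bit word; this is
not what CM does, it only illustrates the trade-off:)

    /** Bit-caching nextBoolean(): each 32-bit word serves 32 calls. */
    public class BitCache {
        private final java.util.Random rng = new java.util.Random();
        private int cachedBits;   // leftover random bits of the last word
        private int cachedCount;  // how many of them are still unused

        public boolean nextBoolean() {
            if (cachedCount == 0) {
                cachedBits = rng.nextInt(); // refill with 32 fresh bits
                cachedCount = 32;
            }
            boolean result = (cachedBits & 1) != 0; // consume the lowest bit
            cachedBits >>>= 1;
            cachedCount--;
            return result;
        }
    }

The extra branch and state on every call is the overhead in question;
whether it ever pays off compared to simply discarding 31 bits is the
performance point Luc makes above.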
>>>>>
>>>>> As you dismissed my request that CM be able to cater for an RNG
>>>>> implementation based on a "long", I don't quite understand the
>>>>> attachment to a "next(int)" that no longer serves any purpose (as of
>>>>> current CM).
>>>>>
>>>> This change is
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>>> Phil
>>>>>>
>>>>>> On 12/28/15 10:23 AM, er...@apache.org wrote:
>>>>>>> Repository: commons-math
>>>>>>> Updated Branches:
>>>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>>>
>>>>>>>
>>>>>>> MATH-1307
>>>>>>>
>>>>>>> New base class for RNG implementations.
>>>>>>> The source of randomness is provided through the "nextInt()"
>>>>>>> method (to be defined in subclasses).
>>>>>>>
>>>>>>>
>>>>>>> [...]
>>>>>
>>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org
