Re: [Math] About the refactoring of RNGs

Luc Maisonobe Tue, 29 Dec 2015 09:59:32 -0800

hi all,

Le 29/12/2015 18:32, Phil Steitz a écrit :
> 
> 
>> On Dec 29, 2015, at 8:41 AM, Gilles <gil...@harfang.homelinux.org>
>> wrote:
>> 
>>> On Mon, 28 Dec 2015 20:33:24 -0700, Phil Steitz wrote:
>>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote: The
>>>>> significant refactoring to eliminate the (standard)
>>>>> next(int) included in these changes has the possibility of
>>>>> introducing subtle bugs or performance issues.  Please run
>>>>> some tests to verify that the same sequences are generated by
>>>>> the 3_X code
>>>> 
>>>> IIUC our unit tests of the RNGs, this is covered.
>>> 
>>> No.  Not sufficient.  What you have done is changed the internal 
>>> implementation of all of the Bitstream generators.  I am not 
>>> convinced that you have not broken anything.  I will have to do
>>> the testing myself.  I see no point in fiddling with the
>>> internals of this code that has had a lot of eyeballs and testing
>>> on it.  I was not personally looking forward to researching the
>>> algorithms to make sure any invariants may be broken by these
>>> changes; but I am now going to have to do this.  I have to ask
>>> why.  Please at some point read [1], especially the sections on
>>> "Avoid Flexibility Syndrom" and "Value Laziness as a Virtue."
>>> Gratuitous refactoring drains community energy.
>> 
>> It seems that again you don't read what I write.[1] Hence the above
>> paragraph is spreading F -> "I am not convinced that you have not
>> broken anything." U -> "I will have to do the testing myself." D ->
>> "I see no point in fiddling [...]" .
>> 
>> Or maybe I have to rant about email communication. Please reread
>> the thread to fully appreciate that you could have shared your
>> doubts among the opinions which you gave about some of my
>> questions. When the reply answers to only some of the direct
>> questions, the OP is legitimately tempted to assume that the
>> non-comment is akin to "I don't care" (as in "left to the judgment
>> of the one who does the job").
>> 
>> As is often the case, you can dislike my ideas of improvements
>> (ideas that come on the basis of the information provided by what
>> the code does) but I don't appreciate the use of the word
>> "gratuitous", given that I clearly stated the purpose of making
>> explicit what the code actually does (that is, *generating* 32
>> random bits and not the number of bits passed as a parameter to
>> "next(int)"). So, the code is now self-documenting; it is a small
>> change, to be sure, but hardly gratuitous.
> 
> I did not respond to the original question about eliminating
> next(int) because I was not (still am not) expert in the internals of
> how the generators work and how the generated ints-of-bits are
> transformed to make doubles, gaussians, etc.  I was hoping someone
> who was expert would chime in and say either "don't do that" or
> "that's ok".

I intend to look at these changes, but concentrated on the last
things I wanted to be included in 3.6 (another mail will come
about this later this evening).

I am not an expert either on PRNG, but did work on some of the
implementations we have, so maybe I will be able to give an
opinion.

One thing we coud do is set up a branch for such an experimentation.
Branches in Git are simple. The work could then be merged back
to master once it has stabilized. It would also be easy for any
developer to switch back and forth between this feature branch
and master to get convinced there are no hidden problems.

> That did not happen before you committed the changes.
> That obligates us to review them.  The tests cover only nextInt so
> we need to convince ourselves that the transformations (which have
> all changed) are still correct.  That is in my mind just extra work
> generated for the community.  I would rather see our cycles go toward
> getting 3.6 out.

So let the dust settle down for a few days, I would really like
to finalize 3.6 too.

best regards,
Luc

> 
> I stand by my assertion that the rebasing to eliminate next(bits)
> adds nothing. If you can demonstrate via benchmarks this it is
> faster, I will retract that statement.
> 
> I know we disagree fundamentally on the priorities of [math].  To me,
> stability, correctness and ease of use (including ease of upgrades)
> are paramount.  That means you don't make implementation changes to
> hardened code without a good reason and you limit incompatible change
> to what is necessary to deliver practical benefit to users in new
> features or bug fixes.  I was +1 to dropping AbstractRandomGenerator
> and just leaving the bitstream generators in place.  That is a
> simplifying change.
>> 
>> I actually did go some way towards "Avoiding [a] Flexibility
>> Syndrom" that *was* present in a seeming flexibility of
>> "next(int)". This method was indeed letting users assume that it is
>> able to generate _less than 32 bits_ whereas the randomness
>> generators implemented in CM *always* generate exactly 32
>> (hopefully random) bits.
>> 
>> I stand guilty on the last count as I do indeed not always "value 
>> laziness as a virtue": I indeed attribute more value to design (and
>> code) aesthetics than to laziness. IMO, ugly code is often an early
>> hint that the design is broken, even if the functionality may not
>> be (yet?). [But note again that this change was not "just because
>> of aesthetic reasons"; in the wording of your reference, I think
>> that "just because" is important.]
>> 
>> In a real community,[2] you'd value that some people are willing
>> to tackle different tasks than you do. Rather than stifling any
>> change on the sole ground that it is a change, it would be less
>> "draining" on the community if reviewers would only voice concrete
>> concerns about the resulting code, and not just assume that the
>> coder's motivation is pointless.[3]
>> 
>> I understand that something can more likely become broken when it
>> is being touched rather than when left alone.  Of course, I do! But
>> with the help of an extensive and sensitive test suite, I felt I 
>> could give this small refactoring a try, being fairly confident
>> that mistakes would not go unnoticed. Your doubting that the test
>> suite could let this happen should question our assuming that it
>> could assess the correct behaviour of the previous code.
>> Alternately, all of the numerous tests passing should mean that
>> the new code is not buggier; and visual inspection can assess that
>> it cannot be slower.[4]
>> 
>> My last thought about how standard the method "next(int)" is, I let
>> it be conveyed by what is not mentioned in the following
>> unquestionably standard source: 
>> http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html
>>
>>
>> [I could have made additional comments on how various suggestions in
>> http://www.apachecon.com/eu2007/materials/ac2006.2.pdf are either
>> not applied in the CM project, or not taken "with a grain of salt"
>> whenever it suits you.  But this post has already drained me far 
>> too much.]
>> 
>> Gilles
>> 
>> [1] I can make mistakes (and I did, as told previously) in
>> "fiddling" with the code, but that can be spotted relatively easily
>> by inspecting three one-line changes commits and their few lines
>> consequences on the generic methods where "nextInt()" replaced
>> "next(int)", mutatis mutandis, in a single and quite small class. 
>> That is a far cry from "researching the algorithms". [2] To be
>> contrasted with the "common good" for which your stand against 
>> changes may be the right one for many (conservative) CM users. [3]
>> I can assure you that it is quite draining to have to defend any
>> and every change, especially when they are obviously towards
>> greater "standardness", even when they would occur in a new major
>> version, against an opposition that is only based on a conservatism
>> argument: When the code can be made more standard or more elegant
>> or more flexible or more state-of-the-art, it is deemed not
>> necessary because "we've always done it that way". This is a
>> general remark about other discussions. Here I totally agree that I
>> favoured non-standard Java in trashing "next(int bits)". In other 
>> languages, that signature is _not_ "standard". [4] In order to
>> defend a small and technically innocuous modification, I have to
>> myself do some "researching". Mind you, it's "just" reading from
>> web pages edited by people who AFAICT seem to know what they are
>> talking about (but I may be missing something, again). Among other
>> things which I've shared in this post, it appears that a benchmark
>> including the WELL (1204a) RNG indicates that it is 5 to 10 times 
>> slower than alternative RNGs. In this light, if you are so worried
>> about performance issues induced by a functionally no-op change,
>> then you probably should not use CM for RNG.
>> 
>>>>> and the refactored code and benchmarks to show there is no
>>>>> loss in performance.
>>>> 
>>>> Given that there are exactly two operations _less_ (a
>>>> subtraction and a shift), it would be surprising.
>>>> 
>>>>> It would also be good to have some additional review of this
>>>>> code by PRNG experts.
>>>> 
>>>> The "nextInt()" code is exactly the same as the "next(int)"
>>>> modulo the little change above (in the last line of the
>>>> "nextInt/next" code).
>>>> 
>>>> That change in "nextInt/next" implied similarly tiny recodings
>>>> in the generic methods "nextDouble()", "nextBoolean()", ...
>>>> which, apart from that, were copied from
>>>> "BitsStreamGenerator".
>>>> 
>>>> [However tiny a change, I had made a mistake... and dozens of
>>>> tests started to fail. Found the typo and all was quiet
>>>> again...]
>>>> 
>>>> About "next(int)" being standard, it would be interesting to
>>>> know what that means.
>>> 
>>> Have a look at the source code for the JDK generators, for
>>> example.
>>>> As I indicated quite clearly in one of my first posts about
>>>> this refactoring 1. all the CM implementations generate random
>>>> bits in batches of 32 bits, and 2. before returning, the
>>>> "next(int bits)" method was truncating the generated "int": 
>>>> return x >>> (32 - bits);
>>>> 
>>>> In all implementations, that was the only place where the
>>>> "bits" parameter was used, from which I concluded that the
>>>> randomness provider does not care if the request was to create
>>>> less than 32 random bits. Taking "nextBoolean()" for example,
>>>> it looks like a waste of 31 bits (or am I missing something?).
>>> 
>>> Quite possibly, yes, you are missing something.
>>>> 
>>>> Of course, if some implementation were able to store the bits
>>>> not requested by the last call to "next(int)", then I'd
>>>> understand that we must really provide access to a "next(int)"
>>>> method.
>>>> 
>>>> Perhaps that the overhead of such bookkeeping is why the
>>>> practical algorithms chose to store integers rather than bits
>>>> (?).
>>>> 
>>>> As you dismissed my request about CM being able to care for a
>>>> RNG implementation based on a "long", I don't quite understand
>>>> the caring for a "next(int)" that serves no more purpose (as of
>>>> current CM).
>>> This change is
>>>> 
>>>> Gilles
>>>> 
>>>> 
>>>>> Phil
>>>>> 
>>>>>> On 12/28/15 10:23 AM, er...@apache.org wrote: Repository:
>>>>>> commons-math Updated Branches: refs/heads/master 7b62d0155
>>>>>> -> 81585a3c4
>>>>>> 
>>>>>> 
>>>>>> MATH-1307
>>>>>> 
>>>>>> New base class for RNG implementations. The source of
>>>>>> randomness is provided through the "nextInt()" method (to
>>>>>> be defined in subclasses).
>>>>>> 
>>>>>> 
>>>>>> [...]
>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>> 
>> 
>> ---------------------------------------------------------------------
>>
>> 
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
>> For additional commands, e-mail: dev-h...@commons.apache.org
>> 
> 
> ---------------------------------------------------------------------
>
> 
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] About the refactoring of RNGs

Reply via email to