Re: [Math] How fast is fast enough?

Phil Steitz Fri, 05 Feb 2016 19:48:07 -0800

OK, I give up.  I am withdrawing as volunteer chair or member of the
new TLP.


Phil

On 2/5/16 7:23 PM, Gilles wrote:
> Phil,
>
> You talk again about me trying to push forward changes that
> serve no purpose besides "trash performance and correctness".
> This is again baseless FUD to which I've already answered
> (with detailed list of facts which you chose to ignore).
> You declare anything for which you don't have an answer as
> "bogus argument". Why is the reference to multi-threaded
> implementations bogus?  You contradict yourself in pretending
> that CM RNGs could be so good as to make people want to use
> them while refusing to consider whether another design might
> be better suited to such high(er)-performance extensions.
> This particular case is a long shot but if any and all
> discussions are stopped dead, how do you imagine that we can
> go anywhere?
> As you could read from experts, micro-benchmarks are deceiving;
> but you refuse to even consider alternative designs if there
> might be a slight suspicion of degradation.
> How can we ever set up a constructive discussion on how to
> make everybody enjoy this project if the purported chair is
> so bent to protecting existing code rather than nurture a good
> relationship with developers who may sometimes have other ideas?
> I'm trying to improve the code (in a dimension which you can't
> seem to understand unfortunately) but respectfully request
> data points from those users of said code, in order to be
> able to prove that no harm will be done.
> But you seem to prefer to not disclose anything that would
> get us closer to agreement (better design with similar
> performance and room for improvement, to be discussed
> together as a real development team -- Not you requiring,
> as a bad boss, that I bow to your standards for judging
> usefulness).
> This 1% which you throw at me, where does it come from?
> What does 1% mean when the benchmark shows standard deviations
> that vary from 4 to 26% in the "nextInt" case and from 3 to
> 7% in the "nextGaussian" case?
> This 1% looks meaningless without context; context is what I'm
> asking in order to try and establish objectively whether
> another design will have a measurable impact on actual tasks.
> I'm not going to show any "damaged" benchmark because of how
> unwelcome you make me feel every time I wish to talk about
> other aspects of the code.
> There is no development community here.  Only solitary
> coders who share a repository.
>
> Not sorry for the top-post,
> Gilles
>
>
> On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote:
>> On 2/5/16 12:59 PM, Gilles wrote:
>>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote:
>>>> On 2/4/16 3:59 PM, Gilles wrote:
>>>>> Hi.
>>>>>
>>>>> Here is a micro-benchmark report (performed with
>>>>> "PerfTestUtils"):
>>>>> -----
>>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100,
>>>>> time
>>>>> unit: ms)
>>>>>                         name time/call std dev total time ratio
>>>>> cv difference
>>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000
>>>>> 0.26 0.0000e+00
>>>>>    o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941
>>>>> 0.15 -1.2900e+02
>>>>>           o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097
>>>>> 0.04 2.1032e+02
>>>>>          o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239
>>>>> 0.14 5.1945e+02
>>>>>         o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374
>>>>> 0.14 8.1451e+02
>>>>>         o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450
>>>>> 0.06 9.7816e+02
>>>>>         o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763
>>>>> 0.08 1.6602e+03
>>>>>         o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795
>>>>> 0.14 1.7301e+03
>>>>>        o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074
>>>>> 0.16 1.6139e+02
>>>>> -----
>>>>> where "cv" is the ratio of the 3rd to the 2nd column.
>>>>>
>>>>> Questions are:
>>>>> * How meaningful are micro-benchmarks when the timed operation
>>>>> has
>>>>> a very
>>>>>   small duration (wrt e.g. the duration of other machine
>>>>> instructions that
>>>>>   are required to perform them)?
>>>>
>>>> It is harder to get good benchmarks for shorter duration
>>>> activities,
>>>> but not impossible.  One thing that it would be good to do is to
>>>> compare these results with JMH [1].
>>>
>>> I was expecting insights based on the benchmark which I did run.
>>
>> You asked whether or not benchmarks are meaningful when the task
>> being benchmarked is short duration.  I answered that question.
>>>
>>> We have a tool in CM; if it's wrong, we should remove it.
>>> How its results compare with JMH is an interesting question,
>>
>> I will look into this.
>>> I
>>> agree, but I don't have time to make an analysis of benchmarking
>>> tools (on top of what I've been doing since December because
>>> totally innocuous changes in the RNG classes were frowned upon
>>> out of baseless fear).
>>
>> Please cut the hypberbole.
>>>
>>>>> * In a given environment (HW, OS, JVM), is there a lower limit
>>>>> (absolute
>>>>>   duration) below which anything will be deemed good enough?
>>>>
>>>> That depends completely on the application.
>>>
>>> Sorry, I thought that it was obvious: I don't speak of applications
>>> that don't care about performance. :-)
>>>
>>> For those that do, I do not agree with the statement: the question
>>> relates to finding a point below which it is the environment that
>>> overwhelms the other conditions.
>>> A point where there will be _unavoidable_ overhead (transferring
>>> data
>>> from/to memory, JVM book-keeping, ...) and perturbations (context
>>> switches, ...) such that their duration adds a constant time (on
>>> average) that may render most enhancements to an already efficient
>>> algorithm barely noticeable in practice.
>>> Similarly, but in the opposite direction, some language constructs
>>> or design choices might slow down things a bit, but without
>>> endangering any user.
>>>
>>> A problem arises when any enhancement to the design is deemed
>>> harmful because it degrades a micro-benchmark, even though that
>>> benchmark may not reflect any real use-cases.
>>> Then, the real harm is against development.
>>>
>>>>> * Can a library like CM admit a trade-off between ultimate
>>>>> performance and
>>>>>   good design?   IOW, is there an acceptable overhead in exchange
>>>>> for other qualities
>>>>>   (clarity, non-redundancy, extensibility, etc.)?
>>>>
>>>> That is too general a question to be meaningful.   We need to look
>>>> at specific cases.  What exactly are you proposing?
>>>
>>> <rant>
>>> It is quite meaningful even if it refers to general principles.
>>> Those could (should, IMO) be taken into account when managing a
>>> project like CM, on a par with "performance" (whose intrinsic value
>>> is never questioned).
>>> </rant>
>>
>> Rant all you want.  Vague generalities and hyperbole have no value.
>>>
>>> Two specific cases are:
>>> * inheritance vs delegation (a.k.a. composition)
>>> * generics (that could require runtime casts)
>>
>> This is getting closer to meaningful.  Where exactly in the code are
>> you wanting to use something and seeing benchmark damage?
>>>
>>>>> * Does ultimate performance for the base functionality
>>>>> (generation
>>>>> of a
>>>>>   random number) trump any consideration of use-cases that would
>>>>> need an
>>>>>   extension (of the base functionality, such as computation to
>>>>> match another
>>>>>   distribution) that will unavoidably degrades the performance
>>>>> (hence the
>>>>>   micro-benchmark will be completely misleading for those users)?
>>>>
>>>> Again, this is vague and the answer depends on what exactly you
>>>> are
>>>> talking about. Significantly damaging performance of PRNG
>>>> implementations is a bad idea,
>>>
>>> Now, *this* is vague: what do you mean by "significantly"?
>>> That was actually my question in the first place.
>> If you are talking about PRNG performance, I would say a 1% hit is
>> significant.
>>> Referring to the
>>> benchmark above, people who'd know why they require ultimate
>>> performance
>>> should be able to tell what range of numbers they'd find
>>> acceptable in
>>> that table.
>>>
>>> <rant>
>>> Actually my questions are very precise, but the answers would
>>> require
>>> some decent analysis, rather than the usual "bad idea" dismissal.
>>> </rant>
>>>
>>> In the Javadoc of the "random" package, there is information about
>>> performance but no reference as to the benchmarking procedure.
>>
>> It would be great to repeat these using JMH, which is emerging as a
>> de facto standard for java benchmarking.  I will look into this.
>>>
>>> I can consistently observe a totally different behaviour (using
>>> "PerfTestUtils"):
>>>  1. "MersenneTwister" is *always* faster than all of the WELL RNGs;
>>>  2. moreover, the ratio *grows* with each of the longer periods
>>>     members of the WELL family (see the above table).
>>>
>>> This makes me wonder how someone who purports to need "ultimate"
>>> performance can have any objective basis to determine what is good
>>> or bad for his own applications.
>>>
>>>> unless there are actual practical use
>>>> cases you can point to that whatever changes you are proposing
>>>> enable.
>>>
>>> As I've explained in very much details in another thread, I've
>>> reviewed (from a design POV) the RNG code in "random" and IMHO,
>>> there
>>> is room for improvement (cf. above for what I mean by that term).
>>> <rant>
>>> I have some code ready for review but I had to resort to what I
>>> considered sub-optimal design (preemptively renouncing to propose a
>>> "delegation"-based design) solely because of the destructive
>>> community
>>> process that takes place here.[1]
>>> </rant>
>>
>> More vague hyperbole that serves no purpose.  Please focus on actual
>> code or design issues.
>>>
>>> The practical use-cases is anything that needs further
>>> processing of
>>> the numbers produced according to a uniform distribution:
>>
>> Isn't that what the samplers in the distributions package do?  What
>> we need from the PRNG implementations is just blocks of bits.  Since
>> we wanted a pluggable replacement for j.u.Random, we added uniform
>> ints, longs and floats and gaussian floats.  The samplers just need
>> uniform doubles.  The practical use case we need is well-supported
>> in the code we have.  What is missing, exactly?
>>> I agree that
>>> there would be little sense to code that latter part in a "pure" OO
>>> way[2].  And Luc made it indeed quite efficient, I think, in the
>>> various
>>> concrete classes.
>>> What I want to reconsider is how those concrete low-level
>>> algorithms can
>>> be plugged in a higher-level function that just requires a
>>> "source of
>>> randomness", as I'd call a provider of "int" (or "long") values,
>>> where
>>> the high level functionality does not care at all about the
>>> provider's
>>> inner working (a.o. how it's seeded!).
>>
>> This is why many higher-level samplers and other things that require
>> random data inside [math] have a pluggable RandomGenerator.
>>>
>>> A case in point is the sampling of other distributions (namely the
>>> Normal distribution).
>>
>> Or any of the others.  We have a default, inversion-based method
>> that the abstract distribution classes provide and some pretty good
>> specialized implementations within individual distributions.  Most
>> of these just require uniform random doubles as source.
>>
>>>
>>> Here is the benchmark report:
>>> -----
>>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100,
>>> time unit: ms)
>>>                         name time/call std dev total time ratio
>>> cv difference
>>> o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000
>>> 0.14 0.0000e+00
>>> o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371
>>> 0.07 1.2892e+04
>>>    o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330
>>> 0.06 1.0393e+04
>>>           o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733
>>> 0.07 1.1360e+04
>>>          o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797
>>> 0.04 1.1513e+04
>>>         o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052
>>> 0.03 1.2125e+04
>>>         o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970
>>> 0.06 1.1928e+04
>>>         o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804
>>> 0.04 1.3931e+04
>>>         o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882
>>> 0.06 1.4118e+04
>>>        o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603
>>> 0.08 1.1049e+04
>>> -----
>>> where the first line is JDK's "nextInt()" and the remaining are
>>> "nextGaussian()".
>>>
>>> The generation time is thus about 4-fold that of "nextInt()".
>>> Thus, degrading the performance of "nextInt()" by 10% would
>>> degrade the
>>> performance of "nextGaussian()" by half that.
>>>
>>> For a performance discussion to be meaningful, I think that we'd
>>> need
>>> to know how that fact would affect, even modestly, any moderately
>>> complex
>>> post-processing of the generated values.
>>>
>>> Another case, for modularity, would be to consider that other
>>> algorithms could
>>> be implemented to provide the required distribution.[3]
>>> In the current design (inheritance-based), that can only be done
>>> by creating
>>> a subclass, even though the core functionality ("nextDouble()") is
>>> not
>>> overridden.
>>>
>>>>> * What are usages of the CM RNGs?
>>>>>   Do those use-cases strictly forbid "loosing" a dozen
>>>>> milliseconds per
>>>>>   million calls?
>>>>
>>>> There are many different use cases.  My own applications use
>>>> them in
>>>> simulations to generate random deviates, to generate random hex
>>>> strings as identifiers and in stochastic algorithms like some
>>>> of our
>>>> internal uses.  The last case is definitely sensitive to PRNG
>>>> performance.
>>>
>>> Thanks for giving examples, but since we talk about performance, I
>>> was hoping for some real flesh, like the relative duration of
>>> numbers
>>> generation (e.g. the total duration of calls to the
>>> "RandomGenerator"
>>> instances wrt to the total duration of the application).
>>>
>>> I don't know if by "last case", you are referring to code that is
>>> inside CM.  I didn't spot anything that makes "heavy" usage of a
>>> RNG (in the sense that generation would count as a sizable part of
>>> the whole processing).
>> monteCarloP in KolmogorovSmirnovTest is one to check.
>>>
>>> As I pointed out many times: if an application is severely
>>> dependent
>>> on the performance of RNG, the user probably will turn to specific
>>> tools (e.g. GPUs? [4]) rather than use CM.
>>
>> That is a bogus argument.  We should make our PRNGs simple and fast
>> so their use can extend to performance-sensitive applications.
>>>
>>> Conversely, using Java might be preferred for its flexibility,
>>> which
>>> is destroyed by a search for ultimate performance (which nobody
>>> seems
>>> able to define reasonably).
>>> Performance is not a goal in itself; it should not be a trophy
>>> which
>>> sits uselessly on a shelf.
>>
>> Nor should "beautiful design" in the eyes of one person.
>>>
>>> My goal is not to deliberately slow things down; it is to allow
>>> some
>>> leeway so that designs which are deemed better (on all counts
>>> except,
>>> perhaps, performance) are given a chance to show their
>>> strengths, in
>>> particular in areas where performance in absolute terms is "good
>>> enough" for all use-cases which CM should care about (hence the
>>> need
>>> of actual data points[5]).
>>
>> I see no reason that we can't have it both ways - good design and
>> good performance. What we have now, modulo maybe some small changes
>> to reduce code duplication, works fine.  If you want to play with
>> 64-bit generators and can find reference implementations and verify
>> that they do in fact perform better, great.  If not, I don't see the
>> point.  You can rant and complain all you want; but I am not going
>> to let us trash performance or correctness of code in the random
>> class or anywhere else just because you think it is somehow "better
>> designed"  unless you can show specific, practical use cases
>> demonstrating the value of the changes.
>>
>> Phil
>>>
>>>
>>> Gilles
>>>
>>> [1] "Is it faster?"
>>>     "No."
>>>     "Then, no."
>>> [2] Although that is in some sense what you indirectly defend by
>>> wanting
>>>     to stick with a meaningless "next(int bits)" method.
>>> [3] http://www.doornik.com/research/ziggurat.pdf
>>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html
>>> [5] Hence the need to agree on a methodology/policy for
>>> benchmarking.
>>>
>>>>
>>>> Phil
>>>>
>>>> [1] http://openjdk.java.net/projects/code-tools/jmh/
>>>>>   IOW, would those users for which such a difference matters use
>>>>> CM at all?
>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

Re: [Math] How fast is fast enough?

Reply via email to