OK, I give up. I am withdrawing as volunteer chair or member of the new TLP.
Phil On 2/5/16 7:23 PM, Gilles wrote: > Phil, > > You talk again about me trying to push forward changes that > serve no purpose besides "trash performance and correctness". > This is again baseless FUD to which I've already answered > (with detailed list of facts which you chose to ignore). > You declare anything for which you don't have an answer as > "bogus argument". Why is the reference to multi-threaded > implementations bogus? You contradict yourself in pretending > that CM RNGs could be so good as to make people want to use > them while refusing to consider whether another design might > be better suited to such high(er)-performance extensions. > This particular case is a long shot but if any and all > discussions are stopped dead, how do you imagine that we can > go anywhere? > As you could read from experts, micro-benchmarks are deceiving; > but you refuse to even consider alternative designs if there > might be a slight suspicion of degradation. > How can we ever set up a constructive discussion on how to > make everybody enjoy this project if the purported chair is > so bent to protecting existing code rather than nurture a good > relationship with developers who may sometimes have other ideas? > I'm trying to improve the code (in a dimension which you can't > seem to understand unfortunately) but respectfully request > data points from those users of said code, in order to be > able to prove that no harm will be done. > But you seem to prefer to not disclose anything that would > get us closer to agreement (better design with similar > performance and room for improvement, to be discussed > together as a real development team -- Not you requiring, > as a bad boss, that I bow to your standards for judging > usefulness). > This 1% which you throw at me, where does it come from? > What does 1% mean when the benchmark shows standard deviations > that vary from 4 to 26% in the "nextInt" case and from 3 to > 7% in the "nextGaussian" case? > This 1% looks meaningless without context; context is what I'm > asking in order to try and establish objectively whether > another design will have a measurable impact on actual tasks. > I'm not going to show any "damaged" benchmark because of how > unwelcome you make me feel every time I wish to talk about > other aspects of the code. > There is no development community here. Only solitary > coders who share a repository. > > Not sorry for the top-post, > Gilles > > > On Fri, 5 Feb 2016 17:07:16 -0700, Phil Steitz wrote: >> On 2/5/16 12:59 PM, Gilles wrote: >>> On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote: >>>> On 2/4/16 3:59 PM, Gilles wrote: >>>>> Hi. >>>>> >>>>> Here is a micro-benchmark report (performed with >>>>> "PerfTestUtils"): >>>>> ----- >>>>> nextInt() (calls per timed block: 2000000, timed blocks: 100, >>>>> time >>>>> unit: ms) >>>>> name time/call std dev total time ratio >>>>> cv difference >>>>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000 >>>>> 0.26 0.0000e+00 >>>>> o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941 >>>>> 0.15 -1.2900e+02 >>>>> o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097 >>>>> 0.04 2.1032e+02 >>>>> o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239 >>>>> 0.14 5.1945e+02 >>>>> o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374 >>>>> 0.14 8.1451e+02 >>>>> o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450 >>>>> 0.06 9.7816e+02 >>>>> o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763 >>>>> 0.08 1.6602e+03 >>>>> o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795 >>>>> 0.14 1.7301e+03 >>>>> o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074 >>>>> 0.16 1.6139e+02 >>>>> ----- >>>>> where "cv" is the ratio of the 3rd to the 2nd column. >>>>> >>>>> Questions are: >>>>> * How meaningful are micro-benchmarks when the timed operation >>>>> has >>>>> a very >>>>> small duration (wrt e.g. the duration of other machine >>>>> instructions that >>>>> are required to perform them)? >>>> >>>> It is harder to get good benchmarks for shorter duration >>>> activities, >>>> but not impossible. One thing that it would be good to do is to >>>> compare these results with JMH [1]. >>> >>> I was expecting insights based on the benchmark which I did run. >> >> You asked whether or not benchmarks are meaningful when the task >> being benchmarked is short duration. I answered that question. >>> >>> We have a tool in CM; if it's wrong, we should remove it. >>> How its results compare with JMH is an interesting question, >> >> I will look into this. >>> I >>> agree, but I don't have time to make an analysis of benchmarking >>> tools (on top of what I've been doing since December because >>> totally innocuous changes in the RNG classes were frowned upon >>> out of baseless fear). >> >> Please cut the hypberbole. >>> >>>>> * In a given environment (HW, OS, JVM), is there a lower limit >>>>> (absolute >>>>> duration) below which anything will be deemed good enough? >>>> >>>> That depends completely on the application. >>> >>> Sorry, I thought that it was obvious: I don't speak of applications >>> that don't care about performance. :-) >>> >>> For those that do, I do not agree with the statement: the question >>> relates to finding a point below which it is the environment that >>> overwhelms the other conditions. >>> A point where there will be _unavoidable_ overhead (transferring >>> data >>> from/to memory, JVM book-keeping, ...) and perturbations (context >>> switches, ...) such that their duration adds a constant time (on >>> average) that may render most enhancements to an already efficient >>> algorithm barely noticeable in practice. >>> Similarly, but in the opposite direction, some language constructs >>> or design choices might slow down things a bit, but without >>> endangering any user. >>> >>> A problem arises when any enhancement to the design is deemed >>> harmful because it degrades a micro-benchmark, even though that >>> benchmark may not reflect any real use-cases. >>> Then, the real harm is against development. >>> >>>>> * Can a library like CM admit a trade-off between ultimate >>>>> performance and >>>>> good design? IOW, is there an acceptable overhead in exchange >>>>> for other qualities >>>>> (clarity, non-redundancy, extensibility, etc.)? >>>> >>>> That is too general a question to be meaningful. We need to look >>>> at specific cases. What exactly are you proposing? >>> >>> <rant> >>> It is quite meaningful even if it refers to general principles. >>> Those could (should, IMO) be taken into account when managing a >>> project like CM, on a par with "performance" (whose intrinsic value >>> is never questioned). >>> </rant> >> >> Rant all you want. Vague generalities and hyperbole have no value. >>> >>> Two specific cases are: >>> * inheritance vs delegation (a.k.a. composition) >>> * generics (that could require runtime casts) >> >> This is getting closer to meaningful. Where exactly in the code are >> you wanting to use something and seeing benchmark damage? >>> >>>>> * Does ultimate performance for the base functionality >>>>> (generation >>>>> of a >>>>> random number) trump any consideration of use-cases that would >>>>> need an >>>>> extension (of the base functionality, such as computation to >>>>> match another >>>>> distribution) that will unavoidably degrades the performance >>>>> (hence the >>>>> micro-benchmark will be completely misleading for those users)? >>>> >>>> Again, this is vague and the answer depends on what exactly you >>>> are >>>> talking about. Significantly damaging performance of PRNG >>>> implementations is a bad idea, >>> >>> Now, *this* is vague: what do you mean by "significantly"? >>> That was actually my question in the first place. >> If you are talking about PRNG performance, I would say a 1% hit is >> significant. >>> Referring to the >>> benchmark above, people who'd know why they require ultimate >>> performance >>> should be able to tell what range of numbers they'd find >>> acceptable in >>> that table. >>> >>> <rant> >>> Actually my questions are very precise, but the answers would >>> require >>> some decent analysis, rather than the usual "bad idea" dismissal. >>> </rant> >>> >>> In the Javadoc of the "random" package, there is information about >>> performance but no reference as to the benchmarking procedure. >> >> It would be great to repeat these using JMH, which is emerging as a >> de facto standard for java benchmarking. I will look into this. >>> >>> I can consistently observe a totally different behaviour (using >>> "PerfTestUtils"): >>> 1. "MersenneTwister" is *always* faster than all of the WELL RNGs; >>> 2. moreover, the ratio *grows* with each of the longer periods >>> members of the WELL family (see the above table). >>> >>> This makes me wonder how someone who purports to need "ultimate" >>> performance can have any objective basis to determine what is good >>> or bad for his own applications. >>> >>>> unless there are actual practical use >>>> cases you can point to that whatever changes you are proposing >>>> enable. >>> >>> As I've explained in very much details in another thread, I've >>> reviewed (from a design POV) the RNG code in "random" and IMHO, >>> there >>> is room for improvement (cf. above for what I mean by that term). >>> <rant> >>> I have some code ready for review but I had to resort to what I >>> considered sub-optimal design (preemptively renouncing to propose a >>> "delegation"-based design) solely because of the destructive >>> community >>> process that takes place here.[1] >>> </rant> >> >> More vague hyperbole that serves no purpose. Please focus on actual >> code or design issues. >>> >>> The practical use-cases is anything that needs further >>> processing of >>> the numbers produced according to a uniform distribution: >> >> Isn't that what the samplers in the distributions package do? What >> we need from the PRNG implementations is just blocks of bits. Since >> we wanted a pluggable replacement for j.u.Random, we added uniform >> ints, longs and floats and gaussian floats. The samplers just need >> uniform doubles. The practical use case we need is well-supported >> in the code we have. What is missing, exactly? >>> I agree that >>> there would be little sense to code that latter part in a "pure" OO >>> way[2]. And Luc made it indeed quite efficient, I think, in the >>> various >>> concrete classes. >>> What I want to reconsider is how those concrete low-level >>> algorithms can >>> be plugged in a higher-level function that just requires a >>> "source of >>> randomness", as I'd call a provider of "int" (or "long") values, >>> where >>> the high level functionality does not care at all about the >>> provider's >>> inner working (a.o. how it's seeded!). >> >> This is why many higher-level samplers and other things that require >> random data inside [math] have a pluggable RandomGenerator. >>> >>> A case in point is the sampling of other distributions (namely the >>> Normal distribution). >> >> Or any of the others. We have a default, inversion-based method >> that the abstract distribution classes provide and some pretty good >> specialized implementations within individual distributions. Most >> of these just require uniform random doubles as source. >> >>> >>> Here is the benchmark report: >>> ----- >>> nextGaussian() (calls per timed block: 2000000, timed blocks: 100, >>> time unit: ms) >>> name time/call std dev total time ratio >>> cv difference >>> o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000 >>> 0.14 0.0000e+00 >>> o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371 >>> 0.07 1.2892e+04 >>> o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330 >>> 0.06 1.0393e+04 >>> o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733 >>> 0.07 1.1360e+04 >>> o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797 >>> 0.04 1.1513e+04 >>> o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052 >>> 0.03 1.2125e+04 >>> o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970 >>> 0.06 1.1928e+04 >>> o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804 >>> 0.04 1.3931e+04 >>> o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882 >>> 0.06 1.4118e+04 >>> o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603 >>> 0.08 1.1049e+04 >>> ----- >>> where the first line is JDK's "nextInt()" and the remaining are >>> "nextGaussian()". >>> >>> The generation time is thus about 4-fold that of "nextInt()". >>> Thus, degrading the performance of "nextInt()" by 10% would >>> degrade the >>> performance of "nextGaussian()" by half that. >>> >>> For a performance discussion to be meaningful, I think that we'd >>> need >>> to know how that fact would affect, even modestly, any moderately >>> complex >>> post-processing of the generated values. >>> >>> Another case, for modularity, would be to consider that other >>> algorithms could >>> be implemented to provide the required distribution.[3] >>> In the current design (inheritance-based), that can only be done >>> by creating >>> a subclass, even though the core functionality ("nextDouble()") is >>> not >>> overridden. >>> >>>>> * What are usages of the CM RNGs? >>>>> Do those use-cases strictly forbid "loosing" a dozen >>>>> milliseconds per >>>>> million calls? >>>> >>>> There are many different use cases. My own applications use >>>> them in >>>> simulations to generate random deviates, to generate random hex >>>> strings as identifiers and in stochastic algorithms like some >>>> of our >>>> internal uses. The last case is definitely sensitive to PRNG >>>> performance. >>> >>> Thanks for giving examples, but since we talk about performance, I >>> was hoping for some real flesh, like the relative duration of >>> numbers >>> generation (e.g. the total duration of calls to the >>> "RandomGenerator" >>> instances wrt to the total duration of the application). >>> >>> I don't know if by "last case", you are referring to code that is >>> inside CM. I didn't spot anything that makes "heavy" usage of a >>> RNG (in the sense that generation would count as a sizable part of >>> the whole processing). >> monteCarloP in KolmogorovSmirnovTest is one to check. >>> >>> As I pointed out many times: if an application is severely >>> dependent >>> on the performance of RNG, the user probably will turn to specific >>> tools (e.g. GPUs? [4]) rather than use CM. >> >> That is a bogus argument. We should make our PRNGs simple and fast >> so their use can extend to performance-sensitive applications. >>> >>> Conversely, using Java might be preferred for its flexibility, >>> which >>> is destroyed by a search for ultimate performance (which nobody >>> seems >>> able to define reasonably). >>> Performance is not a goal in itself; it should not be a trophy >>> which >>> sits uselessly on a shelf. >> >> Nor should "beautiful design" in the eyes of one person. >>> >>> My goal is not to deliberately slow things down; it is to allow >>> some >>> leeway so that designs which are deemed better (on all counts >>> except, >>> perhaps, performance) are given a chance to show their >>> strengths, in >>> particular in areas where performance in absolute terms is "good >>> enough" for all use-cases which CM should care about (hence the >>> need >>> of actual data points[5]). >> >> I see no reason that we can't have it both ways - good design and >> good performance. What we have now, modulo maybe some small changes >> to reduce code duplication, works fine. If you want to play with >> 64-bit generators and can find reference implementations and verify >> that they do in fact perform better, great. If not, I don't see the >> point. You can rant and complain all you want; but I am not going >> to let us trash performance or correctness of code in the random >> class or anywhere else just because you think it is somehow "better >> designed" unless you can show specific, practical use cases >> demonstrating the value of the changes. >> >> Phil >>> >>> >>> Gilles >>> >>> [1] "Is it faster?" >>> "No." >>> "Then, no." >>> [2] Although that is in some sense what you indirectly defend by >>> wanting >>> to stick with a meaningless "next(int bits)" method. >>> [3] http://www.doornik.com/research/ziggurat.pdf >>> [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html >>> [5] Hence the need to agree on a methodology/policy for >>> benchmarking. >>> >>>> >>>> Phil >>>> >>>> [1] http://openjdk.java.net/projects/code-tools/jmh/ >>>>> IOW, would those users for which such a difference matters use >>>>> CM at all? >>>> >>>>> >>>>> >>>>> Thanks, >>>>> Gilles > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org