On Fri, Feb 5, 2016 at 6:32 PM, Gilles <gil...@harfang.homelinux.org> wrote:
> On Sat, 6 Feb 2016 01:53:54 +0000, Niall Pemberton wrote: > >> Are you not concerned about forming a TLP of 7 around Math when one of the >> seven is clearly not a happy camper? >> > > Of course I am. > Besides stopping to annoy non-Math Commons developers, I've asked > that we clarify the positive objectives of the move, as far as > development would be concerned. > No one except Thomas seemed interested. > > Gilles > > >> Niall >> >> On Sat, Feb 6, 2016 at 12:07 AM, Phil Steitz <phil.ste...@gmail.com> >> wrote: >> >> On 2/5/16 12:59 PM, Gilles wrote: >>> > On Fri, 5 Feb 2016 06:50:10 -0700, Phil Steitz wrote: >>> >> On 2/4/16 3:59 PM, Gilles wrote: >>> >>> Hi. >>> >>> >>> >>> Here is a micro-benchmark report (performed with "PerfTestUtils"): >>> >>> ----- >>> >>> nextInt() (calls per timed block: 2000000, timed blocks: 100, time >>> >>> unit: ms) >>> >>> name time/call std dev total time ratio >>> >>> cv difference >>> >>> o.a.c.m.r.JDKRandomGenerator 1.088e-05 2.8e-06 2.1761e+03 1.000 >>> >>> 0.26 0.0000e+00 >>> >>> o.a.c.m.r.MersenneTwister 1.024e-05 1.5e-06 2.0471e+03 0.941 >>> >>> 0.15 -1.2900e+02 >>> >>> o.a.c.m.r.Well512a 1.193e-05 4.4e-07 2.3864e+03 1.097 >>> >>> 0.04 2.1032e+02 >>> >>> o.a.c.m.r.Well1024a 1.348e-05 1.9e-06 2.6955e+03 1.239 >>> >>> 0.14 5.1945e+02 >>> >>> o.a.c.m.r.Well19937a 1.495e-05 2.1e-06 2.9906e+03 1.374 >>> >>> 0.14 8.1451e+02 >>> >>> o.a.c.m.r.Well19937c 1.577e-05 8.8e-07 3.1542e+03 1.450 >>> >>> 0.06 9.7816e+02 >>> >>> o.a.c.m.r.Well44497a 1.918e-05 1.4e-06 3.8363e+03 1.763 >>> >>> 0.08 1.6602e+03 >>> >>> o.a.c.m.r.Well44497b 1.953e-05 2.8e-06 3.9062e+03 1.795 >>> >>> 0.14 1.7301e+03 >>> >>> o.a.c.m.r.ISAACRandom 1.169e-05 1.9e-06 2.3375e+03 1.074 >>> >>> 0.16 1.6139e+02 >>> >>> ----- >>> >>> where "cv" is the ratio of the 3rd to the 2nd column. >>> >>> >>> >>> Questions are: >>> >>> * How meaningful are micro-benchmarks when the timed operation has >>> >>> a very >>> >>> small duration (wrt e.g. the duration of other machine >>> >>> instructions that >>> >>> are required to perform them)? >>> >> >>> >> It is harder to get good benchmarks for shorter duration activities, >>> >> but not impossible. One thing that it would be good to do is to >>> >> compare these results with JMH [1]. >>> >> We use JMH over at Log4j and it does a fine job. See the log4j-perf module. Gary > > >>> > I was expecting insights based on the benchmark which I did run. >>> >>> You asked whether or not benchmarks are meaningful when the task >>> being benchmarked is short duration. I answered that question. >>> > >>> > We have a tool in CM; if it's wrong, we should remove it. >>> > How its results compare with JMH is an interesting question, >>> >>> I will look into this. >>> > I >>> > agree, but I don't have time to make an analysis of benchmarking >>> > tools (on top of what I've been doing since December because >>> > totally innocuous changes in the RNG classes were frowned upon >>> > out of baseless fear). >>> >>> Please cut the hypberbole. >>> > >>> >>> * In a given environment (HW, OS, JVM), is there a lower limit >>> >>> (absolute >>> >>> duration) below which anything will be deemed good enough? >>> >> >>> >> That depends completely on the application. >>> > >>> > Sorry, I thought that it was obvious: I don't speak of applications >>> > that don't care about performance. :-) >>> > >>> > For those that do, I do not agree with the statement: the question >>> > relates to finding a point below which it is the environment that >>> > overwhelms the other conditions. >>> > A point where there will be _unavoidable_ overhead (transferring data >>> > from/to memory, JVM book-keeping, ...) and perturbations (context >>> > switches, ...) such that their duration adds a constant time (on >>> > average) that may render most enhancements to an already efficient >>> > algorithm barely noticeable in practice. >>> > Similarly, but in the opposite direction, some language constructs >>> > or design choices might slow down things a bit, but without >>> > endangering any user. >>> > >>> > A problem arises when any enhancement to the design is deemed >>> > harmful because it degrades a micro-benchmark, even though that >>> > benchmark may not reflect any real use-cases. >>> > Then, the real harm is against development. >>> > >>> >>> * Can a library like CM admit a trade-off between ultimate >>> >>> performance and >>> >>> good design? IOW, is there an acceptable overhead in exchange >>> >>> for other qualities >>> >>> (clarity, non-redundancy, extensibility, etc.)? >>> >> >>> >> That is too general a question to be meaningful. We need to look >>> >> at specific cases. What exactly are you proposing? >>> > >>> > <rant> >>> > It is quite meaningful even if it refers to general principles. >>> > Those could (should, IMO) be taken into account when managing a >>> > project like CM, on a par with "performance" (whose intrinsic value >>> > is never questioned). >>> > </rant> >>> >>> Rant all you want. Vague generalities and hyperbole have no value. >>> > >>> > Two specific cases are: >>> > * inheritance vs delegation (a.k.a. composition) >>> > * generics (that could require runtime casts) >>> >>> This is getting closer to meaningful. Where exactly in the code are >>> you wanting to use something and seeing benchmark damage? >>> > >>> >>> * Does ultimate performance for the base functionality (generation >>> >>> of a >>> >>> random number) trump any consideration of use-cases that would >>> >>> need an >>> >>> extension (of the base functionality, such as computation to >>> >>> match another >>> >>> distribution) that will unavoidably degrades the performance >>> >>> (hence the >>> >>> micro-benchmark will be completely misleading for those users)? >>> >> >>> >> Again, this is vague and the answer depends on what exactly you are >>> >> talking about. Significantly damaging performance of PRNG >>> >> implementations is a bad idea, >>> > >>> > Now, *this* is vague: what do you mean by "significantly"? >>> > That was actually my question in the first place. >>> If you are talking about PRNG performance, I would say a 1% hit is >>> significant. >>> > Referring to the >>> > benchmark above, people who'd know why they require ultimate >>> > performance >>> > should be able to tell what range of numbers they'd find >>> > acceptable in >>> > that table. >>> > >>> > <rant> >>> > Actually my questions are very precise, but the answers would require >>> > some decent analysis, rather than the usual "bad idea" dismissal. >>> > </rant> >>> > >>> > In the Javadoc of the "random" package, there is information about >>> > performance but no reference as to the benchmarking procedure. >>> >>> It would be great to repeat these using JMH, which is emerging as a >>> de facto standard for java benchmarking. I will look into this. >>> > >>> > I can consistently observe a totally different behaviour (using >>> > "PerfTestUtils"): >>> > 1. "MersenneTwister" is *always* faster than all of the WELL RNGs; >>> > 2. moreover, the ratio *grows* with each of the longer periods >>> > members of the WELL family (see the above table). >>> > >>> > This makes me wonder how someone who purports to need "ultimate" >>> > performance can have any objective basis to determine what is good >>> > or bad for his own applications. >>> > >>> >> unless there are actual practical use >>> >> cases you can point to that whatever changes you are proposing >>> >> enable. >>> > >>> > As I've explained in very much details in another thread, I've >>> > reviewed (from a design POV) the RNG code in "random" and IMHO, there >>> > is room for improvement (cf. above for what I mean by that term). >>> > <rant> >>> > I have some code ready for review but I had to resort to what I >>> > considered sub-optimal design (preemptively renouncing to propose a >>> > "delegation"-based design) solely because of the destructive >>> > community >>> > process that takes place here.[1] >>> > </rant> >>> >>> More vague hyperbole that serves no purpose. Please focus on actual >>> code or design issues. >>> > >>> > The practical use-cases is anything that needs further processing of >>> > the numbers produced according to a uniform distribution: >>> >>> Isn't that what the samplers in the distributions package do? What >>> we need from the PRNG implementations is just blocks of bits. Since >>> we wanted a pluggable replacement for j.u.Random, we added uniform >>> ints, longs and floats and gaussian floats. The samplers just need >>> uniform doubles. The practical use case we need is well-supported >>> in the code we have. What is missing, exactly? >>> > I agree that >>> > there would be little sense to code that latter part in a "pure" OO >>> > way[2]. And Luc made it indeed quite efficient, I think, in the >>> > various >>> > concrete classes. >>> > What I want to reconsider is how those concrete low-level >>> > algorithms can >>> > be plugged in a higher-level function that just requires a "source of >>> > randomness", as I'd call a provider of "int" (or "long") values, >>> > where >>> > the high level functionality does not care at all about the >>> > provider's >>> > inner working (a.o. how it's seeded!). >>> >>> This is why many higher-level samplers and other things that require >>> random data inside [math] have a pluggable RandomGenerator. >>> > >>> > A case in point is the sampling of other distributions (namely the >>> > Normal distribution). >>> >>> Or any of the others. We have a default, inversion-based method >>> that the abstract distribution classes provide and some pretty good >>> specialized implementations within individual distributions. Most >>> of these just require uniform random doubles as source. >>> >>> > >>> > Here is the benchmark report: >>> > ----- >>> > nextGaussian() (calls per timed block: 2000000, timed blocks: 100, >>> > time unit: ms) >>> > name time/call std dev total time ratio >>> > cv difference >>> > o.a.c.m.r.JDKRandomGenerator 1.200e-05 1.7e-06 2.4001e+03 1.000 >>> > 0.14 0.0000e+00 >>> > o.a.c.m.r.JDKRandomGenerator 7.646e-05 5.1e-06 1.5292e+04 6.371 >>> > 0.07 1.2892e+04 >>> > o.a.c.m.r.MersenneTwister 6.396e-05 3.6e-06 1.2793e+04 5.330 >>> > 0.06 1.0393e+04 >>> > o.a.c.m.r.Well512a 6.880e-05 5.0e-06 1.3760e+04 5.733 >>> > 0.07 1.1360e+04 >>> > o.a.c.m.r.Well1024a 6.956e-05 3.0e-06 1.3913e+04 5.797 >>> > 0.04 1.1513e+04 >>> > o.a.c.m.r.Well19937a 7.262e-05 2.0e-06 1.4525e+04 6.052 >>> > 0.03 1.2125e+04 >>> > o.a.c.m.r.Well19937c 7.164e-05 4.3e-06 1.4329e+04 5.970 >>> > 0.06 1.1928e+04 >>> > o.a.c.m.r.Well44497a 8.166e-05 3.2e-06 1.6332e+04 6.804 >>> > 0.04 1.3931e+04 >>> > o.a.c.m.r.Well44497b 8.259e-05 4.6e-06 1.6518e+04 6.882 >>> > 0.06 1.4118e+04 >>> > o.a.c.m.r.ISAACRandom 6.724e-05 5.4e-06 1.3449e+04 5.603 >>> > 0.08 1.1049e+04 >>> > ----- >>> > where the first line is JDK's "nextInt()" and the remaining are >>> > "nextGaussian()". >>> > >>> > The generation time is thus about 4-fold that of "nextInt()". >>> > Thus, degrading the performance of "nextInt()" by 10% would >>> > degrade the >>> > performance of "nextGaussian()" by half that. >>> > >>> > For a performance discussion to be meaningful, I think that we'd need >>> > to know how that fact would affect, even modestly, any moderately >>> > complex >>> > post-processing of the generated values. >>> > >>> > Another case, for modularity, would be to consider that other >>> > algorithms could >>> > be implemented to provide the required distribution.[3] >>> > In the current design (inheritance-based), that can only be done >>> > by creating >>> > a subclass, even though the core functionality ("nextDouble()") is >>> > not >>> > overridden. >>> > >>> >>> * What are usages of the CM RNGs? >>> >>> Do those use-cases strictly forbid "loosing" a dozen >>> >>> milliseconds per >>> >>> million calls? >>> >> >>> >> There are many different use cases. My own applications use them in >>> >> simulations to generate random deviates, to generate random hex >>> >> strings as identifiers and in stochastic algorithms like some of our >>> >> internal uses. The last case is definitely sensitive to PRNG >>> >> performance. >>> > >>> > Thanks for giving examples, but since we talk about performance, I >>> > was hoping for some real flesh, like the relative duration of numbers >>> > generation (e.g. the total duration of calls to the "RandomGenerator" >>> > instances wrt to the total duration of the application). >>> > >>> > I don't know if by "last case", you are referring to code that is >>> > inside CM. I didn't spot anything that makes "heavy" usage of a >>> > RNG (in the sense that generation would count as a sizable part of >>> > the whole processing). >>> monteCarloP in KolmogorovSmirnovTest is one to check. >>> > >>> > As I pointed out many times: if an application is severely dependent >>> > on the performance of RNG, the user probably will turn to specific >>> > tools (e.g. GPUs? [4]) rather than use CM. >>> >>> That is a bogus argument. We should make our PRNGs simple and fast >>> so their use can extend to performance-sensitive applications. >>> > >>> > Conversely, using Java might be preferred for its flexibility, which >>> > is destroyed by a search for ultimate performance (which nobody seems >>> > able to define reasonably). >>> > Performance is not a goal in itself; it should not be a trophy which >>> > sits uselessly on a shelf. >>> >>> Nor should "beautiful design" in the eyes of one person. >>> > >>> > My goal is not to deliberately slow things down; it is to allow some >>> > leeway so that designs which are deemed better (on all counts except, >>> > perhaps, performance) are given a chance to show their strengths, in >>> > particular in areas where performance in absolute terms is "good >>> > enough" for all use-cases which CM should care about (hence the need >>> > of actual data points[5]). >>> >>> I see no reason that we can't have it both ways - good design and >>> good performance. What we have now, modulo maybe some small changes >>> to reduce code duplication, works fine. If you want to play with >>> 64-bit generators and can find reference implementations and verify >>> that they do in fact perform better, great. If not, I don't see the >>> point. You can rant and complain all you want; but I am not going >>> to let us trash performance or correctness of code in the random >>> class or anywhere else just because you think it is somehow "better >>> designed" unless you can show specific, practical use cases >>> demonstrating the value of the changes. >>> >>> Phil >>> > >>> > >>> > Gilles >>> > >>> > [1] "Is it faster?" >>> > "No." >>> > "Then, no." >>> > [2] Although that is in some sense what you indirectly defend by >>> > wanting >>> > to stick with a meaningless "next(int bits)" method. >>> > [3] http://www.doornik.com/research/ziggurat.pdf >>> > [4] http://http.developer.nvidia.com/GPUGems3/gpugems3_ch37.html >>> > [5] Hence the need to agree on a methodology/policy for benchmarking. >>> > >>> >> >>> >> Phil >>> >> >>> >> [1] http://openjdk.java.net/projects/code-tools/jmh/ >>> >>> IOW, would those users for which such a difference matters use >>> >>> CM at all? >>> >> >>> >>> >>> >>> >>> >>> Thanks, >>> >>> Gilles >>> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org > For additional commands, e-mail: dev-h...@commons.apache.org > > -- E-Mail: garydgreg...@gmail.com | ggreg...@apache.org Java Persistence with Hibernate, Second Edition <http://www.manning.com/bauer3/> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/> Spring Batch in Action <http://www.manning.com/templier/> Blog: http://garygregory.wordpress.com Home: http://garygregory.com/ Tweet! http://twitter.com/GaryGregory