On Fri, 23 Feb 2018 12:43:06 -0600, Python wrote:

> Even if testing optimized
> code is the point, as the article claims, it utterly fails to do that.
> Bad science.

You've used that statement two or three times now.

*This isn't science*.

There's nothing scientific, or even objective, about writing
benchmarks. It is subjective choices through and through, given a
paper-thin patina of objectivity because the results include numbers.

But those numbers depend on the precise implementation of the
benchmark. They depend on the machine you run them on, sometimes
strongly enough that the ranking of which language is faster can swap.
I remember a bug in Python's urllib module, I think it was, that made
code using it literally hundreds of times slower on Windows than on
Linux or OS X.

The choice of algorithms used is not objective, or fair. Most of it is
tradition: the famous "Whetstone" benchmark apparently measures
something which has little or no connection to anything software
developers should care about. It, like the Dhrystone variant, was
invented to benchmark CPU performance. The relevance to comparing
languages is virtually zero.

"As this data reveals, Dhrystone is not a particularly representative
sample of the kinds of instruction sequences that are typical of
today's applications. The majority of embedded applications make
little use of the C libraries for example, and even desktop
applications are unlikely to have such a high weighting of a very
small number of specific library calls."

http://dell.docjava.com/courses/cr346/data/papers/DhrystoneMIPS-CriticismbyARM.pdf

Take the Fibonacci double-recursion benchmark. Okay, it tests how well
your language does at making millions of function calls. Why? How
often do you make millions of function calls? For most application
code, executing the body of the function is far more costly than the
overhead of calling it, and the call overhead is dwarfed by the rest
of the application.

For many, many applications, the *entire* program run could take
orders of magnitude fewer function calls than a single call to
fib(38).

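To put a rough number on "millions of function calls", here is a minimal sketch. It assumes the textbook definition with fib(0) = 0 and fib(1) = 1; the benchmark under discussion may differ in its details, and the names `fib` and `fib_call_count` are just illustrative. The call count obeys calls(n) = 1 + calls(n-1) + calls(n-2), so it can be worked out without actually running the slow recursion.

```python
def fib(n, counter):
    """Naive doubly recursive Fibonacci; counter[0] tallies every call."""
    counter[0] += 1
    if n < 2:
        return n
    return fib(n - 1, counter) + fib(n - 2, counter)


def fib_call_count(n):
    """Number of calls the naive fib(n) makes, computed iteratively.

    Uses calls(n) = 1 + calls(n-1) + calls(n-2),
    with calls(0) = calls(1) = 1.
    """
    a, b = 1, 1  # calls(0), calls(1)
    for _ in range(n):
        a, b = b, 1 + a + b
    return a


if __name__ == "__main__":
    # Cross-check the formula against a real (small) recursive run.
    counter = [0]
    fib(20, counter)
    assert counter[0] == fib_call_count(20)

    # Call count for the benchmark-sized input, without recursing.
    print(fib_call_count(38))  # 126491971
```

Under those assumptions a single fib(38) comes to roughly 126 million calls, which is the scale the comparison above is talking about.
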
If you have a language with tail recursion elimination, you can bet
that its benchmarks will include examples of tail recursion, and that
tail recursion will be a favoured idiom in that language. If it
doesn't, it won't.

I'm going to end with a quote:

"And of course, the very success of a benchmark program is
a danger in that people may tune their compilers and/or
hardware to it, and with this action make it less useful."

Reinhold P. Weicker, Siemens AG, April 1989
Author of the Dhrystone Benchmark


-- 
Steve

-- 
https://mail.python.org/mailman/listinfo/python-list