Folks: Merely my opinions, of course ...
Just to amplify a little on Philippe's remarks by paraphrasing comments made many times on this list before.

In a galaxy far away a long time ago ... John Chambers and his Bell Labs colleagues -- and subsequently R&R (Ross Ihaka and Robert Gentleman) and R's Core Team developers -- made the decision to develop a language/software for data analysis, data graphics and statistics. Recognizing that "most" tasks within this arena were "one-off" custom problems rather than repetitive "production" applications, they emphasized flexibility, ease of use and relatively straightforward extensibility. While I'm sure that they did not ignore performance, it was not the primary consideration (Chambers et al.'s Blue Book speaks to these issues much more eloquently; I think it should be required reading _BEFORE_ one launches into criticism). As has been frequently mentioned, they knew that there are two "outs" for such matters: Moore's Law and the ability to easily incorporate customized C code into R.

I submit that the data bear out the overwhelming wisdom of their choice. This is not to say that R is perfect: there are certainly times when performance is inadequate, and design or implementation could have been (or be) improved. But no one bats a thousand (baseball idiom): as Philippe said, for many (maybe most?) of us R is both awesome and indispensable!

For me the real challenge is: what's next? R/S is so blazingly successful that it seems to extinguish the need for continuing improvement (the demise of Luke Tierney's X-Lisp Stat is an example): what's the next step in the sequence IMSL --> SAS --> S/R --> ?? . But hopefully this is merely my ignorance speaking, and smart folks are already working on it.
Regards to all,
Bert Gunter

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Philippe Grosjean
Sent: Sunday, January 04, 2009 2:02 AM
To: Stefan Grosse
Cc: r-h...@stat.math.ethz.ch
Subject: Re: [R] R badly lags matlab on performance?

I once wrote the benchmark mentioned in Stefan's post (based on initial work by Stephan Steinhaus), and it is still available for those who would like to update it. Note that it lacks checks on the results to make sure that the calculation is not only faster, but also correct!

Now, I'll tell you why I haven't updated it, and you'll see it is connected with the current topic. First, lack of time, for sure. Second, this benchmark has always been heavily criticized by several people, including members of the R Core Team. Basically, these are just toy examples, disconnected from reality. Even with better cases, benchmarks do not take into account the time needed to write the code for your particular application (from the question to the results).

I wrote this benchmark at a time when I overemphasized the raw performance of the software, when I was looking for the best software to choose as a tool for my future career. Now, what's my choice, ten years later? Not two, not three software packages... but just ONE: R. I tend to do 95% of my calculations with R (the rest is ImageJ/Java). Indeed, these benchmark results (and Ajay Shah's toy example, a <- a + 1) should be given only marginal weight, because what matters is how your software tool performs in real applications, not in simplistic toy examples.

R lags behind Matlab for pure arithmetic calculation... right! But R has a better object-oriented approach, features more variable types (factor, for instance), and has a richer mechanism for metadata handling (col/row names, various other attributes, ...), which makes it better suited than Matlab to representing complex datasets or analyses.
Of course, this has a small cost in performance. As soon as you think about your problem in a vectorized way, R is one of the best tools, I think, to go "from the question to the answer" in real situations.

How could we quantify this? I can only imagine large contests where experts in each language would be presented with real problems and one would measure the time needed to solve them. One should also measure the robustness, reusability, flexibility and "elegance" of the code produced (how to quantify these?). This kind of contest between R, Matlab, Octave, Scilab, etc. is very unlikely to happen.

In the end, it is really a matter of personal feeling: you can run your own little contest by trying to solve a given problem in several software packages... and then decide which one you prefer. I think many people do/did this, and the still exponential growth of R use (at least, as can be observed from the increasing number of CRAN R packages) is probably a good sign that R is one of the top performers when it comes to efficiency "from the question to the answer" in real problems, not just on little toy examples!

(sorry for being so long, I think I miss some interaction with the R community this time ;-)

Best,

Philippe

..............................................<°}))><........
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons-Hainaut University, Belgium
( ( ( ( (
..............................................................

Stefan Grosse wrote:
>> I don't have octave (on the same machine) to compare these with.
>> And I don't have MatLab at all. So I can't provide a comparison
>> on that front, I'm afraid.
>> Ted.
>>
>
> Just to add some timings, I was running 1000 repetitions (adding up to
> a=1001) on a notebook with core 2 duo T7200
>
> R 2.8.1 on Fedora 10: mean 0.10967, st.dev 0.005238
> R 2.8.1 on Windows Vista: mean 0.13245, st.dev 0.00943
>
> Octave 3.0.3 on Fedora 10: mean 0.097276, st.dev 0.0041296
>
> Matlab 2008b on Windows Vista: mean 0.0626, st.dev 0.005
>
> But I am not sure how representative this is with that very simple
> example. To compare Matlab speed with R, a kind of benchmark suite is
> necessary. Like: http://www.sciviews.org/benchmark/index.html but that
> one is very old. I would guess that not much has changed: sometimes
> R is faster, sometimes not.
>
> This difference between the Windows and Linux timings is probably not
> really relevant: when I was comparing the timings of my usual analysis
> there was no difference between the two operating systems. (count data
> and time series stuff)
>
> Cheers
> Stefan
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.