Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-24 Thread Wes McKinney
On Wed, Jun 24, 2020 at 9:48 PM Micah Kornfield wrote:
>
> In that case I would propose the following:
> 1. Standardize on clang for generating performance numbers for performance-related PRs
> 2. Adjust our binary artifact builds to use clang where feasible (I think this should wait until after

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-24 Thread Micah Kornfield
In that case I would propose the following:
1. Standardize on clang for generating performance numbers for performance-related PRs.
2. Adjust our binary artifact builds to use clang where feasible (I think this should wait until after our next release).
3. Add to the contributors guide summarizing the

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-23 Thread Uwe L. Korn
FTR: We can use the latest(!) clang on all platforms for conda and wheels. It probably isn't even that complicated a setup.

On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> We should aim to improve the performance of the most widely used *default* packages, which are p

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Micah Kornfield
I think there is a lot of good discussion on this thread. Let me try to summarize and give my thoughts:
1. Will this be a hard rule? No, based on the feedback, I think there will always be some subjectivity (e.g. tradeoff in increase in code size and maintainability). My intent here was

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Wes McKinney
For the curious, I ran benchmarks for both gcc-8 and clang-8 on my laptop for the ARROW-9197 patch:
* Clang https://github.com/apache/arrow/pull/7506#issuecomment-647633470
* GCC https://github.com/apache/arrow/pull/7506#issuecomment-647649670
It's apparent in this particular instance that GCC was

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Neal Richardson
FTR, by default R on macOS uses system clang, not homebrew; for Windows, it is gcc (unless you're adventurous like Uwe and can hack it to work with clang ;); and on Linux, CRAN checks both gcc and clang. Is it unreasonable to check *both* gcc and clang when there's a potentially significant perfor

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Francois Saint-Jacques
We should aim to improve the performance of the most widely used *default* packages, which are python pip, python conda and R (all platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose (haven't checked) that clang is

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Wes McKinney
Based on some of my performance work recently, I'm growing uncomfortable with using gcc as the performance baseline since the results can be significantly different (sometimes 3-4x or more on certain fast algorithms) from clang and MSVC. The perf results on https://github.com/apache/arrow/pull/7506

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Yibo Cai
On 6/22/20 5:07 PM, Antoine Pitrou wrote:
On 22/06/2020 at 06:27, Micah Kornfield wrote:
There has been significant effort recently trying to optimize our C++ code. One thing that seems to come up frequently is different benchmark results between GCC and Clang. Even different versions of t

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Antoine Pitrou
Also, in my experience, it is much easier to install another clang version than another gcc version on Linux (gcc is more or less married to a given libstdc++ version, AFAICT).

Regards
Antoine.

On 22/06/2020 at 09:06, Uwe L. Korn wrote:
> With my conda-forge background, I would suggest to u

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Antoine Pitrou
On 22/06/2020 at 06:27, Micah Kornfield wrote:
> There has been significant effort recently trying to optimize our C++ code. One thing that seems to come up frequently is different benchmark results between GCC and Clang. Even different versions of the same compiler can yield significa

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

2020-06-22 Thread Uwe L. Korn
With my conda-forge background, I would suggest using clang as a performance baseline, because it's currently the only compiler that works reliably on all platforms. Most Linux distributions are nowadays built with gcc, which is also a strong argument, but on OSX and Windows the picture is a bit