https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81828

Bug ID: 81828
Summary: Cilkplus performance regression on ARM...
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: ejolson at unr dot edu
Target Milestone: ---

Created attachment 41979
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41979&action=edit
Graph showing performance regression...

Code that uses the Cilkplus parallel programming extensions on ARM runs much slower when built with gcc version 7.1 than the same code built with version 6.2. Details may be viewed graphically at

http://fractal.math.unr.edu/~ejolson/bench/dotprod/gcc71-8.png

which consistently shows a loss of performance using any combination of 1 to 8 cores on a Samsung/Nexell S5P6818 based SBC. More information and example code are available at

https://www.raspberrypi.org/forums/viewtopic.php?p=711196#p1197225

My impression is that this regression affects almost all Cilkplus code on ARM and is possibly the result of an unaligned cactus stack or of additional overhead in switching tasks that was not present in the 6.2 version. Performance-based tests for ARM Cilkplus are likely needed to ensure such regressions do not happen in the future.

Note that the performance of serial code is not affected. The test code was compiled for 32-bit mode using the options

-fcilkplus -O3 -mcpu=cortex-a7 -mfpu=neon-vfpv4 -mfloat-abi=hard -ffast-math

and run under identical circumstances in both cases.
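
For readers without access to the linked example code, the following is a minimal sketch of the kind of spawn-heavy kernel on which such a regression would show up. It is not the reporter's benchmark: the function name dotprod, the GRAIN cutoff and the recursive splitting are all assumptions made for illustration. When built without -fcilkplus the keywords are defined away, which gives the serial version that the report says is unaffected.

```c
/* Hypothetical sketch, not the code from the report. */
#include <assert.h>
#include <stddef.h>

#ifdef __cilk                 /* defined by GCC when -fcilkplus is in effect */
#include <cilk/cilk.h>
#else                         /* serial elision: the keywords compile away */
#define cilk_spawn
#define cilk_sync
#endif

#define GRAIN 4096            /* assumed cutoff below which no task is spawned */

/* Recursive divide-and-conquer dot product of x[0..n) and y[0..n). */
double dotprod(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    if (n <= GRAIN) {
        /* Small ranges are summed serially to limit spawn overhead. */
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += x[i] * y[i];
        return s;
    }

    size_t half = n / 2;
    /* The left half may run in parallel with the right half. */
    double left = cilk_spawn dotprod(x, y, half);
    double right = dotprod(x + half, y + half, n - half);
    cilk_sync;                /* wait for the spawned half before combining */
    return left + right;
}
```

Timing repeated calls to dotprod on large arrays, once per compiler version and with the same options as above, would be one way to compare the two. A smaller GRAIN increases the number of spawns and so should make any task-switching overhead more visible.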