Frame pointer optimization issues

2014-08-20 Thread Wilco Dijkstra
Hi, Various targets implement -momit-leaf-frame-pointer to avoid using a frame pointer in leaf functions. Currently the GCC mid-end does not provide a way of doing this, so targets have resorted to hacks. Typically this involves forcing flag_omit_frame_pointer to be true in the _option_override

RE: Frame pointer optimization issues

2014-08-21 Thread Wilco Dijkstra
> Richard Henderson wrote: > On 08/20/2014 08:22 AM, Wilco Dijkstra wrote: > > 2. Change the mid-end to call _frame_pointer_required even when > > !flag_omit_frame_pointer. > > Um, it does that already. At least as far as I can see from > ira_setup_eliminable_regset

Register allocation: caller-save vs spilling

2014-08-27 Thread Wilco Dijkstra
Hi, I'm investigating various register allocation inefficiencies. The first thing that stands out is that GCC supports both caller-saves and spilling. Spilling seems to spill all definitions and all uses of a liverange. This means you often end up with multiple reloads close together, wh

RE: Register allocation: caller-save vs spilling

2014-09-04 Thread Wilco Dijkstra
ys be lower than that of a caller-save (given memory_move_cost=4 and register_move_cost=2 as commonly used by targets, anything that can be rematerialized should have less than half the cost of being spilled or caller-saved). Wilco > -Original Message- > From: Wilco Dijkstra [m

IRA preferencing issues

2015-04-17 Thread Wilco Dijkstra
Hi, While investigating why the IRA preferencing algorithm often chooses incorrect preferences from the costs, I noticed this thread: https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html I am seeing the exact same issue on AArch64 - during the final preference selection ira-costs takes the union of

RE: IRA preferencing issues

2015-04-17 Thread Wilco Dijkstra
> Matthew Fortune wrote: > Wilco Dijkstra writes: > > While investigating why the IRA preferencing algorithm often chooses > > incorrect preferences from the costs, I noticed this thread: > > https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html > > > > I am se

RE: IRA preferencing issues

2015-04-20 Thread Wilco Dijkstra
Interestingly even when the preferences are accurate, lra_constraints completely ignores the preferred/allocno class. If the cost of 2 alternatives is equal in every way (which will be the case if they are both legal matches as the standard cost functions are not used at all), the wrong one may be

RFC: Creating a more efficient sincos interface

2018-09-13 Thread Wilco Dijkstra
Hi, The existing sincos functions use 2 pointers to return the sine and cosine result. In most cases 4 memory accesses are necessary per call. This is inefficient and often significantly slower than returning values in registers. I ran a few experiments on the new optimized sincosf implementati

Re: "multiple definition of symbols" when linking executables on ARM32 and AArch64

2020-01-06 Thread Wilco Dijkstra
On 06.01.20 11:03, Andrew Pinski wrote: > +GCC > > On Mon, Jan 6, 2020 at 1:52 AM Matthias Klose wrote: >> >> In an archive test rebuild with binutils and GCC trunk, I see a lot of build >> failures on both aarch64-linux-gnu and arm-linux-gnueabihf failing with >> "multiple definition of symbols"

Re: "multiple definition of symbols" when linking executables on ARM32 and AArch64

2020-01-06 Thread Wilco Dijkstra
Hi, > However, this is an undocumented change in the current NEWS, and seeing >> literally hundreds of package failures, I doubt that's the right thing to >> do, at >> least without any deprecation warning first.  Could that be handled, >> deprecating >> in GCC 10 first, and the changing t

Re: [ARM] LLVM's -arm-assume-misaligned-load-store equivalent in GCC?

2020-01-09 Thread Wilco Dijkstra
Hi Christophe, > Actually I got a confirmation of what I suspected: the offending function > foo() > is part of ARM CMSIS libraries, although the users are able to recompile them, > they don't want to modify that source code. Having a compilation option to > avoid generating problematic code sequ

Re: help with PR78809 - inline strcmp for small constant strings

2017-08-04 Thread Wilco Dijkstra
Richard Henderson wrote:  > On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote: > > For i386, it seems strcmp is expanded inline via cmpstr optab by > > expand_builtin_strcmp if one of the strings is constant. Could we similarly > > define cmpstr pattern for AArch64? > > Certainly that's possi
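
As a rough illustration of the idea in this thread's subject, the sketch below shows what comparing against a small constant string amounts to when written out by hand. It is not the cmpstr pattern or any code GCC emits; the names strcmp_ab, sign and check_strcmp_ab are hypothetical, and the constant "ab" is chosen only for the example.

```c
#include <assert.h>
#include <string.h>

/* Hand-written equivalent of strcmp(s, "ab"): compare byte by byte and
   stop at the first difference, so no byte past the terminator of s is
   ever read. */
static int strcmp_ab(const char *s)
{
    const unsigned char *u = (const unsigned char *)s;

    if (u[0] != 'a')
        return u[0] - 'a';
    if (u[1] != 'b')
        return u[1] - 'b';
    return u[2];            /* compares u[2] against the final '\0' */
}

/* Reduce a comparison result to -1, 0 or 1. */
static int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Only the sign of strcmp's result is specified, so compare signs. */
static void check_strcmp_ab(const char *s)
{
    assert(sign(strcmp_ab(s)) == sign(strcmp(s, "ab")));
}
```

Calling check_strcmp_ab on inputs such as "", "a", "ab" and "abc" exercises each early exit. The unrolled form needs no loop and no call, which is the appeal of expanding such comparisons inline for short constants.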

RFC: Improving GCC8 default option settings

2017-09-12 Thread Wilco Dijkstra
Hi all, At the GNU Cauldron I was inspired by several interesting talks about improving GCC in various ways. While GCC has many great optimizations, a common theme is that its default settings are rather conservative. As a result users are required to enable several additional optimizations by ha

Re: [RFC] type promotion pass

2017-09-15 Thread Wilco Dijkstra
Hi Prathamesh, I've tried out the latest version and it works really well. It built and ran SPEC2017 without any issues or regressions (I didn't do a detailed comparison which would mean multiple runs, however a single run showed performance is pretty much the same on INT and 0.1% faster on FP)

Re: [RFC] type promotion pass

2017-09-15 Thread Wilco Dijkstra
David Edelsohn wrote: > Why does AArch64 define PROMOTE_MODE as SImode? GCC ports for other > RISC targets mostly seem to use a 64-bit mode. Maybe SImode is the > correct definition based on the current GCC optimization > infrastructure, but this seems like a change that should be applied to > a

Re: Possible gcc 4.8.5 bug about RELOC_HIDE marcro in latest kernel code

2017-09-21 Thread Wilco Dijkstra
Hi Justin, > I tried centos 7.4 gcc 4.8.5-16, which seems to announce to fix this issue. > And I checked the source code, the patch had been included in. > But no luck, the bug is still there. > > Could you please please any advice to me? eg. Is there any ways to disable > such > reload compilati

Re: Possible gcc 4.8.5 bug about RELOC_HIDE marcro in latest kernel code

2017-09-21 Thread Wilco Dijkstra
Hi Justin, > The 4.8.5 is default gcc version for centos 7.x If there is no newer version available you should talk to your distro. It is worth reporting this bug to them as more of their users may be affected by it. Wilco

Re: "GOT" under aarch64

2017-09-22 Thread Wilco Dijkstra
Hi, You'll get GOT relocations to globals when you use -fpic:

    int x;
    int f(void) { return x; }

    >gcc -O2 -S -o- -fpic
    f:
            adrp    x0, :got:x
            ldr     x0, [x0, #:got_lo12:x]
            ldr     w0, [x0]
            ret

So it doesn't depend on the compiler but what options you compile for. T

Re: Potential bug on Cortex-M due to used registers/interrupts.

2017-11-17 Thread Wilco Dijkstra
Hi, > These other registers - r4 to r12 - are "callee saved". To be precise, R4-R11 are callee-saved, R0-R3, R12, LR are caller-saves and LR and PSR are clobbered by calls. LR is slightly odd in that it is a callee-save in the prolog, but not in the epilog (since LR is assumed clobbered after a c

Fortran array slices and -frepack-arrays

2018-04-13 Thread Wilco Dijkstra
Hi, I looked at a few performance anomalies between gfortran and Flang - it appears array slices are treated differently. Using -frepack-arrays fixed a performance issue in gfortran and didn't cause any regressions. Making input array slices contiguous helps both locality and enables more vecto

Re: Fortran array slices and -frepack-arrays

2018-04-13 Thread Wilco Dijkstra
Bin.Cheng wrote:   > I don't know the implementation of the option, so two questions: > 1) When the repack is done during compilation?  Is new code > manipulating data layout added > by frontend?  If yes, better to do it during optimization thus is > can be on demanding?  This > looks like

Re: How to get GCC on par with ICC?

2018-06-15 Thread Wilco Dijkstra
Martin wrote: > Keep in mind that when discussing FP benchmarks, the used math library > can be (almost) as important as the compiler. In the case of 481.wrf, > we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU) > performance is about 70% of ICC's. When we just linked against AMD

Missing optimization: mempcpy(3) vs memcpy(3)

2022-12-12 Thread Wilco Dijkstra via Gcc
Hi, I don't believe there is a missing optimization here: compilers expand mempcpy by default into memcpy since that is the standard library call. That means even if your source code contains mempcpy, there will never be any calls to mempcpy. The reason is obvious: most targets support optimized
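
The expansion described above can be sketched in plain C: a mempcpy call behaves like a memcpy call followed by adding the length to the returned pointer. This is only an illustration of the equivalence, not the compiler's internal code; the helper names mempcpy_via_memcpy and join2 are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same result as mempcpy(dst, src, n): copy n bytes with the standard
   memcpy, then return a pointer just past the last byte written. */
static void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Typical use: append two strings without recomputing the write offset.
   The caller must supply a buffer large enough for both plus the NUL. */
static size_t join2(char *out, const char *a, const char *b)
{
    assert(out != NULL && a != NULL && b != NULL);

    char *p = out;
    p = mempcpy_via_memcpy(p, a, strlen(a));
    p = mempcpy_via_memcpy(p, b, strlen(b));
    *p = '\0';
    return (size_t)(p - out);   /* length of the joined string */
}
```

With a 16-byte buffer, join2(buf, "foo", "bar") leaves "foobar" in buf and returns 6. The only extra work over memcpy is one pointer addition, so routing everything through memcpy costs almost nothing.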