Hi,
Various targets implement -momit-leaf-frame-pointer to avoid using a frame
pointer in leaf functions. Currently the GCC mid-end does not provide a way of
doing this, so targets have resorted to hacks. Typically this involves forcing
flag_omit_frame_pointer to be true in the _option_override
> Richard Henderson wrote:
> On 08/20/2014 08:22 AM, Wilco Dijkstra wrote:
> > 2. Change the mid-end to call _frame_pointer_required even when
> > !flag_omit_frame_pointer.
>
> Um, it does that already. At least as far as I can see from
> ira_setup_eliminable_regset
Hi,
I'm investigating various register allocation inefficiencies. The first thing
that stands out is that GCC supports both caller-saves and spilling. Spilling
seems to spill all definitions and all uses of a liverange. This means you
often end up with multiple reloads close together, wh
ys be lower than that of a caller-save (given memory_move_cost=4 and
register_move_cost=2 as commonly used by targets, anything that can be
rematerialized should have less than half the cost of being spilled or
caller-saved).
Wilco
> -----Original Message-----
> From: Wilco Dijkstra [m
Hi,
While investigating why the IRA preferencing algorithm often chooses incorrect
preferences from the costs, I noticed this thread:
https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html
I am seeing the exact same issue on AArch64 - during the final preference
selection ira-costs takes the union of
> Matthew Fortune wrote:
> Wilco Dijkstra writes:
> > While investigating why the IRA preferencing algorithm often chooses
> > incorrect preferences from the costs, I noticed this thread:
> > https://gcc.gnu.org/ml/gcc/2011-05/msg00186.html
> >
> > I am se
Interestingly even when the preferences are accurate, lra_constraints
completely ignores the preferred/allocno class. If the cost of 2 alternatives
is equal in every way (which will be the case if they are both legal matches
as the standard cost functions are not used at all), the wrong one may be
Hi,
The existing sincos functions use 2 pointers to return the sine and cosine
result. In most cases 4 memory accesses are necessary per call. This is
inefficient and often significantly slower than returning values in registers.
I ran a few experiments on the new optimized sincosf implementati
On 06.01.20 11:03, Andrew Pinski wrote:
> +GCC
>
> On Mon, Jan 6, 2020 at 1:52 AM Matthias Klose wrote:
>>
>> In an archive test rebuild with binutils and GCC trunk, I see a lot of build
>> failures on both aarch64-linux-gnu and arm-linux-gnueabihf failing with
>> "multiple definition of symbols"
Hi,
> However, this is an undocumented change in the current NEWS, and seeing
>> literally hundreds of package failures, I doubt that's the right thing to
>> do, at least without any deprecation warning first. Could that be handled,
>> deprecating in GCC 10 first, and the changing t
Hi Christophe,
> Actually I got a confirmation of what I suspected: the offending function
> foo() is part of ARM CMSIS libraries, although the users are able to
> recompile them, they don't want to modify that source code. Having a
> compilation option to avoid generating problematic code sequ
Richard Henderson wrote:
> On 08/04/2017 05:59 AM, Prathamesh Kulkarni wrote:
> > For i386, it seems strcmp is expanded inline via cmpstr optab by
> > expand_builtin_strcmp if one of the strings is constant. Could we similarly
> > define cmpstr pattern for AArch64?
>
> Certainly that's possi
Hi all,
At the GNU Cauldron I was inspired by several interesting talks about improving
GCC in various ways. While GCC has many great optimizations, a common theme is
that its default settings are rather conservative. As a result users are
required to enable several additional optimizations by ha
Hi Prathamesh,
I've tried out the latest version and it works really well. It built and ran
SPEC2017 without any issues or regressions (I didn't do a detailed comparison
which would mean multiple runs, however a single run showed performance is
pretty much the same on INT and 0.1% faster on FP)
David Edelsohn wrote:
> Why does AArch64 define PROMOTE_MODE as SImode? GCC ports for other
> RISC targets mostly seem to use a 64-bit mode. Maybe SImode is the
> correct definition based on the current GCC optimization
> infrastructure, but this seems like a change that should be applied to
> a
Hi Justin,
> I tried centos 7.4 gcc 4.8.5-16, which seems to announce to fix this issue.
> And I checked the source code, the patch had been included in.
> But no luck, the bug is still there.
>
> Could you please please any advice to me? eg. Is there any ways to disable
> such reload compilati
Hi Justin,
> The 4.8.5 is default gcc version for centos 7.x
If there is no newer version available you should talk to your distro.
It is worth reporting this bug to them as more of their users may be
affected by it.
Wilco
Hi,
You'll get GOT relocations to globals when you use -fpic:
int x;
int f(void) { return x; }
>gcc -O2 -S -o- -fpic
f:
adrp x0, :got:x
ldr x0, [x0, #:got_lo12:x]
ldr w0, [x0]
ret
So it doesn't depend on the compiler but what options you compile for.
T
Hi,
> These other registers - r4 to r12 - are "callee saved".
To be precise, R4-R11 are callee-saved, R0-R3, R12, LR are caller-saves
and LR and PSR are clobbered by calls. LR is slightly odd in that it is
a callee-save in the prolog, but not in the epilog (since LR is assumed
clobbered after a c
Hi,
I looked at a few performance anomalies between gfortran and Flang - it appears
array slices are treated differently. Using -frepack-arrays fixed a performance
issue in gfortran and didn't cause any regressions. Making input array slices
contiguous helps both locality and enables more vecto
Bin.Cheng wrote:
> I don't know the implementation of the option, so two questions:
> 1) When the repack is done during compilation? Is new code manipulating data
> layout added by frontend? If yes, better to do it during optimization thus is
> can be on demanding? This looks like
Martin wrote:
> Keep in mind that when discussing FP benchmarks, the used math library
> can be (almost) as important as the compiler. In the case of 481.wrf,
> we found that the GCC 8 + glibc 2.26 (so the "out-of-the box" GNU)
> performance is about 70% of ICC's. When we just linked against AMD
Hi,
I don't believe there is a missing optimization here: compilers expand mempcpy
by default into memcpy since that is the standard library call. That means even
if your source code contains mempcpy, there will never be any calls to mempcpy.
The reason is obvious: most targets support optimized
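As a rough illustration of the expansion described above, mempcpy can be written in terms of memcpy plus a pointer addition. The helper name my_mempcpy is hypothetical, chosen so it does not clash with the GNU mempcpy declaration; this is a sketch of the equivalence, not the code GCC emits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: same result as GNU mempcpy, expressed via the
   standard memcpy. memcpy returns dst, so the end pointer is dst + n. */
static void *my_mempcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Small self-check: the returned pointer is one past the last byte copied. */
static void check_my_mempcpy(void)
{
    char buf[8] = {0};
    char *end = my_mempcpy(buf, "abc", 3);
    assert(end == buf + 3);
    assert(memcmp(buf, "abc", 3) == 0);
}
```

Only the memcpy call reaches the library here; the addition is a single instruction in the caller.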