Hi all,
The vector rotate splitter has some logic to deal with post-reload splitting
but not all cases in aarch64_emit_opt_vec_rotate are post-reload-safe.
In particular the ROTATE+XOR expansion for TARGET_SHA3 can create RTL that
can later be simplified to a simple ROTATE post-reload, which would

Hi all,
I'd like to continue the discussion on teaching GCC to optimise code layout
for locality between callees and callers. This is work that we've been doing
at NVIDIA, primarily Prachi Godbole (CC'ed) and myself.
This is a follow-up to the discussion we had at GNU Cauldron at the IPA/LTO
BoF [

On Mon, Nov 04, 2024 at 10:21:58AM +, Andrew Stubbs wrote:
> @@ -999,6 +1000,18 @@ omp_max_vf (void)
>            && OPTION_SET_P (flag_tree_loop_vectorize)))
>      return 1;
>
> +  if (ENABLE_OFFLOADING && offload)
> +    {
> +      for (const char *c = getenv ("OFFLOAD_TARGET_NAMES"); c;)
> +

On Tue, Nov 5, 2024 at 11:18 AM Florian Weimer wrote:
>
> * David Brown via Gcc:
>
> > I would have thought it would be better as part of the compiler. For
> > each compilation unit, you generate one or more data sections
> > depending on the variable initialisations, compiler options and target

* David Brown via Gcc:
> I would have thought it would be better as part of the compiler. For
> each compilation unit, you generate one or more data sections
> depending on the variable initialisations, compiler options and target
> (.bss, .data, .rodata, .sbss, etc.). If the compiler has
> "-al