Re: Incremental updating of SSA_NAMEs that are passed to b_c_p

2020-10-30 Thread Richard Biener via Gcc
On Thu, Oct 29, 2020 at 6:20 PM Ilya Leoshkevich  wrote:
>
> On Wed, 2020-10-28 at 12:18 +0100, Richard Biener wrote:
> > On Tue, Oct 27, 2020 at 7:36 PM Ilya Leoshkevich via Gcc
> >  wrote:
> > > Hi,
> > >
> > > I'd like to revive the old discussion regarding the interaction of
> > > jump threading and b_c_p causing the latter to incorrectly return 1
> > > in
> > > certain cases:
> > >
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> > > https://gcc.gnu.org/pipermail/gcc-patches/2020-July/549288.html
> > >
> > > The conclusion was that this happening during threading is just a
> > > symptom of a deeper problem: SSA_NAMEs that are passed to b_c_p
> > > should
> > > not be registered for incremental updating.
> > >
> > > I performed a little experiment and added an assertion to
> > > create_new_def_for:
> > >
> > > --- a/gcc/tree-into-ssa.c
> > > +++ b/gcc/tree-into-ssa.c
> > > @@ -2996,6 +3014,8 @@ create_new_def_for (tree old_name, gimple
> > > *stmt,
> > > def_operand_p def)
> > >  {
> > >tree new_name;
> > >
> > > +  gcc_checking_assert (!used_by_bcp_p (old_name));
> > > +
> > >timevar_push (TV_TREE_SSA_INCREMENTAL);
> > >
> > >if (!update_ssa_initialized_fn)
> > >
> > > This has of course fired when performing basic block duplication
> > > during
> > > threading, which can be fixed by avoiding duplication of basic
> > > blocks
> > > wi
> > > th b_c_p calls:
> > >
> > > --- a/gcc/tree-cfg.c
> > > +++ b/gcc/tree-cfg.c
> > > @@ -6224,7 +6224,8 @@ gimple_can_duplicate_bb_p (const_basic_block
> > > bb)
> > >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_EXIT)
> > >   || gimple_call_internal_p (g, IFN_GOMP_SIMT_VOTE_ANY)
> > >   || gimple_call_internal_p (g,
> > > IFN_GOMP_SIMT_XCHG_BFLY)
> > > - || gimple_call_internal_p (g,
> > > IFN_GOMP_SIMT_XCHG_IDX)))
> > > + || gimple_call_internal_p (g, IFN_GOMP_SIMT_XCHG_IDX)
> > > + || gimple_call_builtin_p (g, BUILT_IN_CONSTANT_P)))
> > > return false;
> > >  }
> > >
> > > The second occurrence is a bit more interesting:
> > >
> > > gimple *
> > > vrp_insert::build_assert_expr_for (tree cond, tree v)
> > > {
> > >   ...
> > >   a = build2 (ASSERT_EXPR, TREE_TYPE (v), v, cond);
> > >   assertion = gimple_build_assign (NULL_TREE, a);
> > >   ...
> > >   tree new_def = create_new_def_for (v, assertion, NULL);
> > >
> > > The fix is also simple though:
> > >
> > > --- a/gcc/tree-vrp.c
> > > +++ b/gcc/tree-vrp.c
> > > @@ -3101,6 +3101,9 @@ vrp_insert::process_assert_insertions_for
> > > (tree
> > > name, assert_locus *loc)
> > >if (loc->expr == loc->val)
> > >  return false;
> > >
> > > +  if (used_by_bcp_p (name))
> > > +return false;
> > > +
> > >cond = build2 (loc->comp_code, boolean_type_node, loc->expr,
> > > loc-
> > > > val);
> > >assert_stmt = build_assert_expr_for (cond, name);
> > >if (loc->e)
> > >
> > > My original testcase did not trigger anything else.  I'm planning
> > > to
> > > check how this affects the testsuite, but before going further I
> > > would
> > > like to ask: is this the right direction now?  To me it looks a
> > > little-bit more heavy-handed than the original approach, but maybe
> > > it's
> > > worth it.
> >
> > Disabling threading looks reasonable but I'd rather not disallow BB
> > duplication
> > or renaming.  For VRP I guess we want to instead change
> > register_edge_assert_for* to not register assertions for the result
> > of
> > __builtin_constant_p rather than just not allowing VRP to process it
> > (there are other consumers still).
>
> If I understood Jeff correctly, we should disable incremental updates
> for absolutely all b_c_p arguments, so affecting as many consumers as
> possible must actually be a good thing when this approach is
> considered?
>
> That said, regtest has revealed one more place where this is happening:
> rewrite_into_loop_closed_ssa_1 -> ... -> add_exit_phi ->
> create_new_def_for.  The reduced code is:
>
> int a;
>
> void
> b (void)
> {
>   while (a)
> __builtin_constant_p (a) ?: 0;
>   if (a == 8)
> while (1)
>   ;
> }
>
> I guess we are not allowed to cancel this transformation, becase we
> really need LCSSA for later passes?  This now contradicts the "prohibit
> all updates" idea..

Yes.  I believe this looks at the issue from a wrong angle.  SSA rewrite
is just renaming and that's OK.

> If you think that disabling threading is reasonable, could you please
> have another look at the original patches?
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547236.html
> v2: https://gcc.gnu.org/pipermail/gcc-patches/2020-June/549211.html
>
> Can anything like this go in after all?

For v2 I'm not sure why we would allow b_c_p late?  That said, both
approaches are reasonable - disallow threading or force bcp to
evaluate to zero (guess that relies on the non-threaded variant to
be known to be _not_ constant, otherwise we create another
inconsistecy

Disassemble the .lib file compiled with gcc-arm-8.3-2019.03-x86_64-arm-eabi compilation tool chain, and found that malloc is optimized to calloc.

2020-10-30 Thread 魏亚茹
Dear gcc:
I find that disassemble the .lib file compiled with 
gcc-arm-8.3-2019.03-x86_64-arm-eabi compilation tool chain, and found that 
malloc is optimized to calloc. I want to know under what circumstances malloc 
will be optimized to calloc?
Thanks


Implementing OpenMP 5.0 requires directive

2020-10-30 Thread Chung-Lin Tang

Hi Jakub,
We've been going over how we should implement the requires directive, in a bit 
more complete
sense than the current state (i.e. only atomic_default_mem_order working).

For the three clauses where the specification requires that "must appear in all 
compilation
units of a program that contain device constructs or device routines or in none of 
them":
   - reverse_offload
   - unified_address
   - unified_shared_memory

The current design we're contemplating is to generate a mask variable of these 
3 clauses for
compilation units built with -fopenmp, have them tagged with an attribute to be 
collected
into a special section (e.g. ".gnu.gomp.requires"). Later at runtime device 
startup, have
them checked by the runtime against the capabilities of the libgomp offload 
target.
Cross-checking each word (assuming it is a word that we generate for each 
compile unit)
against each other can also implement the consistency requirement.

(actually, as a first stage implementation, we were hoping to just have the 
special section
implemented, which allows compilation of OpenMP programs using 
requires-directive, and
implement any runtime checking at a later stage)

We hope to check with you first on any design issues. Have you given any 
thought on this
directive?

Thanks,
Chung-Lin


Re: Implementing OpenMP 5.0 requires directive

2020-10-30 Thread Jakub Jelinek via Gcc
On Fri, Oct 30, 2020 at 10:48:09PM +0800, Chung-Lin Tang wrote:
> We've been going over how we should implement the requires directive, in a 
> bit more complete
> sense than the current state (i.e. only atomic_default_mem_order working).
> 
> For the three clauses where the specification requires that "must appear in 
> all compilation
> units of a program that contain device constructs or device routines or in 
> none of them":
>- reverse_offload
>- unified_address
>- unified_shared_memory
> 
> The current design we're contemplating is to generate a mask variable of 
> these 3 clauses for
> compilation units built with -fopenmp, have them tagged with an attribute to 
> be collected
> into a special section (e.g. ".gnu.gomp.requires"). Later at runtime device 
> startup, have
> them checked by the runtime against the capabilities of the libgomp offload 
> target.
> Cross-checking each word (assuming it is a word that we generate for each 
> compile unit)
> against each other can also implement the consistency requirement.
> 
> (actually, as a first stage implementation, we were hoping to just have the 
> special section
> implemented, which allows compilation of OpenMP programs using 
> requires-directive, and
> implement any runtime checking at a later stage)
> 
> We hope to check with you first on any design issues. Have you given any 
> thought on this
> directive?

I vaguely considered each TU with such requires directives would just call
in the ctors of the TU some libgomp routine that would tell libgomp the
bitmask requirement, and either it would be called before the devices are
initialized, in that case it would result in filtering of the devices -
devices not satisfying those requirements wouldn't be registered - or if
called after the devices are loaded (e.g. due to dlopen etc.), dunno, either
terminate the program or just finalize those devices.

Now, you're certainly right it is better to track it somewhere in data
and just let it be resolved during linking.

Anyway, we shouldn't record these 3 requires flags anywhere in TUs that
don't contain any device constructs or device routines - the current
OMP_REQUIRES_TARGET_USED is set by target construct only, while it applies
I think at least to omp target{, data, update, enter data, exit data},
probably some declare variants, and various omp_target_* etc. calls
(so we'd need e.g. have some function list and compare direct calls
to those functions somewhere (perhaps gimplify)).

One possible spot to encode the mask could be somewhere in the offloading
LTO sections and let mkoffload collect it, diagnose and let it runtime
library know about the requirements (and let the runtime library again check
those requirements across the binary and shared libraries).

Just note that there can be even TUs that don't really have anything to
offload, but still have some device constructs or call device routines and
so we'd need to force the offloading sections even for those.

As for dynamic_allocators, I'd like to implement that soon, though because
the wording is unfortunately bad (except for allocate clause on target
construct, the requirement that the allocator is constant expression when
not dynamic_allocators is talking about target region and therefore
something to be discovered only at runtime rather than compile time), I
think all we can do is emit warnings.

Jakub



Time for std::bit_cast

2020-10-30 Thread sotrdg sotrdg via Gcc
Thank you! Jakub. YES to std::bit_cast

Sent from Mail for Windows 10



Dead Field Elimination and Field Reordering

2020-10-30 Thread Erick Ochoa

Hello again,

I've been working on several implementations of data layout 
optimizations for GCC, and I am again kindly requesting for a review of 
the type escape based dead field elimination and field reorg.


Thanks to everyone that has helped me. The main differences between the 
previous commits have been fixing the style, adding comments explaining 
classes and families of functions, exit gracefully if we handle unknown 
gimple syntax, and added a heuristic to handle void* casts.


This patchset is organized in the following way:

* Adds a link-time warning if dead fields are detected
* Allows for the dead-field elimination transformation to be applied
* Reorganizes fields in structures.
* Adds some documentation
* Gracefully does not apply transformation if unknown syntax is detected.
* Adds a heuristic to handle void* casts

I have tested this transformations as extensively as I can. The way to 
trigger these transformations are:


-fipa-field-reorder and -fipa-type-escape-analysis

Having said that, I welcome all criticisms and will try to address those 
criticisms which I can. Please let me know if you have any questions or 
comments, I will try to answer in a timely manner.


The code is in:

  refs/vendors/ARM/heads/arm-struct-reorg-wip

Future work includes extending the current heuristic with ipa-modref 
extending the analysis to use IPA-PTA as discussed previously.


Few notes:

* Currently it is not safe to use -fipa-sra.
* I added some tests which are now failing by default. This is because 
there is no way to safely determine within the test case that a layout 
has been transformed. I used to determine that a field was eliminated 
doing pointer arithmetic on the fields. And since that is not safe, the 
analysis decides not to apply the transformation. There is a way to deal 
with this (add a flag to allow the address of a field to be taken) but I 
wanted to hear other possibilities to see if there is a better option.
* At this point we’d like to thank the again GCC community for their 
patient help so far on the mailing list and in other channels. And we 
ask for your support in terms of feedback, comments and testing.


gcc-9-20201030 is now available

2020-10-30 Thread GCC Administrator via Gcc
Snapshot gcc-9-20201030 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20201030/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 
revision b2593aa80448b9dfa90c878997ac2cb5cf979cfc

You'll find:

 gcc-9-20201030.tar.xzComplete GCC

  SHA256=7d4f8cbfd082596e7bd3f7938224cb657ba9795df55bf22f0eae2198617c5715
  SHA1=758c33e70816b7e03eb82c4340c6e44b9b949b5e

Diffs from 9-20201023 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.