Re: Function signatures in extern "C".

2020-09-10 Thread Florian Weimer via Gcc
* Jakub Jelinek via Gcc:

> On Mon, Sep 07, 2020 at 10:27:13AM +0100, Jonathan Wakely via Gcc wrote:
>> On Mon, 7 Sep 2020 at 09:18, Iain Sandoe wrote:
>> >
>> > Perhaps the PR should be reopened with “accepts invalid”?
>> 
>> My impression from the PR is that the reporter was using a different
>> ABI, where the name isn't reserved. Maybe the testcase should only be
>> accepted with -fno-threadsafe-statics or -ffreestanding or something
>> to say "I'm doing things differently".
>> 
>> Or we could just say that G++ reserves the Itanium ABI names
>> unconditionally, even if it doesn't need to use them, in which case it
>> would be accepts-invalid.
>
> All identifiers starting with two underscores are reserved for the
> implementation already.

But which implementation?

__ identifiers are used heavily across the GNU project, not just in GCC
and glibc (as one would expect).  A lot of C software outside the GNU
project is similar.  I think this attempt at namespace management has
failed.

For the Itanium C++ ABI symbols, it would be useful to document which
ones can be user-defined (which can be very interesting to avoid a
dependency on libstdc++).  I do not know how much value there is in
supporting a semantically different definition, or a declaration with
different types (probably not much).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: irange best practices document

2020-09-10 Thread Aldy Hernandez via Gcc

On 9/9/20 7:03 PM, Martin Sebor wrote:

On 9/3/20 1:14 AM, Aldy Hernandez via Gcc wrote:
Below is a documented we have drafted to provide guidance on using 
irange's and converting passes to it.  This will future proof any such 
passes so that they will work with the ranger, or any other mechanism 
using multiple sub-ranges (as opposed to the more limited value_range).


The official document will live here, but is included below for 
discussion:


 https://gcc.gnu.org/wiki/irange-best-practices

Feel free to respond to this post with any questions or comments.


Thanks for the writeup!

The biggest question on my mind at the moment is probably about
how to go about rewriting classes that maintain their own ranges
(most often as an array of two offset_int) to use one of the Ranger
classes.  Two examples are the access_ref class in builtins.h and
the builtin_memref class in gimple-ssa-warn-restrict.c.

Both of these compute the size of an object (as a simple range)
and the offset into it (also as a range).  In the future, they will
track of sizes of multiple objects (from PHI nodes).

My thinking is to do this in two steps: In 1) replace each
offset_int[2] member array with an instance of int_range<1> and
then rewrite all the code that manipulates the array with calls
to the ranger API. That should be fairly straightforward. In 2)


This sounds like a sound approach.  Instead of passing pairs of 
integers, pass an irange and limit yourself to the non-deprecated part 
of the API.



replace the simple int_range<1> with something more interesting
(int_range_max?) and rewrite the final code that processes


The ranger will use whatever sized range you pass it.  So if you want 
fine granularity, pass it an int_range_max.


However, if your final destination in memory is constrained (you're 
storing a gazillion of these), you can use the irange_pool to allocate 
the minimum amount of storage needed.  Note, that the irange_pool has 
not been contributed yet.  It'll come with the ranger.


Taking the strlen code as an example, you could transform the following:

  vr_values *v = CONST_CAST (vr_values *, rvals);
  const value_range_equiv *vr = v->get_value_range (si->nonzero_chars);
  if (vr->kind () != VR_RANGE || !range_int_cst_p (vr))
return false;

...into:

int_range_max r;
if (!ranger->range_of_expr (r, si->nonzero_chars, si->stmt))
  return false;

Where ranger is declared somewhere at the top of your pass.  Perhaps in 
printf_strlen_execute() and pass it around like you do rvals.


Note that for better precision, you should pass the gimple context. 
That is, the statement from which si->nonzero_chars came from.  In the 
above case, I think you want si->stmt.  Currently get_value_range() has 
a STMT parameter which is unused.  This was the reason for providing 
such parameter.


Finally... you can store R into an int_range depending on how much 
space you want to use in your ultimate storage location.  The assignment 
operator will squish R into whatever space you give it.  So the 
following will truncate things appropriately:


int_range<3> final_location = r;

However, if you're using irange_pool to allocate just the amount of 
space needed, you could do:


// Presumably declared wherever you declared the ranger.
irange_pool pool;
...
...
irange *p = pool.allocate (r);

p will hold a pointer to a range that contains R with no extra space. 
This is ideal for longer term storage.  For intermediate results, use 
int_range_max.


Note that I am already converting sprintf/strlen to use the ranger 
instead of evrp/vr_values (aldyh/ranger-staging git branch).  So the 
work of converting to range_of_expr above, and passing around gimple 
context, will already be done with the contribution of ranger.  What I 
will not provide for now, is the wholesale conversion of other passes to 
the "clean" irange API.  So if your pass is still using kind() and 
stashing away pairs of offset_int's, that will have to be done 
separately.  I can offer guidance though, and help as time permits.



the results to scan all the subranges for overflow or overlap,
as well as the code that presents them in warnings/notes.  It
would be nice to have support for the formatting of ranges in
the pretty-printer to cut down on the repetitive tests that
determine how to format a constant (%E, vs a range [%E, %E],
vs a multi-range [%E, %E][%E, %E], ...[%E, %E]).


I'm not a big fan of including ranges in warnings/errors.  The ranger 
and the vrp twins, are bound to change over releases with finer and 
finer ranges.  Having our error messages depend on ranges that may or 
may not change can be confusing for users, and it means we must change 
regression tests to match a changing compiler at each release.


I would much rather see a generic warning of "P may be out of bounds" 
than "P may be [X,Y][Z,A][C,D]".  That's just confusing :).  Also, note 
that the ranger can come up with ranges that are completely 

Re: JUMP_LABEL returns NULL for the just created jump instruction

2020-09-10 Thread Richard Sandiford
Anton Youdkevitch  writes:
> Folks,
>
> I'm trying to deal with CFG construction at RTL level and I bumped into a
> problem
> when I created a jump to a certain label. After the jump is created I try
> to extract the
> label using JUMP_LABEL but I get nothing.
>
> The code looks like like this:
>
> begin_sequence ();
> code_label lab = gen_label_rtx ()
> rtx x = gen_rtx_GT (...);
> x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, lab));
> rtx_insn *j = emit_jump_insn (gen_rtx_SET (x));
> end_sequence ();
>
> rtx lab1 = JUMP_LABEL (j)  // <--- I get NULL here
>
> What am I missing?

JUMP_LABEL is normally set up later, see the rebuild_jump_labels call
in pass_expand::execute.  You can force it to happen “early” for new
sequences by using rebuild_jump_labels_chain.

Thanks,
Richard


Re: JUMP_LABEL returns NULL for the just created jump instruction

2020-09-10 Thread Anton Youdkevitch
Richard,

On Thu, Sep 10, 2020 at 11:22 AM Richard Sandiford <
richard.sandif...@arm.com> wrote:

> Anton Youdkevitch  writes:
> > Folks,
> >
> > I'm trying to deal with CFG construction at RTL level and I bumped into a
> > problem
> > when I created a jump to a certain label. After the jump is created I try
> > to extract the
> > label using JUMP_LABEL but I get nothing.
> >
> > The code looks like like this:
> >
> > begin_sequence ();
> > code_label lab = gen_label_rtx ()
> > rtx x = gen_rtx_GT (...);
> > x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, lab));
> > rtx_insn *j = emit_jump_insn (gen_rtx_SET (x));
> > end_sequence ();
> >
> > rtx lab1 = JUMP_LABEL (j)  // <--- I get NULL here
> >
> > What am I missing?
>
> JUMP_LABEL is normally set up later, see the rebuild_jump_labels call
> in pass_expand::execute.  You can force it to happen “early” for new
> sequences by using rebuild_jump_labels_chain.
>
Thanks a lot for the advice! I will try to do as you suggested.

-- 
  Anton


Re: A problem with one instruction multiple latencies and pipelines

2020-09-10 Thread Richard Sandiford
Segher Boessenkool  writes:
> Hi!
>
> On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote:
>> This is just personal opinion, but in general (from the point of view
>> of a new port, or a new subport like SVE), I think the best approach
>> to handling the "type" attribute is to start with the coarsest
>> classification that makes sense, then split these relatively coarse
>> types up whenever there's a specific need.
>
> Agreed.
>
>> When taking that approach, it's OK (and perhaps even a good sign)
>> for an existing type to sometimes be too coarse for a new CPU.
>> 
>> So thanks for asking about this, and please don't feel constrained
>> by the existing "type" classification.  IMO we should split existing
>> types wherever that makes sense for new CPUs.
>
> You can also use some other attributes to classify instructions, you
> don't have to put it all in one "type" attribute.  This can of course be
> done later, at a time when it is clearer what a good design will be.
> Sometimes it is obvious from the start though :-)
>
> (This primarily makes the pipeline descriptions much simpler, but also
> custom scheduling code and the like.  If one core has two types of "A"
> insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
> handle both "Aa" and "Ab" instead of just "A").

I guess it makes the description of other cores more verbose, but it
doesn't IMO make them more complicated.  It's still just one test
against one attribute.  And updating existing descriptions can be
done automatically by sed.

The difficulty with splitting into subattributes is that the tests in
the cores that *do* care then become more complicated.  E.g. you have
to do:

   (ior (and (eq_attr "type" "foo")
 (eq_attr "foo_subtype" "foo1"))
(eq_attr "type" "...others.."))

and:

   (ior (and (eq_attr "type" "foo")
 (eq_attr "foo_subtype" "!foo1"))
(eq_attr "type" "...others.."))

instead of just:

   (eq_attr "type" "...")

in both cases.

It's not too bad when it's just one subtype (like above), but it could
easily get out of hand.

Perhaps in this case there's an argument in favour of having a separate
attribute due to combinatorial explosion.  E.g. we have:

- alu_shift_imm
- alus_shift_imm
- logic_shift_imm
- logics_shift_imm

so having one attribute that describes the shift in all four cases
would perhaps be better than splitting each of them into two.
Arguments in the other direction:

- Once we have a separate attribute, it isn't clear where the line
  should be drawn.  Alternatives include:

  (1) keeping the new attribute specific to shift immediates only

  (2) also having a “register” value, and folding:
  alu_shift_imm, alu_shift_reg -> alu_shift

  (3) also having a “none” value, and doing away with the *_shift
  type variants altogether:
  alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg

  I think we should have a clear reason for whichever we pick,
  otherwise we could be creating technical debt for later.

- That approach is the opposite of what we did for the neon_* attributes:
  every type has a q variant, rather than the size being a separate
  attribute.

FWIW, we shouldn't assume that this distinction is specific to a64fx. :-)

Thanks,
Richard


Re: subreg vs vec_select

2020-09-10 Thread Ilya Leoshkevich via Gcc
On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> Hi Ilya,
> 
> On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
> wrote:
> > I have a vector pseudo containing a single 128-bit value (V1TFmode)
> > and
> > I need to access its last 64 bits (DFmode). Which of the two
> > options
> > is better?
> > 
> > (subreg:DF (reg:V1TF) 8)
> > 
> > or
> > 
> > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > 1)]))
> > 
> > If I use the first one, I run into a problem with set_noop_p (): it
> > thinks that
> > 
> > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > 
> > is a no-op, because it doesn't check the mode after stripping the
> > subreg:
> > 
> > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > 
> > However this is not correct, because SET_DEST is the second
> > register in
> > a register pair, and SET_SRC is half of a vector register that
> > overlaps
> > the first register in the corresponding pair. So it looks as if
> > mode
> > needs to be considered there.
> 
> Yes.
> 
> > This helps:
> > 
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > return 0;
> >src = SUBREG_REG (src);
> >dst = SUBREG_REG (dst);
> > +  if (GET_MODE (src) != GET_MODE (dst))
> > +   return 0;
> >  }
> > 
> > but I'm not sure whether I'm not missing something about subreg
> > semantics in the first place.
> 
> You probably should just see if both modes are the same number of
> hard
> registers?  HARD_REGNO_NREGS.

I've refined my patch as follows:

--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
return 0;
   src = SUBREG_REG (src);
   dst = SUBREG_REG (dst);
+  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
+ && HARD_REGISTER_P (dst)
+ && hard_regno_nregs (REGNO (src), GET_MODE (src))
+!= hard_regno_nregs (REGNO (dst), GET_MODE (dst)))
+   return 0;
 }
 
This also helps, and is less restrictive than my first variant.
Two questions, just for my understanding:

1) This mode confusion problem must never happen to pseudos, because,
   unlike hard registers, pseudos must be always referred to in their
   natural mode.  Is this correct?

2) Can there be a hypothetical machine, where modes XF and YF refer to
   64-bit and 128-bit register pairs respectively?  This would cause
   the mode confusion problem again.  Is there anything in RTL
   semantics that would prohibit existence of such modes?

Best regards,
Ilya



gcc-7-arm vendor branch

2020-09-10 Thread Pop, Sebastian via Gcc
Hi,
I would like to thank David Edelsohn for his help during the process of
getting Amazon’s contributions to GCC covered by an FSF copyright assignment.

As a first contribution, I would like to create a vendor branch for gcc-7
that will contain back-ported patches for Arm such as -moutline-atomics flag
and missing NEON intrinsics.
The branch will facilitate integration of these important changes in systems
such as Ubuntu 18.04 and Amazon Linux 2 that are still relying on gcc-7.
I will ask reviews and recommendations from Arm maintainers for the patches
to be integrated in the gcc-7-arm branch.

Thanks,
Sebastian



Re: subreg vs vec_select

2020-09-10 Thread Segher Boessenkool
Hi!

On Thu, Sep 10, 2020 at 12:21:47PM +0200, Ilya Leoshkevich wrote:
> On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote:
> > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc
> > wrote:
> > > I have a vector pseudo containing a single 128-bit value (V1TFmode)
> > > and
> > > I need to access its last 64 bits (DFmode). Which of the two
> > > options
> > > is better?
> > > 
> > > (subreg:DF (reg:V1TF) 8)
> > > 
> > > or
> > > 
> > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int
> > > 1)]))
> > > 
> > > If I use the first one, I run into a problem with set_noop_p (): it
> > > thinks that
> > > 
> > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8))
> > > 
> > > is a no-op, because it doesn't check the mode after stripping the
> > > subreg:
> > > 
> > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616
> > > 
> > > However this is not correct, because SET_DEST is the second
> > > register in
> > > a register pair, and SET_SRC is half of a vector register that
> > > overlaps
> > > the first register in the corresponding pair. So it looks as if
> > > mode
> > > needs to be considered there.
> > 
> > Yes.
> > 
> > > This helps:
> > > 
> > > --- a/gcc/rtlanal.c
> > > +++ b/gcc/rtlanal.c
> > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set)
> > > return 0;
> > >src = SUBREG_REG (src);
> > >dst = SUBREG_REG (dst);
> > > +  if (GET_MODE (src) != GET_MODE (dst))
> > > +   return 0;
> > >  }
> > > 
> > > but I'm not sure whether I'm not missing something about subreg
> > > semantics in the first place.
> > 
> > You probably should just see if both modes are the same number of
> > hard
> > registers?  HARD_REGNO_NREGS.
> 
> I've refined my patch as follows:
> 
> --- a/gcc/rtlanal.c
> +++ b/gcc/rtlanal.c
> @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set)
> return 0;
>src = SUBREG_REG (src);
>dst = SUBREG_REG (dst);
> +  if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst)
> + && HARD_REGISTER_P (dst)
> + && hard_regno_nregs (REGNO (src), GET_MODE (src))
> +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst)))

(The "!" should align with the "h".)

> +   return 0;
>  }

Looks good to me, thanks!

> This also helps, and is less restrictive than my first variant.
> Two questions, just for my understanding:
> 
> 1) This mode confusion problem must never happen to pseudos, because,
>unlike hard registers, pseudos must be always referred to in their
>natural mode.  Is this correct?

Only hard registers can be accessed in more than one mode, yes.  To
access pseudos in another mode you need a subreg.

> 2) Can there be a hypothetical machine, where modes XF and YF refer to
>64-bit and 128-bit register pairs respectively?  This would cause
>the mode confusion problem again.  Is there anything in RTL
>semantics that would prohibit existence of such modes?

They would have to be the same hard register number as well.  Not very
likely, but I don't see why it couldn't happen theoretically.


Segher


gcc-8-20200910 is now available

2020-09-10 Thread GCC Administrator via Gcc
Snapshot gcc-8-20200910 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/8-20200910/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 8 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 
revision 945644e2ab0cda9117deb6b66da0b24d6e75a5d8

You'll find:

 gcc-8-20200910.tar.xzComplete GCC

  SHA256=c1abca7ae056fbdd784f0b9f86a91777dcf0500a8d0dccc66a7ccdfc5e6e657b
  SHA1=4300a8a9056eb19d65421d73f55a2cf7971cde1c

Diffs from 8-20200903 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: A problem with one instruction multiple latencies and pipelines

2020-09-10 Thread Segher Boessenkool
On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote:
> Segher Boessenkool  writes:
> > You can also use some other attributes to classify instructions, you
> > don't have to put it all in one "type" attribute.  This can of course be
> > done later, at a time when it is clearer what a good design will be.
> > Sometimes it is obvious from the start though :-)
> >
> > (This primarily makes the pipeline descriptions much simpler, but also
> > custom scheduling code and the like.  If one core has two types of "A"
> > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to
> > handle both "Aa" and "Ab" instead of just "A").
> 
> I guess it makes the description of other cores more verbose, but it
> doesn't IMO make them more complicated.  It's still just one test
> against one attribute.  And updating existing descriptions can be
> done automatically by sed.

Consider cores that do not care about the "subtype" at all: when using
just "type", all cores have to test for "foo,foo_subtype", while with
a separate attribute they can just test for "foo".

A typical example in rs6000 is the "sign_extend" attribute for load
instructions.  All cores before power4 do not care at all about it.
(Load and store insns get into combinatorial explosion land as well,
as you discuss below, with "update" and "indexed" forms).

> The difficulty with splitting into subattributes is that the tests in
> the cores that *do* care then become more complicated.  E.g. you have
> to do:
> 
>(ior (and (eq_attr "type" "foo")
>  (eq_attr "foo_subtype" "foo1"))
> (eq_attr "type" "...others.."))
> 
> and:
> 
>(ior (and (eq_attr "type" "foo")
>  (eq_attr "foo_subtype" "!foo1"))
> (eq_attr "type" "...others.."))
> 
> instead of just:
> 
>(eq_attr "type" "...")
> 
> in both cases.

Yes.  It is a trade-off.

> It's not too bad when it's just one subtype (like above), but it could
> easily get out of hand.
> 
> Perhaps in this case there's an argument in favour of having a separate
> attribute due to combinatorial explosion.  E.g. we have:
> 
> - alu_shift_imm
> - alus_shift_imm
> - logic_shift_imm
> - logics_shift_imm
> 
> so having one attribute that describes the shift in all four cases
> would perhaps be better than splitting each of them into two.

Yeas, exactly.  And for rs6000 we *did* have many more types before, and
very frequently one was missed (in a scheduling description usually).
Especially common was when some new type attribute was added, not all
existing cpu descriptions were updated correctly.  Maybe this is the
strongest argument "for" actually :-)

> Arguments in the other direction:
> 
> - Once we have a separate attribute, it isn't clear where the line
>   should be drawn.

Good taste ;-)

>  Alternatives include:
> 
>   (1) keeping the new attribute specific to shift immediates only
> 
>   (2) also having a “register” value, and folding:
>   alu_shift_imm, alu_shift_reg -> alu_shift
> 
>   (3) also having a “none” value, and doing away with the *_shift
>   type variants altogether:
>   alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg
> 
>   I think we should have a clear reason for whichever we pick,
>   otherwise we could be creating technical debt for later.

Yes.

One example of what we do:

;; Is this instruction using operands[2] as shift amount, and can that be a
;; register?
;; This is used for shift insns.
(define_attr "maybe_var_shift" "no,yes" (const_string "no"))

;; Is this instruction using a shift amount from a register?
;; This is used for shift insns.
(define_attr "var_shift" "no,yes"
  (if_then_else (and (eq_attr "type" "shift")
 (eq_attr "maybe_var_shift" "yes"))
(if_then_else (match_operand 2 "gpc_reg_operand")
  (const_string "yes")
  (const_string "no"))
(const_string "no")))

define_insns only use maybe_var_shift.  Only the power6 and e*500* cpu
descriptions ever look at var_shift.  (Directly.  There is some other
scheduling code that looks at it too -- and all but the power6 ones seem
to be incorrect!  That is all a direct translation of "type-only" code
there was before...)

> - That approach is the opposite of what we did for the neon_* attributes:
>   every type has a q variant, rather than the size being a separate
>   attribute.
> 
> FWIW, we shouldn't assume that this distinction is specific to a64fx. :-)

Yeah, absolutely.  This classifies instructions, of course it is very
strongly influenced by what scheduling descriptions want, but it helps
a lot if you describe more general characteristics :-)


Segher