Re: Function signatures in extern "C".
* Jakub Jelinek via Gcc: > On Mon, Sep 07, 2020 at 10:27:13AM +0100, Jonathan Wakely via Gcc wrote: >> On Mon, 7 Sep 2020 at 09:18, Iain Sandoe wrote: >> > >> > Perhaps the PR should be reopened with “accepts invalid”? >> >> My impression from the PR is that the reporter was using a different >> ABI, where the name isn't reserved. Maybe the testcase should only be >> accepted with -fno-threadsafe-statics or -ffreestanding or something >> to say "I'm doing things differently". >> >> Or we could just say that G++ reserves the Itanium ABI names >> unconditionally, even if it doesn't need to use them, in which case it >> would be accepts-invalid. > > All identifiers starting with two underscores are reserved for the > implementation already. But which implementation? __ identifiers are used heavily across the GNU project, not just in GCC and glibc (as one would expect). A lot of C software outside the GNU project is similar. I think this attempt at namespace management has failed. For the Itanium C++ ABI symbols, it would be useful to document which ones can be user-defined (which can be very interesting to avoid a dependency on libstdc++). I do not know how much value there is in supporting a semantically different definition, or a declaration with different types (probably not much). Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
Re: irange best practices document
On 9/9/20 7:03 PM, Martin Sebor wrote: On 9/3/20 1:14 AM, Aldy Hernandez via Gcc wrote: Below is a documented we have drafted to provide guidance on using irange's and converting passes to it. This will future proof any such passes so that they will work with the ranger, or any other mechanism using multiple sub-ranges (as opposed to the more limited value_range). The official document will live here, but is included below for discussion: https://gcc.gnu.org/wiki/irange-best-practices Feel free to respond to this post with any questions or comments. Thanks for the writeup! The biggest question on my mind at the moment is probably about how to go about rewriting classes that maintain their own ranges (most often as an array of two offset_int) to use one of the Ranger classes. Two examples are the access_ref class in builtins.h and the builtin_memref class in gimple-ssa-warn-restrict.c. Both of these compute the size of an object (as a simple range) and the offset into it (also as a range). In the future, they will track of sizes of multiple objects (from PHI nodes). My thinking is to do this in two steps: In 1) replace each offset_int[2] member array with an instance of int_range<1> and then rewrite all the code that manipulates the array with calls to the ranger API. That should be fairly straightforward. In 2) This sounds like a sound approach. Instead of passing pairs of integers, pass an irange and limit yourself to the non-deprecated part of the API. replace the simple int_range<1> with something more interesting (int_range_max?) and rewrite the final code that processes The ranger will use whatever sized range you pass it. So if you want fine granularity, pass it an int_range_max. However, if your final destination in memory is constrained (you're storing a gazillion of these), you can use the irange_pool to allocate the minimum amount of storage needed. Note, that the irange_pool has not been contributed yet. It'll come with the ranger. Taking the strlen code as an example, you could transform the following: vr_values *v = CONST_CAST (vr_values *, rvals); const value_range_equiv *vr = v->get_value_range (si->nonzero_chars); if (vr->kind () != VR_RANGE || !range_int_cst_p (vr)) return false; ...into: int_range_max r; if (!ranger->range_of_expr (r, si->nonzero_chars, si->stmt)) return false; Where ranger is declared somewhere at the top of your pass. Perhaps in printf_strlen_execute() and pass it around like you do rvals. Note that for better precision, you should pass the gimple context. That is, the statement from which si->nonzero_chars came from. In the above case, I think you want si->stmt. Currently get_value_range() has a STMT parameter which is unused. This was the reason for providing such parameter. Finally... you can store R into an int_range depending on how much space you want to use in your ultimate storage location. The assignment operator will squish R into whatever space you give it. So the following will truncate things appropriately: int_range<3> final_location = r; However, if you're using irange_pool to allocate just the amount of space needed, you could do: // Presumably declared wherever you declared the ranger. irange_pool pool; ... ... irange *p = pool.allocate (r); p will hold a pointer to a range that contains R with no extra space. This is ideal for longer term storage. For intermediate results, use int_range_max. Note that I am already converting sprintf/strlen to use the ranger instead of evrp/vr_values (aldyh/ranger-staging git branch). So the work of converting to range_of_expr above, and passing around gimple context, will already be done with the contribution of ranger. What I will not provide for now, is the wholesale conversion of other passes to the "clean" irange API. So if your pass is still using kind() and stashing away pairs of offset_int's, that will have to be done separately. I can offer guidance though, and help as time permits. the results to scan all the subranges for overflow or overlap, as well as the code that presents them in warnings/notes. It would be nice to have support for the formatting of ranges in the pretty-printer to cut down on the repetitive tests that determine how to format a constant (%E, vs a range [%E, %E], vs a multi-range [%E, %E][%E, %E], ...[%E, %E]). I'm not a big fan of including ranges in warnings/errors. The ranger and the vrp twins, are bound to change over releases with finer and finer ranges. Having our error messages depend on ranges that may or may not change can be confusing for users, and it means we must change regression tests to match a changing compiler at each release. I would much rather see a generic warning of "P may be out of bounds" than "P may be [X,Y][Z,A][C,D]". That's just confusing :). Also, note that the ranger can come up with ranges that are completely
Re: JUMP_LABEL returns NULL for the just created jump instruction
Anton Youdkevitch writes: > Folks, > > I'm trying to deal with CFG construction at RTL level and I bumped into a > problem > when I created a jump to a certain label. After the jump is created I try > to extract the > label using JUMP_LABEL but I get nothing. > > The code looks like like this: > > begin_sequence (); > code_label lab = gen_label_rtx () > rtx x = gen_rtx_GT (...); > x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, lab)); > rtx_insn *j = emit_jump_insn (gen_rtx_SET (x)); > end_sequence (); > > rtx lab1 = JUMP_LABEL (j) // <--- I get NULL here > > What am I missing? JUMP_LABEL is normally set up later, see the rebuild_jump_labels call in pass_expand::execute. You can force it to happen “early” for new sequences by using rebuild_jump_labels_chain. Thanks, Richard
Re: JUMP_LABEL returns NULL for the just created jump instruction
Richard, On Thu, Sep 10, 2020 at 11:22 AM Richard Sandiford < richard.sandif...@arm.com> wrote: > Anton Youdkevitch writes: > > Folks, > > > > I'm trying to deal with CFG construction at RTL level and I bumped into a > > problem > > when I created a jump to a certain label. After the jump is created I try > > to extract the > > label using JUMP_LABEL but I get nothing. > > > > The code looks like like this: > > > > begin_sequence (); > > code_label lab = gen_label_rtx () > > rtx x = gen_rtx_GT (...); > > x = gen_rtx_IF_THEN_ELSE (VOIDmode, x, gen_rtx_LABEL_REF (Pmode, lab)); > > rtx_insn *j = emit_jump_insn (gen_rtx_SET (x)); > > end_sequence (); > > > > rtx lab1 = JUMP_LABEL (j) // <--- I get NULL here > > > > What am I missing? > > JUMP_LABEL is normally set up later, see the rebuild_jump_labels call > in pass_expand::execute. You can force it to happen “early” for new > sequences by using rebuild_jump_labels_chain. > Thanks a lot for the advice! I will try to do as you suggested. -- Anton
Re: A problem with one instruction multiple latencies and pipelines
Segher Boessenkool writes: > Hi! > > On Mon, Sep 07, 2020 at 09:20:59PM +0100, Richard Sandiford wrote: >> This is just personal opinion, but in general (from the point of view >> of a new port, or a new subport like SVE), I think the best approach >> to handling the "type" attribute is to start with the coarsest >> classification that makes sense, then split these relatively coarse >> types up whenever there's a specific need. > > Agreed. > >> When taking that approach, it's OK (and perhaps even a good sign) >> for an existing type to sometimes be too coarse for a new CPU. >> >> So thanks for asking about this, and please don't feel constrained >> by the existing "type" classification. IMO we should split existing >> types wherever that makes sense for new CPUs. > > You can also use some other attributes to classify instructions, you > don't have to put it all in one "type" attribute. This can of course be > done later, at a time when it is clearer what a good design will be. > Sometimes it is obvious from the start though :-) > > (This primarily makes the pipeline descriptions much simpler, but also > custom scheduling code and the like. If one core has two types of "A" > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to > handle both "Aa" and "Ab" instead of just "A"). I guess it makes the description of other cores more verbose, but it doesn't IMO make them more complicated. It's still just one test against one attribute. And updating existing descriptions can be done automatically by sed. The difficulty with splitting into subattributes is that the tests in the cores that *do* care then become more complicated. E.g. you have to do: (ior (and (eq_attr "type" "foo") (eq_attr "foo_subtype" "foo1")) (eq_attr "type" "...others..")) and: (ior (and (eq_attr "type" "foo") (eq_attr "foo_subtype" "!foo1")) (eq_attr "type" "...others..")) instead of just: (eq_attr "type" "...") in both cases. It's not too bad when it's just one subtype (like above), but it could easily get out of hand. Perhaps in this case there's an argument in favour of having a separate attribute due to combinatorial explosion. E.g. we have: - alu_shift_imm - alus_shift_imm - logic_shift_imm - logics_shift_imm so having one attribute that describes the shift in all four cases would perhaps be better than splitting each of them into two. Arguments in the other direction: - Once we have a separate attribute, it isn't clear where the line should be drawn. Alternatives include: (1) keeping the new attribute specific to shift immediates only (2) also having a “register” value, and folding: alu_shift_imm, alu_shift_reg -> alu_shift (3) also having a “none” value, and doing away with the *_shift type variants altogether: alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg I think we should have a clear reason for whichever we pick, otherwise we could be creating technical debt for later. - That approach is the opposite of what we did for the neon_* attributes: every type has a q variant, rather than the size being a separate attribute. FWIW, we shouldn't assume that this distinction is specific to a64fx. :-) Thanks, Richard
Re: subreg vs vec_select
On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > Hi Ilya, > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc > wrote: > > I have a vector pseudo containing a single 128-bit value (V1TFmode) > > and > > I need to access its last 64 bits (DFmode). Which of the two > > options > > is better? > > > > (subreg:DF (reg:V1TF) 8) > > > > or > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int > > 1)])) > > > > If I use the first one, I run into a problem with set_noop_p (): it > > thinks that > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) > > > > is a no-op, because it doesn't check the mode after stripping the > > subreg: > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > However this is not correct, because SET_DEST is the second > > register in > > a register pair, and SET_SRC is half of a vector register that > > overlaps > > the first register in the corresponding pair. So it looks as if > > mode > > needs to be considered there. > > Yes. > > > This helps: > > > > --- a/gcc/rtlanal.c > > +++ b/gcc/rtlanal.c > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > return 0; > >src = SUBREG_REG (src); > >dst = SUBREG_REG (dst); > > + if (GET_MODE (src) != GET_MODE (dst)) > > + return 0; > > } > > > > but I'm not sure whether I'm not missing something about subreg > > semantics in the first place. > > You probably should just see if both modes are the same number of > hard > registers? HARD_REGNO_NREGS. I've refined my patch as follows: --- a/gcc/rtlanal.c +++ b/gcc/rtlanal.c @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) return 0; src = SUBREG_REG (src); dst = SUBREG_REG (dst); + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst) + && HARD_REGISTER_P (dst) + && hard_regno_nregs (REGNO (src), GET_MODE (src)) +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst))) + return 0; } This also helps, and is less restrictive than my first variant. Two questions, just for my understanding: 1) This mode confusion problem must never happen to pseudos, because, unlike hard registers, pseudos must be always referred to in their natural mode. Is this correct? 2) Can there be a hypothetical machine, where modes XF and YF refer to 64-bit and 128-bit register pairs respectively? This would cause the mode confusion problem again. Is there anything in RTL semantics that would prohibit existence of such modes? Best regards, Ilya
gcc-7-arm vendor branch
Hi, I would like to thank David Edelsohn for his help during the process of getting Amazon’s contributions to GCC covered by an FSF copyright assignment. As a first contribution, I would like to create a vendor branch for gcc-7 that will contain back-ported patches for Arm such as -moutline-atomics flag and missing NEON intrinsics. The branch will facilitate integration of these important changes in systems such as Ubuntu 18.04 and Amazon Linux 2 that are still relying on gcc-7. I will ask reviews and recommendations from Arm maintainers for the patches to be integrated in the gcc-7-arm branch. Thanks, Sebastian
Re: subreg vs vec_select
Hi! On Thu, Sep 10, 2020 at 12:21:47PM +0200, Ilya Leoshkevich wrote: > On Wed, 2020-09-09 at 16:09 -0500, Segher Boessenkool wrote: > > On Wed, Sep 09, 2020 at 11:50:56AM +0200, Ilya Leoshkevich via Gcc > > wrote: > > > I have a vector pseudo containing a single 128-bit value (V1TFmode) > > > and > > > I need to access its last 64 bits (DFmode). Which of the two > > > options > > > is better? > > > > > > (subreg:DF (reg:V1TF) 8) > > > > > > or > > > > > > (vec_select:DF (subreg:V2DF (reg:V1TF) 0) (parallel [(const_int > > > 1)])) > > > > > > If I use the first one, I run into a problem with set_noop_p (): it > > > thinks that > > > > > > (set (subreg:DF (reg:TF %f0) 8) (subreg:DF (reg:V1TF %f0) 8)) > > > > > > is a no-op, because it doesn't check the mode after stripping the > > > subreg: > > > > > > https://gcc.gnu.org/git/?p=gcc.git;a=blob;f=gcc/rtlanal.c;h=5ae38b79#l1616 > > > > > > However this is not correct, because SET_DEST is the second > > > register in > > > a register pair, and SET_SRC is half of a vector register that > > > overlaps > > > the first register in the corresponding pair. So it looks as if > > > mode > > > needs to be considered there. > > > > Yes. > > > > > This helps: > > > > > > --- a/gcc/rtlanal.c > > > +++ b/gcc/rtlanal.c > > > @@ -1619,6 +1619,8 @@ set_noop_p (const_rtx set) > > > return 0; > > >src = SUBREG_REG (src); > > >dst = SUBREG_REG (dst); > > > + if (GET_MODE (src) != GET_MODE (dst)) > > > + return 0; > > > } > > > > > > but I'm not sure whether I'm not missing something about subreg > > > semantics in the first place. > > > > You probably should just see if both modes are the same number of > > hard > > registers? HARD_REGNO_NREGS. > > I've refined my patch as follows: > > --- a/gcc/rtlanal.c > +++ b/gcc/rtlanal.c > @@ -1619,6 +1619,11 @@ set_noop_p (const_rtx set) > return 0; >src = SUBREG_REG (src); >dst = SUBREG_REG (dst); > + if (REG_P (src) && HARD_REGISTER_P (src) && REG_P (dst) > + && HARD_REGISTER_P (dst) > + && hard_regno_nregs (REGNO (src), GET_MODE (src)) > +!= hard_regno_nregs (REGNO (dst), GET_MODE (dst))) (The "!" should align with the "h".) > + return 0; > } Looks good to me, thanks! > This also helps, and is less restrictive than my first variant. > Two questions, just for my understanding: > > 1) This mode confusion problem must never happen to pseudos, because, >unlike hard registers, pseudos must be always referred to in their >natural mode. Is this correct? Only hard registers can be accessed in more than one mode, yes. To access pseudos in another mode you need a subreg. > 2) Can there be a hypothetical machine, where modes XF and YF refer to >64-bit and 128-bit register pairs respectively? This would cause >the mode confusion problem again. Is there anything in RTL >semantics that would prohibit existence of such modes? They would have to be the same hard register number as well. Not very likely, but I don't see why it couldn't happen theoretically. Segher
gcc-8-20200910 is now available
Snapshot gcc-8-20200910 is now available on https://gcc.gnu.org/pub/gcc/snapshots/8-20200910/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 8 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-8 revision 945644e2ab0cda9117deb6b66da0b24d6e75a5d8 You'll find: gcc-8-20200910.tar.xzComplete GCC SHA256=c1abca7ae056fbdd784f0b9f86a91777dcf0500a8d0dccc66a7ccdfc5e6e657b SHA1=4300a8a9056eb19d65421d73f55a2cf7971cde1c Diffs from 8-20200903 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: A problem with one instruction multiple latencies and pipelines
On Thu, Sep 10, 2020 at 11:04:26AM +0100, Richard Sandiford wrote: > Segher Boessenkool writes: > > You can also use some other attributes to classify instructions, you > > don't have to put it all in one "type" attribute. This can of course be > > done later, at a time when it is clearer what a good design will be. > > Sometimes it is obvious from the start though :-) > > > > (This primarily makes the pipeline descriptions much simpler, but also > > custom scheduling code and the like. If one core has two types of "A" > > insn, say "Aa" and "Ab", it isn't nice if all other cores now have to > > handle both "Aa" and "Ab" instead of just "A"). > > I guess it makes the description of other cores more verbose, but it > doesn't IMO make them more complicated. It's still just one test > against one attribute. And updating existing descriptions can be > done automatically by sed. Consider cores that do not care about the "subtype" at all: when using just "type", all cores have to test for "foo,foo_subtype", while with a separate attribute they can just test for "foo". A typical example in rs6000 is the "sign_extend" attribute for load instructions. All cores before power4 do not care at all about it. (Load and store insns get into combinatorial explosion land as well, as you discuss below, with "update" and "indexed" forms). > The difficulty with splitting into subattributes is that the tests in > the cores that *do* care then become more complicated. E.g. you have > to do: > >(ior (and (eq_attr "type" "foo") > (eq_attr "foo_subtype" "foo1")) > (eq_attr "type" "...others..")) > > and: > >(ior (and (eq_attr "type" "foo") > (eq_attr "foo_subtype" "!foo1")) > (eq_attr "type" "...others..")) > > instead of just: > >(eq_attr "type" "...") > > in both cases. Yes. It is a trade-off. > It's not too bad when it's just one subtype (like above), but it could > easily get out of hand. > > Perhaps in this case there's an argument in favour of having a separate > attribute due to combinatorial explosion. E.g. we have: > > - alu_shift_imm > - alus_shift_imm > - logic_shift_imm > - logics_shift_imm > > so having one attribute that describes the shift in all four cases > would perhaps be better than splitting each of them into two. Yeas, exactly. And for rs6000 we *did* have many more types before, and very frequently one was missed (in a scheduling description usually). Especially common was when some new type attribute was added, not all existing cpu descriptions were updated correctly. Maybe this is the strongest argument "for" actually :-) > Arguments in the other direction: > > - Once we have a separate attribute, it isn't clear where the line > should be drawn. Good taste ;-) > Alternatives include: > > (1) keeping the new attribute specific to shift immediates only > > (2) also having a “register” value, and folding: > alu_shift_imm, alu_shift_reg -> alu_shift > > (3) also having a “none” value, and doing away with the *_shift > type variants altogether: > alu_sreg, alu_shift_imm, alu_shift_reg -> alu_reg > > I think we should have a clear reason for whichever we pick, > otherwise we could be creating technical debt for later. Yes. One example of what we do: ;; Is this instruction using operands[2] as shift amount, and can that be a ;; register? ;; This is used for shift insns. (define_attr "maybe_var_shift" "no,yes" (const_string "no")) ;; Is this instruction using a shift amount from a register? ;; This is used for shift insns. (define_attr "var_shift" "no,yes" (if_then_else (and (eq_attr "type" "shift") (eq_attr "maybe_var_shift" "yes")) (if_then_else (match_operand 2 "gpc_reg_operand") (const_string "yes") (const_string "no")) (const_string "no"))) define_insns only use maybe_var_shift. Only the power6 and e*500* cpu descriptions ever look at var_shift. (Directly. There is some other scheduling code that looks at it too -- and all but the power6 ones seem to be incorrect! That is all a direct translation of "type-only" code there was before...) > - That approach is the opposite of what we did for the neon_* attributes: > every type has a q variant, rather than the size being a separate > attribute. > > FWIW, we shouldn't assume that this distinction is specific to a64fx. :-) Yeah, absolutely. This classifies instructions, of course it is very strongly influenced by what scheduling descriptions want, but it helps a lot if you describe more general characteristics :-) Segher