Re: [RFC, LRA] Incorrect subreg resolution?
Returning to this old thread...

Richard Sandiford writes:
> Tejas Belagod writes:
>> When I relaxed CANNOT_CHANGE_MODE_CLASS to undefined for AArch64,
>> gcc.c-torture/execute/copysign1.c generates incorrect code because LRA
>> cannot seem to handle subregs like
>>
>> (subreg:DI (reg:TF hard_reg) 8)
>>
>> on hard registers where the subreg byte offset is unaligned to a hard
>> register boundary (16 for AArch64). It seems to quietly ignore the 8 and
>> resolves this to an incorrect hard register during reload.
>>
>> When I compile this test with -O3,
>>
>> long double
>> cl (long double x, long double y)
>> {
>>   return __builtin_copysignl (x, y);
>> }
>>
>> cs.c.213r.ira:
>>
>> (insn 26 10 33 2 (set (reg:DI 87 [ y+8 ])
>>         (subreg:DI (reg:TF 33 v1 [ y ]) 8)) cs.c:4 34 {*movdi_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 33 v1 [ y ])
>>         (nil)))
>> (insn 33 26 35 2 (set (reg:TF 93)
>>         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 32 v0 [ x ])
>>         (nil)))
>> (insn 35 33 34 2 (set (reg:DI 92 [ x+8 ])
>>         (subreg:DI (reg:TF 93) 8)) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 34 35 23 2 (set (reg:DI 91 [ x ])
>>         (subreg:DI (reg:TF 93) 0)) cs.c:4 34 {*movdi_aarch64}
>>      (expr_list:REG_DEAD (reg:TF 93)
>>         (nil)))
>>
>>
>> cs.c.214r.reload
>>
>> (insn 26 10 33 2 (set (reg:DI 2 x2 [orig:87 y+8 ] [87])
>>         (reg:DI 33 v1 [ y+8 ])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 33 26 35 2 (set (reg:TF 0 x0 [93])
>>         (reg:TF 32 v0 [ x ])) cs.c:4 40 {*movtf_aarch64}
>>      (nil))
>> (insn 35 33 34 2 (set (reg:DI 1 x1 [orig:92 x+8 ] [92])
>>         (reg:DI 1 x1 [+8 ])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> (insn 34 35 8 2 (set (reg:DI 0 x0 [orig:91 x ] [91])
>>         (reg:DI 0 x0 [93])) cs.c:4 34 {*movdi_aarch64}
>>      (nil))
>> .
>>
>> You can see the changes to insn 26 before and after reload - the SUBREG_BYTE
>> offset of 8 seems to have been translated to v0 instead of v0.d[1] by
>> get_hard_regno ().
>>
>> What's interesting here is that the SUBREG_BYTE that is generated for
>>
>> (subreg:DI (reg:TF 33 v1 [ y ]) 8)
>>
>> isn't aligned to a hard register boundary on SIMD regs where UNITS_PER_VREG
>> for AArch64 is 16. Therefore when this subreg is resolved, it resolves to v1
>> instead of v1.d[1]. Is this something going wrong in LRA or is this a more
>> fundamental problem with generating subregs of hard regs with unaligned
>> subreg byte offsets? The same subreg on a pseudo works OK because in
>> insn 33, the TF mode is allocated integer registers and all is well.
>
> I think this is the same problem that was being discussed for x86
> after your no-op vec-select patch:
>
>    http://gcc.gnu.org/ml/gcc-patches/2013-12/msg00801.html
>
> and long following thread.
>
> I'd still like to solve this in a target-independent way rather than add
> an offset to CANNOT_CHANGE_MODE_CLASS, but I haven't had time to look at
> it...

FWIW, here's one possible approach. The main part is to make the invalid_mode_change code calculate a set of registers that are either (a) invalid for the pseudo mode to begin with or (b) do not allow one of the subregs to be taken (as calculated by simplify_subreg_regno, which includes the original CANNOT_CHANGE_MODE_CLASS check).

One concern might be about compilation speed when collecting this info. OTOH, the query is now genuinely constant time, whereas the old bitmap test was O(num-pseudos) in the worst case. It might also be possible to speed things up by walking the subregs using the DF information, if it's up-to-date at this point (haven't checked). It would also be possible to give an ID to each (inner mode, outer mode, byte) combination and lazily cache the invalid register set for each one.

I went through the other uses of CANNOT_CHANGE_MODE_CLASS. Most of them were checking for lowpart mode changes so look safe. The exception was combine.c:subst.
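The symptom in the quoted report comes down to simple arithmetic. The toy C model below is a sketch under stated assumptions: the names REG_BYTES, subreg_loc, naive_subreg_regno and resolve_subreg are hypothetical, and this is not how get_hard_regno or simplify_subreg_regno are actually written. It shows how dividing a byte offset of 8 by a 16-byte register size lands back on register 33 (v1) instead of its upper 64-bit half, v1.d[1], and how keeping the remainder lets a caller notice the unaligned subreg.

```c
#include <stdbool.h>

/* Toy model, not GCC code: every hard register in the class is
   REG_BYTES wide, 16 for an AArch64 V register as in the report.  */
enum { REG_BYTES = 16 };

struct subreg_loc
{
  unsigned regno;  /* hard register holding the first byte */
  unsigned lane;   /* index of the OUTER_BYTES-wide lane inside it */
};

/* Naive mapping: byte / REG_BYTES silently drops the remainder, so
   byte 8 of register 33 (v1) maps back to register 33 itself.  */
static unsigned
naive_subreg_regno (unsigned regno, unsigned byte)
{
  return regno + byte / REG_BYTES;
}

/* Offset-aware mapping: keeps the remainder as a lane index and
   returns false when the subreg does not start on a hard register
   boundary, so a caller can reject it instead of silently using the
   whole register.  */
static bool
resolve_subreg (unsigned regno, unsigned byte, unsigned outer_bytes,
                struct subreg_loc *loc)
{
  loc->regno = regno + byte / REG_BYTES;
  loc->lane = (byte % REG_BYTES) / outer_bytes;
  return byte % REG_BYTES == 0;
}
```

For (subreg:DI (reg:TF 33 v1) 8), the naive mapping yields 33 again, while resolve_subreg reports lane 1 of register 33 and returns false, which is exactly the case the thread wants rejected (or handled) rather than quietly mis-resolved.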
This is really four patches squashed into one, but it's not ready to be submitted yet. Was just wondering whether this solved your problem.

Thanks,
Richard

*** /tmp/OCSP7f_combine.c 2014-03-11 07:34:37.928138693 +
--- gcc/combine.c 2014-03-10 21:39:09.428718086 +
*** subst (rtx x, rtx from, rtx to, int in_d
*** 5082,5096
              )
            return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
- #ifdef CANNOT_CHANGE_MODE_CLASS
          if (code == SUBREG
              && REG_P (to)
              && REGNO (to) < FIRST_PSEUDO_REGISTER
!             && REG_CANNOT_CHANGE_MODE_P (REGNO (to),
!                                          GET_MODE (to),
!                                          GET_MODE (x)))
            return gen_rtx_CLOBBER (VOIDmode, const0_rtx);
- #endif
          new_rtx = (unique_copy && n_occu
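Richard's idea of giving an ID to each (inner mode, outer mode, byte) combination and lazily caching the invalid register set can be sketched in a toy setting. This is a hypothetical model, not the invalid_mode_change code from the patch: it assumes 64 hard registers held in a uint64_t bitmask, three made-up modes, and a stand-in predicate toy_subreg_ok in place of the real simplify_subreg_regno check. The 3-D array index plays the role of the combination ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model, not GCC code: registers 0-31 are 8-byte general registers,
   32-63 are 16-byte vector registers.  Bit N set = register N invalid.  */
typedef uint64_t hard_reg_set;
enum toy_mode { MODE_SI, MODE_DI, MODE_TF, N_MODES };
static const unsigned mode_bytes[N_MODES] = { 4, 8, 16 };
enum { N_REGS = 64, MAX_BYTE = 16 };

/* Hypothetical stand-in for the simplify_subreg_regno check.  */
static bool
toy_subreg_ok (unsigned regno, enum toy_mode inner, enum toy_mode outer,
               unsigned byte)
{
  if (byte + mode_bytes[outer] > mode_bytes[inner])
    return false;                     /* subreg runs past the inner value */
  if (regno >= 32)
    return byte % 16 == 0;            /* vector regs: no unaligned pieces */
  return byte % mode_bytes[outer] == 0;
}

static hard_reg_set cache[N_MODES][N_MODES][MAX_BYTE];
static bool cached[N_MODES][N_MODES][MAX_BYTE];
static unsigned n_computed;           /* how many sets have been built */

/* Invalid register set for one (inner mode, outer mode, byte) key,
   built on first use and then answered from the cache.  */
static hard_reg_set
invalid_regs (enum toy_mode inner, enum toy_mode outer, unsigned byte)
{
  assert (byte < MAX_BYTE);
  if (!cached[inner][outer][byte])
    {
      hard_reg_set set = 0;
      for (unsigned r = 0; r < N_REGS; r++)
        if (!toy_subreg_ok (r, inner, outer, byte))
          set |= (hard_reg_set) 1 << r;
      cache[inner][outer][byte] = set;
      cached[inner][outer][byte] = true;
      n_computed++;
    }
  return cache[inner][outer][byte];
}
```

Once a set exists, asking whether hard register R may hold the pseudo is a single shift and mask, `(invalid_regs (inner, outer, byte) >> R) & 1`, which mirrors the constant-time query Richard mentions; the cost moves to building each set once.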
Re: dom requires PROP_loops
On Mon, Mar 10, 2014 at 12:57 PM, Paulo Matos wrote:
> Hello,
>
> In an attempt to test some optimization I destroyed the loop property in
> pass_tree_loop_done and reinstated it in pass_rtl_loop_init, however then I
> noticed that pass_dominator started generating wrong code.
> My guess is that we should mark pass_dominator with PROP_loops as a required
> property? Do you agree?

No, "PROP_loops" is something artificial. Passes needing loops will compute them (call loop_optimizer_init). You probably did something wrong with how you "destroy" PROP_loops.

Richard.

> Cheers,
>
> Paulo Matos
>
>
Re: status of current_pass (notably in gates) .... [possible bug in 4.9]
On Mon, Mar 10, 2014 at 1:30 PM, Basile Starynkevitch wrote:
> Hello All,
>
>
> I am a bit confused (or unhappy) about the current_pass variable
> (in GCC 4.9 svn rev.208447); I believe we have some incoherency about it.
>
> It is generally (as it used to be in previous versions of GCC)
> a global pointer to some opt_pass, declared in gcc/tree-pass.h line 590.
>
> It is also (and independently), a local integer in function
> connect_traces file gcc/bb-reorder.c line 1042. I feel that
> for readability reasons the local current_pass should be renamed
> current_pass_num in the function connect_traces.
>
> But most importantly, I find confusing the way current_pass pointer is
> globally set (and reset). The obvious policy seems to set current_pass to
> "this" before calling any virtual methods on it (notably the gate and
> the exec functions).
>
> However, if one use -fdump-passes program argument to gcc (i.e. to cc1), then
> dump_passes (from gcc/passes.c line 892) gets called. It then calls function
> dump_one_pass (from gcc/passes.c line 851) which does line 857
>
>     is_on = pass->has_gate ? pass->gate () : true;
>
> But in other occasions, notably in function execute_one_pass
> (starting at gcc/passes.c line 2153) the global current_pass is
> set (line 2166) before calling its gate function line 2170
>
>     gate_status = pass->has_gate ? pass->gate () : true;
>
> I believe something should be done about this, since it seems to confuse
> plugins (like MELT). Either we decide that current_pass is always set
> before calling any virtual function on it (notably the gate) or we
> decide that current_pass global should disappear (but then, what
> about the curr_statistics_hash function from gcc/statistics.c line 93
> which uses it line 98)?
>
>
> Comments are welcome. I think we should do something about this before
> releasing GCC 4.9...
>
> The simplest thing would be to set current_pass in dump_one_pass

current_pass is not supposed to be accessed outside of pass management. It may or may not vanish in future and it may or may not be set to random values. Do not consider it part of the plugin API (what "API" ...).

Heh, we could start marking certain decls with local visibility...

Richard.

> Regards.
> --
> Basile STARYNKEVITCH http://starynkevitch.net/Basile/
> email: basilestarynkevitchnet mobile: +33 6 8501 2359
> 8, rue de la Faiencerie, 92340 Bourg La Reine, France
> *** opinions {are only mines, sont seulement les miennes} ***
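Basile's proposal (set the global around the gate call in dump_one_pass too, as execute_one_pass does) can be illustrated with a toy pass manager. Everything here (toy_pass, toy_current_pass, toy_gate_status, gate_needs_current_pass) is hypothetical and is not GCC's opt_pass API; per Richard's reply, plugins should not depend on current_pass in the first place, so this only shows what the proposed save/set/restore discipline would look like.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not GCC's opt_pass class or pass manager.  */
struct toy_pass
{
  const char *name;
  bool (*gate) (void);      /* NULL: the pass is always on */
};

/* Plays the role of the global current_pass.  */
static struct toy_pass *toy_current_pass;

/* A gate that, like a plugin might, looks at the global.  */
static bool
gate_needs_current_pass (void)
{
  return toy_current_pass != NULL;
}

/* Evaluate a gate with the global set to PASS, then restore the
   previous value so that merely dumping the pass list leaves the
   global unchanged.  */
static bool
toy_gate_status (struct toy_pass *pass)
{
  struct toy_pass *saved = toy_current_pass;
  toy_current_pass = pass;
  bool is_on = pass->gate ? pass->gate () : true;
  toy_current_pass = saved;
  return is_on;
}
```

Saving and restoring, rather than just assigning, keeps a dump-only walk over the pass list from leaving a stale pointer behind.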
Re: [gsoc 2014] moving fold-const patterns to gimple
On Mon, Mar 10, 2014 at 7:29 PM, Prathamesh Kulkarni wrote:
> Hi Richard,
> Sorry for the late reply. I would like to have a few clarifications
> regarding the following points:
>
> a) Pattern matching: Currently, gimple_match_and_simplify() matches
> patterns one-by-one. Could we use a decision tree to do the matching
> instead (similar to insn-recog.c)?
> For the moment, let's consider pattern matching on only unary
> expressions without valueize and predicates:
> pattern 1: (negate (negate @0))
> pattern 2: (negate (bit_not @0))
>
> from the two ASTs corresponding to patterns (match::op), we can build
> a decision tree, something similar to:
>
>             NEGATE_EXPR
>            /           \
>     NEGATE_EXPR    BIT_NOT_EXPR
>
> and then generate code corresponding to this decision tree in gimple-match.c,
> so the generated code should look something similar to:
>
> tree
> gimple_match_and_simplify (enum tree_code code, tree type, tree op0,
>                            gimple_seq *seq, tree (*valueize)(tree))
> {
>   if (code == NEGATE_EXPR)
>     {
>       tree captures[4] = {};
>       if (TREE_CODE (op0) != SSA_NAME)
>         return NULL_TREE;
>       gimple def_stmt = SSA_NAME_DEF_STMT (op0);
>       if (!is_gimple_assign (def_stmt))
>         return NULL_TREE;
>       tree op = gimple_assign_rhs1 (def_stmt);   /* this is @0 */
>       if (gimple_assign_rhs_code (def_stmt) == NEGATE_EXPR)
>         {
>           /* pattern (negate (negate @0)) matched */
>         }
>       else if (gimple_assign_rhs_code (def_stmt) == BIT_NOT_EXPR)
>         {
>           /* pattern (negate (bit_not @0)) matched */
>         }
>       else
>         return NULL_TREE;
>     }
>   else
>     return NULL_TREE;
> }
>
> For commutative ops, the pattern can be duplicated by walking the
> children of the node in reverse order.
> (I am not exactly clear so far about representing binary operators in a
> decision tree.) Is this the right way to go? I shall try to shortly post a
> patch that implements this.

Yes, that's the way to go (well, I'd even use a switch ()).
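Richard's suggestion of a switch () can be sketched on a toy expression type rather than real GIMPLE. The types and names below (toy_expr, toy_match_unary, the TOY_* codes) are hypothetical; the point is only the decision-tree shape: the outer code is tested once, and the inner code then selects among the patterns that share that prefix, instead of re-matching every pattern from the root.

```c
#include <stddef.h>

/* Toy expression node standing in for an SSA definition chain.  */
enum toy_code { TOY_VAR, TOY_NEGATE, TOY_BIT_NOT };

struct toy_expr
{
  enum toy_code code;
  const struct toy_expr *op0;   /* NULL for TOY_VAR */
};

/* Which pattern fired.  */
enum toy_match { NO_MATCH, NEG_NEG, NEG_BIT_NOT };

/* Decision tree for the two patterns in the mail: switch on the outer
   code, then on the inner code, capturing @0 on a match.  */
static enum toy_match
toy_match_unary (const struct toy_expr *e, const struct toy_expr **capture0)
{
  switch (e->code)
    {
    case TOY_NEGATE:
      if (e->op0 == NULL)
        return NO_MATCH;
      switch (e->op0->code)
        {
        case TOY_NEGATE:        /* (negate (negate @0)) */
          *capture0 = e->op0->op0;
          return NEG_NEG;
        case TOY_BIT_NOT:       /* (negate (bit_not @0)) */
          *capture0 = e->op0->op0;
          return NEG_BIT_NOT;
        default:
          return NO_MATCH;
        }
    default:
      return NO_MATCH;
    }
}
```

A generator would emit one such nested switch per root code; adding a third (negate ...) pattern only adds a case to the inner switch rather than another full match attempt.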
> b) Targeting GENERIC, separating AST from gimple/generic:
> For generating a GENERIC pattern should there be another pattern
> something like match_and_simplify_generic ?

Yes, there is an existing API in GCC for this that operates on GENERIC. It's fold_unary_loc, fold_binary_loc, fold_ternary_loc. The interface the GENERIC match_and_simplify variant provides should match that one.

> Currently, the AST data structures (operand, expr, etc.)
> are tied to gimple (gen_gimple_match, gen_gimple_transform).
> We could also have similar functions: gen_generic_match,
> gen_generic_transform for generating GENERIC ?

Yeah, but I'm not sure if keeping the (virtual) methods for generating code makes much sense with a rewritten code generator.

> Instead will it be better if we separate the AST
> from target IR (generic/gimple) and make simplify a visitor on AST
> (make simplify abstract class, with simplify_generic and simplify_gimple
> visitor classes that generate corresponding IR code) ?

Yes. Keep in mind the current state of genmatch.c is "quick hack to make playing with the API side and with patterns possible" ;)

> c) Shall it be a good idea in define_match , for
> name to act as a substitute for pattern (similar to flex pattern
> definitions), so the name can be used in bigger patterns ?

Maybe, I suppose we'll see when adding more patterns.

> d) This is silly, but maybe use constants to denote corresponding tree nodes ?
> for example instead of { build_int_cst (integer_type_node, 0); }, one
> could directly write 0, to denote a INTEGER_CST node with value 0.

Yes, that might be possible - though it will require more knowledge in the pattern matcher (you also want to match '0'?) and the code generator.

> e) There was a mention on the thread, regarding testing of patterns
> integrated into DSL. I wasn't able to understand that clearly. Could
> you explain that briefly ?

DSL?
Currently I'd say it would be nice to make sure each pattern is triggered by at least one GCC testcase - this requires looking at a particular pass dump (that of forwprop or ccp are probably most suitable as they are run very early). I mentioned the possibility to do offline (thus not with C testcases) testing but that would require some tool to do that and it would be correctness testing (some automatic proof generation tool - ISTR academics have this kind of stuff). But that was just an idea.

> Regarding gsoc proposal, I would like to align it on the following points:
> a) Pattern matching using decision tree

good.

> b) Generate GIMPLE folding patterns (tree-ssa-forwprop,
> tree-ssa-sccvn, gimple-fold)

I'd narrow it down a bit, you can optionally do more if time permits. I'd say

0) add basic arithmetic identities (x + 0, x * 1, x / 1, etc., correctly for all types - you can look into fold-const.c which handles all of them)
1) target as much as possible of the existing transforms in forwprop
2) pieces of fold-
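Richard's "correctly for all types" in item 0 matters for floating point. Under IEEE 754 with the default rounding mode, (-0.0) + 0.0 is +0.0, so rewriting x + 0.0 to x is only valid when signed zeros need not be honored (for example under -fno-signed-zeros), whereas x - 0.0 and x * 1.0 keep -0.0 intact. The helper names below are hypothetical; the sketch assumes default floating-point options (no -ffast-math).

```c
#include <math.h>
#include <stdbool.h>

/* True only if A and B are the same value, treating +0.0 and -0.0 as
   different (plain == treats them as equal).  signbit may return any
   nonzero value, so both results are normalised with !.  */
static bool
same_double (double a, double b)
{
  return a == b && !signbit (a) == !signbit (b);
}

/* The candidate identities x + 0 -> x, x - 0 -> x and x * 1 -> x.  */
static double add_zero (double x) { return x + 0.0; }
static double sub_zero (double x) { return x - 0.0; }
static double mul_one (double x) { return x * 1.0; }
```

A pattern test suite could use a check like same_double on the input -0.0 to catch a floating-point x + 0 rule that forgets to test for signed zeros.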
Re: GNU C extension: Function Error vs. Success
On 10/03/14 18:26, Shahbaz Youssefi wrote:
> I'm mostly interested in C. Nevertheless, you can of course also do
> the same in C:
>
> struct option_float
> {
>     float value;
>     int error_code;
>     bool succeeded;
> };
>
> struct option_float inverse(int x) {
>     if (x == 0)
>         return (struct option_float){ .succeeded = false, .error_code = EDOM };
>     return (struct option_float){ .value = 1.0f / x, .succeeded = true };
> }
>
> you get the idea. The difference is that it's hard to optimize the
> non-error execution path if the compiler is not aware of the
> semantics.

You can tell the compiler about the likely paths:

struct option_float inverse(int x) {
    if (__builtin_expect(x != 0, 1)) {
        return (struct option_float){ .value = 1.0f / x, .succeeded = true };
    } else {
        return (struct option_float){ .succeeded = false, .error_code = EDOM };
    }
}

> Also, with exceptions, this can happen:
>
> float inverse(int x)
> {
>     if (x == 0)
>         throw overflow;
>     return 1.0f / x;
> }
>
> y = inverse(x);
>
> Which means control is taken from the function calling inverse without
> it explicitly allowing it, which is not in the spirit of C.

In many cases, I'd agree with you that C++ exceptions are a bit like hidden and unexpected gotos. But in situations like this, the code is in fact in the "spirit of C". If you try to find the inverse of 0, you are asking for undefined behaviour - and you are getting it. You can add extra code (checks, exceptions, etc.) to turn that undefined behaviour into defined behaviour - but without that code it is not unreasonable to pass the exception up the call stack or do other odd things.

>
> P.S. programming in a lot of languages is _mere syntax_ with respect
> to some others. Still, some syntaxes are good and some not. If we can
> improve GNU C's syntax to be shorter, but without loss of
> expressiveness or clarity, then why not!
>

I am not sure that it would be possible to get the sort of effect you are looking for without disrupting the syntax too much for a gcc extension.

Speaking as an embedded developer who often wants to get the smallest and fastest code on small processors, it would be very nice to have the ability to return an extra flag along with the main return value of a function. Typically that would be a flag to indicate success or failure, but it might have other purposes - and it could be the only return value of an otherwise void function. Key to the implementation would be a calling convention to use a processor condition code flag here - that would let you generate optimal code for the "if (error) goto" part.
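David's version of inverse can be turned into a complete translation unit by adding the headers it needs. The caller sum_inverses is a hypothetical addition showing the "if (error) goto" style he mentions. __builtin_expect is a real GCC built-in, but this sketch makes no claim about which registers, stack slots or flags the struct is returned in; that ABI question is exactly what the proposed extension is about.

```c
#include <errno.h>
#include <stdbool.h>

struct option_float
{
  float value;
  int error_code;
  bool succeeded;
};

/* Returns 1/x, or a failure record carrying EDOM when x is zero.
   __builtin_expect marks the success path as the likely one.  */
static struct option_float
inverse (int x)
{
  if (__builtin_expect (x != 0, 1))
    return (struct option_float){ .value = 1.0f / x, .succeeded = true };
  return (struct option_float){ .succeeded = false, .error_code = EDOM };
}

/* Hypothetical caller: sums 1/xs[i], bailing out on the first error.
   Returns 0 on success or the error code of the failing element.  */
static int
sum_inverses (const int *xs, int n, float *out)
{
  float total = 0.0f;
  int err = 0;
  for (int i = 0; i < n; i++)
    {
      struct option_float r = inverse (xs[i]);
      if (__builtin_expect (!r.succeeded, 0))
        {
          err = r.error_code;
          goto fail;
        }
      total += r.value;
    }
  *out = total;
  return 0;
 fail:
  return err;
}
```

The success flag here travels inside the returned struct; David's suggestion would instead let the compiler return it in a condition code so the `if (!r.succeeded)` test becomes a single conditional branch.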
Re: GNU C extension: Function Error vs. Success
On Tue, Mar 11, 2014 at 1:26 PM, David Brown wrote:
> On 10/03/14 18:26, Shahbaz Youssefi wrote:
> You can tell the compiler about the likely paths:
>
> struct option_float inverse(int x) {
>     if (__builtin_expect(x != 0, 1)) {
>         return (struct option_float){ .value = 1.0f / x, .succeeded = true };
>     } else {
>         return (struct option_float){ .succeeded = false, .error_code = EDOM };
>     }
> }

True, but I was actually referring to the fact that like this, you have to write the status to the stack, where the return value resides, while with a built-in method you could do away with returning it in a register. This is not just for performance, but also to be compatible with the previous ABI.

> I am not sure that it would be possible to get the sort of effect you
> are looking for without disrupting the syntax too much for a gcc extension.
>
> Speaking as an embedded developer who often wants to get the smallest
> and fastest code on small processors, it would be very nice to have
> the ability to return an extra flag along with the main return value of
> a function. Typically that would be a flag to indicate success or
> failure, but it might have other purposes - and it could be the only
> return value of an otherwise void function. Key to the implementation
> would be a calling convention to use a processor condition code flag
> here - that would let you generate optimal code for the "if (error)
> goto" part.

I too am an embedded developer (with some kernel module programming too) and what you say is another reason why I'd personally like to see this happen.

Thanks for the feedback.
Issues installing GCC libraries
Dear Administrator,

I tried installing GCC libraries, but it does not seem to work, and I do not understand why. This is the output from my Terminal window:

1282816:~ kfalk$ cd Desktop/
1282816:Desktop kfalk$ cd gcc-4.8.2/
1282816:gcc-4.8.2 kfalk$ ./configure
checking build system type... x86_64-apple-darwin12.4.0
checking host system type... x86_64-apple-darwin12.4.0
checking target system type... x86_64-apple-darwin12.4.0
checking for a BSD-compatible install... /usr/bin/install -c
checking whether ln works... yes
checking whether ln -s works... yes
checking for a sed that does not truncate output... /usr/bin/sed
checking for gawk... no
checking for mawk... no
checking for nawk... no
checking for awk... awk
checking for libatomic support... yes
checking for libitm support... yes
checking for libsanitizer support... yes
checking for gcc... gcc
checking for C compiler default output file name... a.out
checking whether the C compiler works... yes
checking whether we are cross compiling... no
checking for suffix of executables...
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking whether g++ accepts -static-libstdc++ -static-libgcc... no
checking for gnatbind... no
checking for gnatmake... no
checking whether compiler driver understands Ada... no
checking how to compare bootstrapped objects... cmp --ignore-initial=16 $$f1 $$f2
checking for objdir... .libs
checking for the correct version of gmp.h... no
configure: error: Building GCC requires GMP 4.2+, MPFR 2.4.0+ and MPC 0.8.0+.
Try the --with-gmp, --with-mpfr and/or --with-mpc options to specify
their locations.
Source code for these libraries can be found at
their respective hosting sites as well as at
ftp://gcc.gnu.org/pub/gcc/infrastructure/. See also
http://gcc.gnu.org/install/prerequisites.html for additional info. If
you obtained GMP, MPFR and/or MPC from a vendor distribution package,
make sure that you have installed both the libraries and the header
files. They may be located in separate packages.

Is there anything specific I can install, like the mentioned GMP, or something else?

Many thanks!
Katerina
Re: Issues installing GCC libraries
On 11 March 2014 21:33, Falk, Katerina wrote:
> Dear Administrator,
>
> I tried installing GCC libraries, but it does not seem to work. I do not
> understand the problem why. This is the output from my Terminal window:

This mailing list is for development of GCC, for help using GCC please use the gcc-h...@gcc.gnu.org list. Please take any follow up questions there.

> 1282816:~ kfalk$ cd Desktop/
> 1282816:Desktop kfalk$ cd gcc-4.8.2/
> 1282816:gcc-4.8.2 kfalk$ ./configure

Please read http://gcc.gnu.org/wiki/InstallingGCC
Re: [RFC] Meta-description for tree and gimple folding
On Mon, 3 Mar 2014, Richard Biener wrote: How do you handle a transformation that currently tries to recursively fold something else and does the main transformation only if that simplified? And doesn't do the other folding (because it's not in the IL literally?)? Similar to the cst without overflow case, by writing custom C code and allowing that to signal failure. Note that for this kind of simplification, it can be inconvenient to have canonicalization included with the "real" simplifications. Imagine I am looking at (x?3:5)+y. If 3+y "simplifies" to y+3 and 5+y "simplifies" to y+5, then it looks worth it to replace the expression with x?y+3:(y+5). Would there be a convenient way to separate them, so it can tell me that 3+y should be replaced with y+3 but that it is not a simplification? -- Marc Glisse
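Marc's request, letting a simplifier report that 3+y was merely reordered to y+3 rather than genuinely simplified, can be sketched with a three-valued result. This is a toy model with hypothetical names (fold_kind, toy_sum, fold_plus, worth_distributing), not the interface of the proposed match-and-simplify machinery, and "both arms must simplify" is just one possible policy for deciding whether to rewrite (x?a:b)+y.

```c
#include <stdbool.h>

/* What a folding step did: nothing, a mere reordering, or a real
   simplification.  */
enum fold_kind { FOLD_NONE, FOLD_CANONICALIZED, FOLD_SIMPLIFIED };

/* Toy stand-in for "c + y" or "y + c" with constant C and variable y.  */
struct toy_sum
{
  bool const_first;   /* true for c + y, false for y + c */
  long c;
};

/* Fold one sum and report which kind of change was made.  */
static enum fold_kind
fold_plus (struct toy_sum *s)
{
  if (s->c == 0)
    return FOLD_SIMPLIFIED;        /* 0 + y or y + 0 becomes y */
  if (s->const_first)
    {
      s->const_first = false;      /* c + y becomes y + c */
      return FOLD_CANONICALIZED;
    }
  return FOLD_NONE;
}

/* One possible policy for (x ? a : b) + y: push the addition into the
   arms only if both arms genuinely simplify, ignoring reorderings.  */
static bool
worth_distributing (long a, long b)
{
  struct toy_sum arm_a = { true, a };
  struct toy_sum arm_b = { true, b };
  enum fold_kind ka = fold_plus (&arm_a);
  enum fold_kind kb = fold_plus (&arm_b);
  return ka == FOLD_SIMPLIFIED && kb == FOLD_SIMPLIFIED;
}
```

With Marc's (x?3:5)+y, both arms only report FOLD_CANONICALIZED, so the rewrite is rejected; a simplifier that returned a plain "changed" boolean would have accepted it.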
GSoC 2014 C++ Concepts project
Hello, my name is Thomas Wynn. I am a junior in pursuit of a B.S. in Computer Science at The University of Akron. I am interested in working on a project with GCC for this year's Google Summer of Code. More specifically, I would like to work on support for concept variables and shorthand notation of concepts for C++ Concepts Lite. I am currently doing an independent study with Andrew Sutton in which I have been porting and creating various tests for concepts used in the DejaGNU test suite of an experimental branch of GCC 4.9, and will soon be helping with the development of features in the branch. I would greatly appreciate any suggestions or feedback for this project so that I may write a more detailed, relevant, and accurate proposal.
GSoC Concepts - separate checking
My name is Braden Obrzut and I am a student from the University of Akron interested in contributing to GCC for GSoC. I am interested in working on a project related to the c++-concepts branch. In particular, I am interested in implementing mechanisms for checking the safety of constrained templates (separate checking). I have discussed the project with Andrew Sutton (who maintains the c++-concepts branch and happens to be a professor at Akron) and believe that some aspects of the work would be feasible within the three month time span. I also hope to continue working on the project as my honors thesis project. As a hobby I usually design and implement declarative languages for content definition in old video games. While I currently may have limited experience with GCC internals, I think this would be a great opportunity for me to learn how real compilers work and help with the development of the C++ programming language.