On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: > > To be constructive here - the above case is from within a > > GIMPLE_ASSIGN case label > > and thus I'd have expected > > > > case GIMPLE_ASSIGN: > > { > > gassign *a1 = as_a <gassign *> (s1); > > gassign *a2 = as_a <gassign *> (s2); > > lhs1 = gimple_assign_lhs (a1); > > lhs2 = gimple_assign_lhs (a2); > > if (TREE_CODE (lhs1) != SSA_NAME > > && TREE_CODE (lhs2) != SSA_NAME) > > return (operand_equal_p (lhs1, lhs2, 0) > > && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1), > > gimple_assign_rhs1 (a2))); > > else if (TREE_CODE (lhs1) == SSA_NAME > > && TREE_CODE (lhs2) == SSA_NAME) > > return vn_valueize (lhs1) == vn_valueize (lhs2); > > return false; > > } > > > > instead. That's the kind of changes I have expected and have approved of. > > But even that looks like just adding extra work for all developers, with no > gain. You only have to add extra code and extra temporaries, in switches > typically also have to add {} because of the temporaries and thus extra > indentation level, and it doesn't simplify anything in the code.
The branch attempts to use the C++ typesystem to capture information about the kinds of gimple statement we expect, both: (A) so that the compiler can detect type errors, and (B) as a comprehension aid to the human reader of the code The ideal here is when function params and struct field can be strengthened from "gimple" to a subclass ptr. This captures the knowledge that every use of a function or within a struct has a given gimple code. Examples of this for the initial patchkit were: * the "call_stmt" field of a cgraph_edge becoming a gcall *, rather than a plain gimple. * various variables in tree-into-ssa.c change from just vec<gimple> to being vec<gphi *>, capturing the "phi-ness" of the contents as a compile-time check * tree-inline.h's struct copy_body_data, the field "debug_stmts" can be "concretized" from a vec<gimple> to a vec<gdebug *>. A more recent example, from: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=5bd16d92b9e928b5a5a7aebd571d92f72dd517f8 The fields "arr_ref_first" and "arr_ref_last" of tree-switch-conversion.c's struct switch_conv_info can be strengthened from gimple to gassign *: they are always GIMPLE_ASSIGN. I applied cleanups to do my initial patchkit, which Jeff approved (with some provisos), and which became the first 92 commits on the branch: "[gimple-classes, committed 00/92] Initial slew of commits": https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02791.html followed by a merger from trunk into the branch: "[gimple-classes] Merge trunk r216157-r216746 into branch": https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02982.html With those commits, I was able to convert 180 accessors to taking a concrete subclass, with 158 left taking a gimple or const_gimple i.e. about half of them. (My script to analyze this is gimple_typesafety.py within https://github.com/davidmalcolm/gcc-refactoring-scripts) I got it into my head that it was desirable to convert *all* gimple accessors to this form, and to eliminate the GIMPLE_CHECK macros (given that gcc development community seems to dislike partial transitions). I've been attempting this full conversion - convert all of the gimple_ accessors, to require an appropriate gimple subclass ptr, in particular focusing on the gimple_assign_ ones, but it's a *lot* of extra work, and much more invasive than the patches that Jeff conditionally approved. I now suspect that it's going too far - in the initial patchkit I was doing the clean, obvious ones, but now I'm left with the awkward ones that would require me to uglify the code to "fix". If it's OK to only convert some of them, then I'd rather just do that. The type-strengthening is rarely as neat as being able to simply convert a param or field type. Some examples: Functions passed a gsi ====================== Sometimes functions are passed a gsi, where it can be known that the gsi currently references a stmt of known kind (although that isn't necessarily obvious from reading the body of the function): Example from tree-ssa-strlen.c: handle_char_store (gimple_stmt_iterator *gsi) { int idx = -1; strinfo si = NULL; - gimple stmt = gsi_stmt (*gsi); + gassign *stmt = as_a <gassign *> (gsi_stmt (*gsi)); tree ssaname = NULL_TREE, lhs = gimple_assign_lhs (stmt); if (TREE_CODE (lhs) == MEM_REF from https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=78aae552f15ad5f8f5290fb825f9ae33f4a7cad9 Only acting on one kind of gimple ================================= Some functions accept any kind of gimple, but only act on e.g. a GIMPLE_ASSIGN, immediately returning if they got a different kind. So I make this kind of change, where: void foo (gimple stmt, other params) { if (!is_gimple_assign (stmt)) return; use_various gimple_assign_accessors (stmt); } becomes: void foo (gimple gs, other params) { gassign *stmt = dyn_cast <gassign *> (gs); if (!stmt) return; use_various gimple_assign_accessors (stmt); } renaming the param to "gs" to avoid a mass rename of "stmt". Example tree-ssa-sink.c (statement_sink_location): https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=da19cf71e540f52a924d0985efdacd9e03684a6e Suites where we know the type of a statement ============================================ If we have: if (is_gimple_assign (stmt)) { /* 20 lines of logic, with numerous gimple_assign_ accessors on "stmt". */ } then I've been converting it to: if (gassign *assign_stmt = dyn_cast <gassign *> (stmt)) { /* rename the "stmt" to "assign_stmt", wrapping lines as needed. */ } Is "assign_stmt" too long here? It's handy to be able to distinguish the meaning of the stmt e.g. "use_stmt", "def_stmt" can have synonyms "use_assign" and "def_assign" for the regions where they're known to be GIMPLE_ASSIGN). The above is overkill for a 1-liner e.g.: if (gimple_get_code (stmt) == GIMPLE_ASSIGN) expr = gimple_get_lhs (stmt); It could be converted to: if (gassign *assign_stmt = dyn_cast <gassign *> (stmt)) expr = gimple_assign_get_lhs (assign_stmt); or to: if (gimple_get_code (stmt) == GIMPLE_ASSIGN) expr = gimple_assign_get_lhs (as_a <gassign *> (stmt)); or left as-is if we don't want gimple_assign_get_lhs to require a gassign *. Casting functions ================= Taking RTL's single_set as inspiration, I converted the predicates: gimple_assign_copy_p gimple_assign_ssa_name_copy_p gimple_assign_single_p to return a gassign * rather than a bool, so that they can be used as casting functions: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=06a4829d92f383a0fc1e9d488f1634d3764a0171 (also burning a register, since without LTO or some attribute the compiler can't know the invariant that if non-NULL, then the return value == the param (albeit with a cast) - do we have a function attribute for that?). Example of use, from tree-ssa-pre.c @@ -4040,6 +4041,7 @@ eliminate_dom_walker::before_dom_children (basic_block b) tree sprime = NULL_TREE; gimple stmt = gsi_stmt (gsi); tree lhs = gimple_get_lhs (stmt); + gassign *assign_stmt; if (lhs && TREE_CODE (lhs) == SSA_NAME && !gimple_has_volatile_ops (stmt) /* See PR43491. Do not replace a global register variable when @@ -4049,10 +4051,10 @@ eliminate_dom_walker::before_dom_children (basic_block b) ??? The fix isn't effective here. This should instead be ensured by not value-numbering them the same but treating them like volatiles? */ - && !(gimple_assign_single_p (stmt) - && (TREE_CODE (gimple_assign_rhs1 (stmt)) == VAR_DECL - && DECL_HARD_REGISTER (gimple_assign_rhs1 (stmt)) - && is_global_var (gimple_assign_rhs1 (stmt))))) + && !((assign_stmt = gimple_assign_single_p (stmt)) + && (TREE_CODE (gimple_assign_rhs1 (assign_stmt)) == VAR_DECL + && DECL_HARD_REGISTER (gimple_assign_rhs1 (assign_stmt)) + && is_global_var (gimple_assign_rhs1 (assign_stmt))))) (followed by lots of renaming of "stmt" to "assign_stmt" within the guarded scope; this is from: https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=18277e45b407ddd9f15012b48caf7403354ebaec ) More awkward cases ================== e.g. code that can work on both GIMPLE_CALL and GIMPLE_ASSIGN; once we've dealt with GIMPLE_CALL, we can add: gassign *assign_stmt = as_a <gassign *> (stmt); and have the rest of the code work on "assign_stmt". etc Current status ============== I currently have 261 accessors converted, with 75 to go i.e. about 75%, as compared to the 50% from the original patchkit. I believe that the additional 50->75% work is relatively clean, but I'm hitting a wall where the final 25% is requiring much more invasive work, of the kind objected to earlier in this thread. I'm having to do uglier and uglier patches. How to proceed? =============== I like the initial work I did, the: "[gimple-classes, committed 00/92] Initial slew of commits": https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02791.html work, but I agree that the work on the branch after that is ugly in places, and the volume of the work may resemble a DoS attack on the reviewers - sorry. (to answer a question on IRC, those followup commits *were* handwritten, not autogenerated). Is it acceptable to have the partial transition of strengthening the types of only 50% or 75% of the accessors? I'd like to apply those first commits to trunk (applying necessary merger work), but I know we're approaching the close of stage 1 for gcc 5. I can try to identify a subset of patches I think are likely to be more acceptable if this work is still viable, and maybe put them on a different git branch. I hope I can get close to the 75% mark without too much ugliness and verbosity. Thoughts? Thanks Dave (fwiw, I'm getting rather sick of refactoring, and keen to focus on user-visible work for gcc 6)