On Mon, Nov 2, 2015 at 10:17 AM, Kugan <kugan.vivekanandara...@linaro.org> wrote: > > > On 29/10/15 02:45, Richard Biener wrote: >> On Tue, Oct 27, 2015 at 1:50 AM, kugan >> <kugan.vivekanandara...@linaro.org> wrote: >>> >>> >>> On 23/10/15 01:23, Richard Biener wrote: >>>> >>>> On Thu, Oct 22, 2015 at 12:50 PM, Kugan >>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>> >>>>> >>>>> >>>>> On 21/10/15 23:45, Richard Biener wrote: >>>>>> >>>>>> On Tue, Oct 20, 2015 at 10:03 PM, Kugan >>>>>> <kugan.vivekanandara...@linaro.org> wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 07/09/15 12:53, Kugan wrote: >>>>>>>> >>>>>>>> >>>>>>>> This a new version of the patch posted in >>>>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00226.html. I have done >>>>>>>> more testing and spitted the patch to make it more easier to review. >>>>>>>> There are still couple of issues to be addressed and I am working on >>>>>>>> them. >>>>>>>> >>>>>>>> 1. AARCH64 bootstrap now fails with the commit >>>>>>>> 94f92c36a83d66a893c3bc6f00a038ba3dbe2a6f. simplify-rtx.c is >>>>>>>> mis-compiled >>>>>>>> in stage2 and fwprop.c is failing. It looks to me that there is a >>>>>>>> latent >>>>>>>> issue which gets exposed my patch. I can also reproduce this in x86_64 >>>>>>>> if I use the same PROMOTE_MODE which is used in aarch64 port. For the >>>>>>>> time being, I am using patch >>>>>>>> 0006-temporary-workaround-for-bootstrap-failure-due-to-co.patch as a >>>>>>>> workaround. This meeds to be fixed before the patches are ready to be >>>>>>>> committed. >>>>>>>> >>>>>>>> 2. vector-compare-1.c from c-c++-common/torture fails to assemble with >>>>>>>> -O3 -g Error: unaligned opcodes detected in executable segment. It >>>>>>>> works >>>>>>>> fine if I remove the -g. I am looking into it and needs to be fixed as >>>>>>>> well. >>>>>>> >>>>>>> >>>>>>> Hi Richard, >>>>>>> >>>>>>> Now that stage 1 is going to close, I would like to get these patches >>>>>>> accepted for stage1. I will try my best to address your review comments >>>>>>> ASAP. >>>>>> >>>>>> >>>>>> Ok, can you make the whole patch series available so I can poke at the >>>>>> implementation a bit? Please state the revision it was rebased on >>>>>> (or point me to a git/svn branch the work resides on). >>>>>> >>>>> >>>>> Thanks. Please find the patched rebated against trunk@229156. I have >>>>> skipped the test-case readjustment patches. >>>> >>>> >>>> Some quick observations. On x86_64 when building >>> >>> >>> Hi Richard, >>> >>> Thanks for the review. >>> >>>> >>>> short bar (short y); >>>> int foo (short x) >>>> { >>>> short y = bar (x) + 15; >>>> return y; >>>> } >>>> >>>> with -m32 -O2 -mtune=pentiumpro (which ends up promoting HImode regs) >>>> I get >>>> >>>> <bb 2>: >>>> _1 = (int) x_10(D); >>>> _2 = (_1) sext (16); >>>> _11 = bar (_2); >>>> _5 = (int) _11; >>>> _12 = (unsigned int) _5; >>>> _6 = _12 & 65535; >>>> _7 = _6 + 15; >>>> _13 = (int) _7; >>>> _8 = (_13) sext (16); >>>> _9 = (_8) sext (16); >>>> return _9; >>>> >>>> which looks fine but the VRP optimization doesn't trigger for the >>>> redundant sext >>>> (ranges are computed correctly but the 2nd extension is not removed). > > Thanks for the comments. Please fond the attached patches with which I > am now getting > cat .192t.optimized > > ;; Function foo (foo, funcdef_no=0, decl_uid=1406, cgraph_uid=0, > symbol_order=0) > > foo (short int x) > { > signed int _1; > int _2; > signed int _5; > unsigned int _6; > unsigned int _7; > signed int _8; > int _9; > short int _11; > unsigned int _12; > signed int _13; > > <bb 2>: > _1 = (signed int) x_10(D); > _2 = _1; > _11 = bar (_2); > _5 = (signed int) _11; > _12 = (unsigned int) _11; > _6 = _12 & 65535; > _7 = _6 + 15; > _13 = (signed int) _7; > _8 = (_13) sext (16); > _9 = _8; > return _9; > > } > > > There are still some redundancies. The asm difference after RTL > optimizations is > > - addl $15, %eax > + addw $15, %ax > > >>>> >>>> This also makes me notice trivial match.pd patterns are missing, like >>>> for example >>>> >>>> (simplify >>>> (sext (sext@2 @0 @1) @3) >>>> (if (tree_int_cst_compare (@1, @3) <= 0) >>>> @2 >>>> (sext @0 @3))) >>>> >>>> as VRP doesn't run at -O1 we must rely on those to remove rendudant >>>> extensions, >>>> otherwise generated code might get worse compared to without the pass(?) >>> >>> >>> Do you think that we should enable this pass only when vrp is enabled. >>> Otherwise, even when we do the simple optimizations you mentioned below, we >>> might not be able to remove all the redundancies. >>> >>>> >>>> I also notice that the 'short' argument does not get it's sign-extension >>>> removed >>>> as redundand either even though we have >>>> >>>> _1 = (int) x_8(D); >>>> Found new range for _1: [-32768, 32767] >>>> >>> >>> I am looking into it. >>> >>>> In the end I suspect that keeping track of the "simple" cases in the >>>> promotion >>>> pass itself (by keeping a lattice) might be a good idea (after we fix VRP >>>> to do >>>> its work). In some way whether the ABI guarantees promoted argument >>>> registers might need some other target hook queries. > > I tried adding it in the attached patch with record_visit_stmt to track > whether an ssa would have value overflow or properly zero/sign extended > in promoted mode. We can use this to eliminate some of the zero/sign > extension at gimple level. As it is, it doesn't do much. If this is what > you had in mind, I will extend it based on your feedback. > > >>>> >>>> Now onto the 0002 patch. >>>> >>>> +static bool >>>> +type_precision_ok (tree type) >>>> +{ >>>> + return (TYPE_PRECISION (type) == 8 >>>> + || TYPE_PRECISION (type) == 16 >>>> + || TYPE_PRECISION (type) == 32); >>>> +} >>>> >>>> that's a weird function to me. You probably want >>>> TYPE_PRECISION (type) == GET_MODE_PRECISION (TYPE_MODE (type)) >>>> here? And guard that thing with POINTER_TYPE_P || INTEGRAL_TYPE_P? >>>> >>> >>> I will change this. (I have a patch which I am testing with other changes >>> you have asked for) >>> >>> >>>> +/* Return the promoted type for TYPE. */ >>>> +static tree >>>> +get_promoted_type (tree type) >>>> +{ >>>> + tree promoted_type; >>>> + enum machine_mode mode; >>>> + int uns; >>>> + if (POINTER_TYPE_P (type) >>>> + || !INTEGRAL_TYPE_P (type) >>>> + || !type_precision_ok (type)) >>>> + return type; >>>> + >>>> + mode = TYPE_MODE (type); >>>> +#ifdef PROMOTE_MODE >>>> + uns = TYPE_SIGN (type); >>>> + PROMOTE_MODE (mode, uns, type); >>>> +#endif >>>> + uns = TYPE_SIGN (type); >>>> + promoted_type = lang_hooks.types.type_for_mode (mode, uns); >>>> + if (promoted_type >>>> + && (TYPE_PRECISION (promoted_type) > TYPE_PRECISION (type))) >>>> + type = promoted_type; >>>> >>>> I think what you want to verify is that TYPE_PRECISION (promoted_type) >>>> == GET_MODE_PRECISION (mode). >>>> And to not even bother with this simply use >>>> >>>> promoted_type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), >>>> uns); >>>> >>> >>> I am changing this too. >>> >>>> You use a domwalk but also might create new basic-blocks during it >>>> (insert_on_edge_immediate), that's a >>>> no-no, commit edge inserts after the domwalk. >>> >>> >>> I am sorry, I dont understand "commit edge inserts after the domwalk" Is >>> there a way to do this in the current implementation? >> >> Yes, simply use gsi_insert_on_edge () and after the domwalk is done do >> gsi_commit_edge_inserts (). >> >>>> ssa_sets_higher_bits_bitmap looks unused and >>>> we generally don't free dominance info, so please don't do that. >>>> >>>> I fired off a bootstrap on ppc64-linux which fails building stage1 libgcc >>>> with >>>> >>>> /abuild/rguenther/obj/./gcc/xgcc -B/abuild/rguenther/obj/./gcc/ >>>> -B/usr/local/powerpc64-unknown-linux-gnu/bin/ >>>> -B/usr/local/powerpc64-unknown-linux-gnu/lib/ -isystem >>>> /usr/local/powerpc64-unknown-linux-gnu/include -isystem >>>> /usr/local/powerpc64-unknown-linux-gnu/sys-include -g -O2 -O2 -g >>>> -O2 -DIN_GCC -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual >>>> -Wno-format -Wstrict-prototypes -Wmissing-prototypes >>>> -Wold-style-definition -isystem ./include -fPIC -mlong-double-128 >>>> -mno-minimal-toc -g -DIN_LIBGCC2 -fbuilding-libgcc >>>> -fno-stack-protector -fPIC -mlong-double-128 -mno-minimal-toc -I. >>>> -I. -I../.././gcc -I../../../trunk/libgcc -I../../../trunk/libgcc/. >>>> -I../../../trunk/libgcc/../gcc -I../../../trunk/libgcc/../include >>>> -I../../../trunk/libgcc/../libdecnumber/dpd >>>> -I../../../trunk/libgcc/../libdecnumber -DHAVE_CC_TLS -o _divdi3.o >>>> -MT _divdi3.o -MD -MP -MF _divdi3.dep -DL_divdi3 -c >>>> ../../../trunk/libgcc/libgcc2.c \ >>>> -fexceptions -fnon-call-exceptions -fvisibility=hidden >>>> -DHIDE_EXPORTS >>>> In file included from ../../../trunk/libgcc/libgcc2.c:56:0: >>>> ../../../trunk/libgcc/libgcc2.c: In function ‘__divti3’: >>>> ../../../trunk/libgcc/libgcc2.h:193:20: internal compiler error: in >>>> expand_debug_locations, at cfgexpand.c:5277 >>>> > > With the attached patch, now I am running into Bootstrap comparison > failure. I am looking into it. Please review this version so that I can > address them while fixing this issue.
I notice diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c index 82fd4a1..80fcf70 100644 --- a/gcc/tree-ssanames.c +++ b/gcc/tree-ssanames.c @@ -207,7 +207,8 @@ set_range_info (tree name, enum value_range_type range_type, unsigned int precision = TYPE_PRECISION (TREE_TYPE (name)); /* Allocate if not available. */ - if (ri == NULL) + if (ri == NULL + || (precision != ri->get_min ().get_precision ())) and I think you need to clear range info on promoted SSA vars in the promotion pass. The basic "structure" thing still remains. You walk over all uses and defs in all stmts in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on all uses and defs which in turn promotes (the "def") and then fixes up all uses in all stmts. Instead of this you should, in promote_all_stmts, walk over all uses doing what fixup_uses does and then walk over all defs, doing what promote_ssa does. + case GIMPLE_NOP: + { + if (SSA_NAME_VAR (def) == NULL) + { + /* Promote def by fixing its type for anonymous def. */ + TREE_TYPE (def) = promoted_type; + } + else + { + /* Create a promoted copy of parameters. */ + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); I think the uninitialized vars are somewhat tricky and it would be best to create a new uninit anonymous SSA name for them. You can have SSA_NAME_VAR != NULL and def _not_ being a parameter btw. +/* Return true if it is safe to promote the defined SSA_NAME in the STMT + itself. */ +static bool +safe_to_promote_def_p (gimple *stmt) +{ + enum tree_code code = gimple_assign_rhs_code (stmt); + if (gimple_vuse (stmt) != NULL_TREE + || gimple_vdef (stmt) != NULL_TREE + || code == ARRAY_REF + || code == LROTATE_EXPR + || code == RROTATE_EXPR + || code == VIEW_CONVERT_EXPR + || code == BIT_FIELD_REF + || code == REALPART_EXPR + || code == IMAGPART_EXPR + || code == REDUC_MAX_EXPR + || code == REDUC_PLUS_EXPR + || code == REDUC_MIN_EXPR) + return false; + return true; huh, I think this function has an odd name, maybe can_promote_operation ()? Please use TREE_CODE_CLASS (code) == tcc_reference for all _REF trees. Note that as followup things like the rotates should be "expanded" like we'd do on RTL (open-coding the thing). And we'd need a way to specify zero-/sign-extended loads. +/* Return true if it is safe to promote the use in the STMT. */ +static bool +safe_to_promote_use_p (gimple *stmt) +{ + enum tree_code code = gimple_assign_rhs_code (stmt); + tree lhs = gimple_assign_lhs (stmt); + + if (gimple_vuse (stmt) != NULL_TREE + || gimple_vdef (stmt) != NULL_TREE I think the vuse/vdef check is bogus, you can have a use of 'i_3' in say _2 = a[i_3]; + || code == VIEW_CONVERT_EXPR + || code == LROTATE_EXPR + || code == RROTATE_EXPR + || code == CONSTRUCTOR + || code == BIT_FIELD_REF + || code == COMPLEX_EXPR + || code == ASM_EXPR + || VECTOR_TYPE_P (TREE_TYPE (lhs))) + return false; + return true; ASM_EXPR can never appear here. I think PROMOTE_MODE never promotes vector types - what cases did you need to add VECTOR_TYPE_P for? +/* Return true if the SSA_NAME has to be truncated to preserve the + semantics. */ +static bool +truncate_use_p (gimple *stmt) +{ + enum tree_code code = gimple_assign_rhs_code (stmt); I think the description can be improved. This is about stray bits set beyond the original type, correct? Please use NOP_EXPR wherever you use CONVERT_EXPR right how. + if (TREE_CODE_CLASS (code) + == tcc_comparison) + promote_cst_in_stmt (stmt, promoted_type, true); don't you always need to promote constant operands? Richard.