On Tue, May 26, 2015 at 3:10 PM, Evgeniya Maenkova <evgeniya.maenk...@gmail.com> wrote: > Hi, Richard > > Thanks for review starting. > > Do you see any major issues with this patch (i.e. algorithms and ideas > that should be completely replaced, effectively causing the re-write > of most code)? > > To decide if there are major issues in the patch, perhaps, you need > additional clarifications from me? Could you point at the places where > additional explanations could save you most effort? > > Your answers to these questions are looking the first priority ones. > You wrote about several issues in the code, which are looking as easy > (or almost easy ;) to fix(inline functions, unswitch-loops flag, > comments, etc). But, I think you agree, let’s first decide about the > major issues (I mean, whether we continue with this patch or starting > new one, this will save a lot of time for both of us).
I didn't get an overall idea on how the patch works, that is, how it integrates with the existing algorithm. If you can elaborate on that a bit that would be helpful. I think the code-generation part needs some work (whether by following my idea with re-using copy_bbs or whether by basically re-implementing it is up to debate). How does your code handle for () { if (cond1) { if (cond2) invariant; if (cond3) invariant; } } ? Optimally we'd have before the loop exactly the same if () structure (thus if (cond1) is shared). Richard. > Thanks, > > Evgeniya > > On Tue, May 26, 2015 at 2:31 PM, Richard Biener > <richard.guent...@gmail.com> wrote: >> On Fri, May 8, 2015 at 11:07 PM, Evgeniya Maenkova >> <evgeniya.maenk...@gmail.com> wrote: >>> Hi, >>> >>> Could you please review my patch for predicated lim? >>> >>> Let me note some details about it: >>> >>> >>> >>> 1) Phi statements are still moved only if they have 1 or 2 >>> arguments. However, phi statements could be move under conditions (as >>> it’s done for the other statements). Probably, phi statement motion >>> with 3 + arguments could be implemented in the next patch after >>> predicated lim. >>> >>> 2) Patch has limitations/features like (it was ok to me to >>> implement it such way, maybe I’m not correct. ): >>> >>> a) Loop1 >>> >>> { >>> >>> If (a) >>> >>> Loop2 >>> >>> { >>> >>> Stmt - Invariant for Loop1 >>> >>> } >>> >>> } >>> >>> In this case Stmt will be moved only out of Loop2, because of if (a). >>> >>> b) Or >>> >>> Loop1 >>> >>> { >>> >>> … >>> >>> If (cond1) >>> >>> If (cond2) >>> >>> If (cond3) >>> >>> Stmt; >>> >>> } >>> >>> Stmt will be moved out only if cond1 is always executed in Loop1. >>> >>> c) It took me a long time to write all of these code, so there >>> might be other peculiarities which I forgot to mention. :) >>> >>> Let’s discuss these ones as you will review my patch. >>> >>> 3) Patch consists of 9 files: >>> >>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: >>> >>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>> predicated lim moves 2 more statements out of the loop; >>> >>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional >>> lim recip optimization in this test doesn’t work (the corresponding >>> value is below threshold as I could see in the code for recip, 1<3). >>> So to have recip working in this test I changed test a little bit. >>> >>> b) gcc/tree-ssa-loop-im.c – the patched lim per se >>> >>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>> >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>> >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>> >>> the tests for conditional lim. >>> >>> 4) Patch testing: >>> >>> a) make –k check (no difference in results for me for the clean >>> build and the patched one, >>> >>> - Revision: 222849, >>> >>> - uname -a >>> >>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>> >>> b) Bootstrap. >>> >>> It goes well now, however to fix it I have made a temporary hack in >>> the lim code. And with this fix patch definitely shouldn’t be >>> committed. >>> >>> I did so, as I would like to discuss this issue first. >>> >>> The problem is: I got stage2-stage3 comparison failure on the single >>> file (tree-vect-data-refs.o). After some investigation I understood >>> that tree-vect-data-refs.o differs being compiled with and without >>> ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, >>> and for stage 3 this is ‘-g –O2’. But to simplify things I can >>> reproduce this difference on the same build (even not bootstrapped), >>> with option ‘ –g’ and without it). >>> >>> Of course, the file compiled with –g option will contain debug >>> information and will differ from the corresponding file without debug >>> information. I mean there is the difference reported by >>> contrib/compare-debug (I mean we talk about stripped files). >>> >>> Then I compared assemblers and lim dump files and made a conclusion >>> the difference is: >>> >>> There is statement _484=something, which is conditionally moved out of >>> loop X. In non debug cases no usages of _484 are left inside the loop >>> X. In debug case, there is the single use in debug statement. So for >>> debug case my patch generates phi statement for _484 to make it >>> available inside the loop X. For non debug case, no such phi statement >>> generated as there is no uses of _484. >>> >>> There is one more statement with lhs=_493, with exactly this pattern >>> of use. But this is not important for our discussion. >>> >>> >>> >>> So, in my opinion it’s ok to generate additional phi node for debug >>> case. But I’m not a compiler expert and maybe there is some >>> requirement that debug and non-debug versions should differ only by >>> debug statements, I don’t know. >> >> As Andi said code generation needs to be the same. >> >>> My temporary hack fix is just removing of such debug statements (no >>> debug statements, no problems J). >> >> That's fine and generally what is done in other passes. >> >>> The alternatives I see are: >>> >>> - Move debug statements outside the loop (not good in my opinion); >>> >>> - Somehow reset values in debug statements (not good in my >>> opinion, if I could provide correct information for them); >> >> You could re-set them via gimple_debug_bind_reset_value which >> will tell the user that at this point the value is optimized out >> (that's slightly >> better than removing the debug stmts). >> >>> - Generate phi for debug statements and fix bootstrap (say, >>> let’s have some list of files, which we have special check for. I mean >>> for my case, the difference is not in stage2 and stage3, it’s in debug >>> and non-debug). I like this variant. However, as I don’t see such list >>> of special files in the whole gcc, I think I might be missing >>> something. >> >> The policy ;) >> >>> What do you think? >> >> Now about the patch. Generally there seem to be spurious whitespace-only >> or formatting differences - try to avoid these. >> >> All new functiuons need a comment explaining them and their function >> parameters. >> >> +static cond_data_map *get_cond_data_map (struct loop * level) >> +{ >> >> per GCC coding conventions this should be formatted like >> >> static cond_data_map * >> get_cond_data_map (struct loop * level) >> { >> >> +#define SET_ALWAYS_EXECUTED_IN(BB, VAL)\ >> + BB->aux = (void *) (XCNEW (struct move_cond));\ >> + BB_MOVE_COND (BB)->type=ALWAYS_EXECUTED_IN;\ >> + BB_MOVE_COND (BB)->loop=VAL; >> >> this kind of side-effects in macros are bad - better turn them into >> inline functions. >> >> - if (flag_unswitch_loops >> - && gimple_code (stmt) == GIMPLE_COND) >> - { >> - /* If we perform unswitching, force the operands of the invariant >> - condition to be moved out of the loop. */ >> - return MOVE_POSSIBLE; >> - } >> + if (gimple_code (stmt) == GIMPLE_COND) >> + return MOVE_POSSIBLE; >> >> any reason you removed the guard on flag_unswitch_loops? We may >> want to add a new flag, certainly this transform might not be suitable >> for -O[12s] for example. Like if not all stmts inside a conditional can >> be moved the transform will increase code-size. >> >> You are doing a (lot of?) refactoring which makes review of the patch >> harder. For example >> >> +bool calculate_tgt_level (gimple stmt, struct loop *outermost, >> + bool cond_executed, basic_block bb) >> +{ >> + struct lim_aux_data *lim_dat >> ... >> + >> + if (lim_data->cost >= LIM_EXPENSIVE) >> + set_profitable_level (stmt); >> + return true; >> +} >> >> ... >> >> - >> - lim_data = init_lim_data (stmt); >> - lim_data->always_executed_in = outermost; >> - >> - if (!determine_max_movement (stmt, false)) >> - { >> - lim_data->max_loop = NULL; >> - continue; >> - } >> - >> - if (dump_file && (dump_flags & TDF_DETAILS)) >> - { >> - print_gimple_stmt (dump_file, stmt, 2, 0); >> - fprintf (dump_file, " invariant up to level %d, cost %d.\n\n", >> - loop_depth (lim_data->max_loop), >> - lim_data->cost); >> - } >> - >> - if (lim_data->cost >= LIM_EXPENSIVE) >> - set_profitable_level (stmt); >> + if (!calculate_tgt_level (stmt, outermost, cond_executed, bb)) >> + continue; >> >> where it is not obvious why moving set_profitable_level should be >> in a function named calculate_tgt_level. In fact determine_max_movement >> was supposed to be the abstraction already. >> >> Without yet getting an idea of the overall algorithm (I'm really missing such >> a thing...) it seems to me that the current handling of doing invariant >> motion >> of PHIs should handle the analysis phase just fine (you might be adding >> more advanced cases in this patch, but I'd rather do that as a followup for >> initial support). >> >> The code-gen part looks somewhat ad-hoc to me. In principle if we have >> a list of PHIs to hoist we can materialize the needed part of the original >> basic-block structure on the pre-header edge together with the controlling >> conditionals. The loop pre-header would be the always-executed BB >> and the needed part would always start with the loop header and be a >> single-entry multiple-exit region. So ideally we could use sth like >> gimple_duplicate_sese_region with the BB copying worker copy_bbs >> refactored/replaced with something only copying controlling conditionals >> and statements/PHIs that a callback identifies (tree-loop-distribution.c >> would also like sth like that - currently it copies whole loops and removes >> stmts it didn't intend to copy...). >> >> So basically invariantness_dom_walker would record a vector of PHIs >> to hoist (or we'd construct that at move_computations time). >> If there is any control-flow to hoist then we'd do an alternative >> move_computations by "copying" the needed CFG and covered stmts. >> In theory doing it the loop-distribution way (simply copy the whole >> loop body and then remove anything not needed) would also be possible >> though that wouldn't be the very best idea obviously ;) >> >> Thanks, >> Richard. >> >>> >>> >>> Thanks, >>> >>> >>> >>> Evgeniya > > > > -- > Thanks, > > Evgeniya > > perfstories.wordpress.com