Re: More aggressive threading causing loop-interchange-9.c regression
I think things are clear enough to propose a patch.  Thanks for everyone's input.

I have added some comments and tweaked Michael's patch to also include the final BB we're threading to in the check.  Without this last check, we fail in tree-ssa/cunroll-15.c because the latch becomes polluted with PHI nodes.

OK for trunk?
Aldy

commit e735a2cd01485773635bd66c97af21059992d5a7 (HEAD -> pending/global-ranges)
Author: Aldy Hernandez
Date:   Thu Sep 9 20:30:28 2021 +0200

    Disable threading through latches until after loop optimizations.

    The motivation for this patch was enabling the use of global ranges in the path solver, but this caused certain properties of loops to be destroyed, which made subsequent loop optimizations fail.  Consequently, this patch's main goal is to disable jump threading involving the latch until after loop optimizations have run.

    I have added a bit in cfun to indicate when the "loopdone" optimization has completed.  This helps the path threader determine when it's safe to thread through loop structures.  I can adapt the patch if there is an alternate way of determining this.

    As can be seen in the test adjustments, we mostly shift the threading from the early threaders (ethread, thread[12]) to the late threaders (thread[34]).  I have nuked some of the early notes in the testcases that came as part of the jump threader rewrite.  They're mostly noise now.

    Note that we could probably relax some other restrictions in profitable_path_p when loop optimizations have completed, but it would require more testing, and I'm hesitant to touch more things than needed at this point.  I have added a reminder to the function to keep this in mind.

    Finally, perhaps as a follow-up, we should apply the same restrictions to the forward threader.  At some point I'd like to combine the cost models.

    Tested on x86-64 Linux.

    p.s. There is a thorough discussion of the limitations of jump threading involving loops here:
    https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

From e735a2cd01485773635bd66c97af21059992d5a7 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop optimizations.
gcc/ChangeLog:

        * function.h (struct function): Add loop_optimizers_done.
        (set_loop_optimizers_done): New.
        (loop_optimizers_done_p): New.
        * gimple-range-path.cc (path_range_query::internal_range_of_expr):
        Intersect with global range.
        * tree-ssa-loop.c (tree_ssa_loop_done): Call set_loop_optimizers_done.
        * tree-ssa-threadbackward.c
        (back_threader_profitability::profitable_path_p): Disable threading
        through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
        threading through latches.
        * gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
        * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz
---
 gcc/function.h                            | 15
 gcc/gimple-range-path.cc                  |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c   |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c    | 37 +--
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c    | 17 +
 gcc/tree-ssa-loop.c                       |  1 +
 gcc/tree-ssa-threadbackward.c             | 28 +-
 7 files changed
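[Since the patch hunks themselves are elided above, here is a rough sketch of the shape of the change as the ChangeLog describes it: a bit on struct function that the loop pipeline sets when it finishes, and a check in the backward threader's profitability test.  Only the names listed in the ChangeLog (loop_optimizers_done, set_loop_optimizers_done, loop_optimizers_done_p, profitable_path_p) come from the patch; everything else below is an illustrative assumption, not the committed code.]

/* Sketch only -- not the actual patch hunks.  */

/* function.h: a flag the loop pipeline sets when it is done.  */
struct function
{
  /* ... existing fields ...  */

  /* Nonzero once the loop optimization passes ("loopdone") have run.  */
  unsigned int loop_optimizers_done : 1;
};

inline void
set_loop_optimizers_done (struct function *fun)
{
  fun->loop_optimizers_done = 1;
}

inline bool
loop_optimizers_done_p (struct function *fun)
{
  return fun->loop_optimizers_done;
}

/* tree-ssa-loop.c (tree_ssa_loop_done): mark the flag as the loop
   pipeline finishes.  */
set_loop_optimizers_done (cfun);

/* tree-ssa-threadbackward.c
   (back_threader_profitability::profitable_path_p), simplified.
   'threaded_through_latch' stands in for however the path walk records
   that the candidate path crosses its loop's latch: if it does, and the
   loop optimizers have not run yet, reject the path so the loop
   structure (and tests such as loop-interchange-9.c) is left intact.  */
if (threaded_through_latch && !loop_optimizers_done_p (cfun))
  return false;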
[PING] Re: Fix 'hash_table::expand' to destruct stale Value objects
Hi!

On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches wrote:
> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
>> Ping -- we still need to plug the memory leak; see patch attached, and/or
>> long discussion here:
>
> Thanks for answering my questions.  I have no concerns with going
> forward with the patch as is.

Thanks, Martin.  Ping for formal approval (and review for using proper C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code comment that I'm adding).  Patch again attached, for easy reference.

> Just a suggestion/request: unless
> this patch fixes all the outstanding problems you know of or suspect
> in this area (leaks/missing dtor calls) and unless you plan to work
> on those in the near future, please open a bug for them with a brain
> dump of what you learned.  That should save us time when the day
> comes to tackle those.

ACK.  I'm not aware of any additional known problems.  (In our email discussion, we did have some "vague ideas" for opportunities of clarification/clean-up, but these aren't worth filing PRs for; someone needs to gain understanding and take a look.)

Grüße
 Thomas


>> On 2021-08-16T14:10:00-0600, Martin Sebor wrote:
>>> On 8/16/21 6:44 AM, Thomas Schwinge wrote:
On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc wrote:
> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
>> So I'm trying to do some C++...  ;-)
>>
>> Given:
>>
>>     /* A map from SSA names or var decls to record fields.  */
>>     typedef hash_map<tree, tree> field_map_t;
>>
>>     /* For each propagation record type, this is a map from SSA names
>>        or var decls to propagate, to the field in the record type that
>>        should be used for transmission and reception.  */
>>     typedef hash_map<tree, field_map_t> record_field_map_t;
>>
>> Thus, that's a 'hash_map<tree, hash_map<tree, tree>>'.  (I may do that,
>> right?)  Looking through GCC implementation files, very most of all uses
>> of 'hash_map' boil down to pointer key ('tree', for example) and
>> pointer/integer value.
>
> Right.  Because most GCC containers rely exclusively on GCC's own
> uses for testing, if your use case is novel in some way, chances
> are it might not work as intended in all circumstances.
>
> I've wrestled with hash_map a number of times.  A use case that's
> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
> see class_to_loc_map_t.

Indeed, at the time you sent this email, I already had started looking into that one!  (The Fortran test cases that I originally analyzed, which triggered other cases of non-POD/non-trivial destructor, all didn't result in a memory leak, because the non-trivial constructor doesn't actually allocate any resources dynamically -- that's indeed different in this case here.)  ..., and indeed:

> (I don't remember if I tested it for leaks
> though.  It's used to implement -Wmismatched-tags so compiling
> a few tests under Valgrind should show if it does leak.)

... it does leak memory at present.  :-|  (See the attached commit log for details of one example.)

>> (Attached "Fix 'hash_table::expand' to destruct stale Value objects"
>> again.)
>> To that effect, to document the current behavior, I propose to "Add
>> more self-tests for 'hash_map' with Value type with non-trivial
>> constructor/destructor".
>>
>> (We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
>> "Add more self-tests for 'hash_map' with Value type with non-trivial
>> constructor/destructor", quickly followed by bug fix
>> commit bb04a03c6f9bacc890118b9e12b657503093c2f8
>> "Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
>> work on 32-bit architectures [PR101959]".
>> (Also cherry-pick into release branches, eventually?)
>>
>> Then:
>>
>>     record_field_map_t field_map ([...]); // see below
>>     for ([...])
>>       {
>>         tree record_type = [...];
>>         [...]
>>         bool existed;
>>         field_map_t &fields
>>           = field_map.get_or_insert (record_type, &existed);
>>         gcc_checking_assert (!existed);
>>         [...]
>>         for ([...])
>>           fields.put ([...], [...]);
>>         [...]
>>       }
>>     [stuff that looks up elements from 'field_map']
>>     field_map.empty ();
>>
>> This generally works.
>>
>> If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
>> If however I instantiate 'record_field_map_t field_map (13);' (where '13'
>> would be the default for 'hash_map'), Valgrind complains:
>>
>>     2,080 bytes in 10 blocks are definitely lost in loss record 828 of 876
>>        at 0x483DD99: calloc (vg_replace_malloc.c:762)
>>        by 0x175F010: xcal
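[The class of bug under discussion can be reproduced outside of GCC's hash_table in a few dozen lines of C++.  The following is a minimal, self-contained sketch (it is not GCC code, and the names are made up): a toy table that grows by copy-constructing its entries into new raw storage but never destructs the stale Value objects left in the old storage, so a Value type that owns memory leaks on every expansion -- exactly the pattern the Valgrind report above shows.]

#include <cstddef>
#include <new>

// Toy value type that owns a heap allocation -- a stand-in for a
// non-trivially-destructible hash_map Value (such as another hash_map).
struct Value
{
  int *payload;
  Value () : payload (new int[16] ()) {}
  Value (const Value &) : payload (new int[16] ()) {}
  ~Value () { delete[] payload; }
};

// Toy table that grows by copy-constructing live entries into fresh raw
// storage.  Its expand() has the flaw being fixed here: it never destructs
// the stale Value objects left behind in the old storage.
struct toy_table
{
  Value *slots;
  std::size_t capacity, used;

  explicit toy_table (std::size_t n)
    : slots (allocate (n)), capacity (n), used (0) {}

  ~toy_table ()
  {
    for (std::size_t i = 0; i < used; ++i)
      slots[i].~Value ();
    ::operator delete (slots);
  }

  static Value *allocate (std::size_t n)
  { return static_cast<Value *> (::operator new (n * sizeof (Value))); }

  void insert ()
  {
    if (used == capacity)
      expand ();
    new (&slots[used++]) Value ();      // placement-new into raw storage
  }

  void expand ()
  {
    Value *old = slots;
    capacity *= 2;
    slots = allocate (capacity);
    for (std::size_t i = 0; i < used; ++i)
      new (&slots[i]) Value (old[i]);   // copy entries into new storage
    // BUG (the analogue of the issue under discussion): the stale Value
    // objects in 'old' are never destructed, so the payloads they own leak
    // on every expansion.  The fix is to run old[i].~Value () for each
    // stale entry before releasing the old storage.
    ::operator delete (old);
  }
};

int
main ()
{
  toy_table table (13);
  for (int i = 0; i < 100; ++i)
    table.insert ();  // forces several expansions; Valgrind reports the leaks
  return 0;
}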
GNU Tools @ LPC 2021: Program is published
Hi all,

The program for the GNU Tools Track at Linux Plumbers Conference is published:

https://linuxplumbersconf.org/event/11/sessions/109/

A total of 25 talks, lightning talks and BoFs, plus our regular Q&A session with the GCC steering committee and the Glibc, GDB and binutils stewards.

The sessions run 07:00-11:00 Pacific Time from Monday 20 September to Thursday 23 September.  This means there is no clash with the LPC Toolchains and Kernel Microconference on Friday 24 September.

This is a virtual conference hosted using BigBlueButton.  It isn't a free conference - you'll need to sign up for a ticket (for all of LPC) to participate.  However, the talks should also be live streamed free on YouTube.

https://linuxplumbersconf.org/event/11/page/112-attend

Speakers should be contacted next week with details of their free tickets.  Contact me or sarah.c...@embecosm.com if you are a speaker and don't get an email.

Best wishes,

Jeremy

--
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Twitter: @jeremypbennett
Email: jeremy.benn...@embecosm.com
Web: www.embecosm.com
PGP key: 1024D/BEF58172FB4754E1 2009-03-20
Exact inform format escape sequence (GCC 10 or GCC 11)
Hello all,

In the Bismon static source code analyzer on https://github.com/bstarynk/bismon/ commit ad8b6270691e (funded by http://decoder-project.eu/ ), which contains some GPLv3+ GCC plugin code under directory gccplugins/, I am getting the following when compiling it:

gcc10_metaplugin_BMGCC.cc: In function ‘int plugin_init(plugin_name_args*, plugin_gcc_version*)’:
gcc10_metaplugin_BMGCC.cc:165:85: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  165 |   warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: datestamp difference for %s:\n"
      |                             ^~~
  166 |           " plugin has %s, GCC had %s; this is risky.",
      |           ~~
gcc10_metaplugin_BMGCC.cc:169:84: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  169 |   warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: devphase difference for %s:\n"
      |                             ^~~
  170 |           " plugin has %s, GCC had %s; this is risky.",
      |           ~~
gcc10_metaplugin_BMGCC.cc:174:89: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  174 |   warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: configuration difference for %s:\n"
      |                             ^~~
  175 |           " plugin has %s, GCC had %s; this is risky.",
      |           ~~
gcc10_metaplugin_BMGCC.cc: In function ‘void parse_plugin_arguments(const char*, plugin_name_args*)’:
gcc10_metaplugin_BMGCC.cc:405:53: warning: unquoted sequence of 2 consecutive space characters in format [-Wformat-diag]
  405 |   inform (UNKNOWN_LOCATION, "Bismon plugin %qs (%s:%d) will handle GCC include-file events with prefix %qs",
      |                                                     ^~

Where can I read the complete specification of % escape sequences for inform?

Thanks

Basile Starynkevitch
(only my opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/
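[For context on what the checker is objecting to: -Wformat-diag applies GCC's own diagnostic-style rules to format strings passed to warning()/inform(), flagging raw newlines and runs of spaces; the base set of % directives is described, to the best of my knowledge, in the comment above pp_format in gcc/pretty-print.c and in the diagnostics guidelines of the GCC internals documentation.  Below is an illustrative plugin fragment, not code from Bismon; the function and variable names are placeholders, and the exact header set a plugin needs can vary between GCC releases.]

#include "gcc-plugin.h"
#include "diagnostic.h"

int plugin_is_GPL_compatible;

static void
report_prefix (const char *plugin_name, const char *prefix)
{
  /* These are the shapes -Wformat-diag complains about: a raw '\n' inside
     the format string ("unquoted whitespace character") and, in the second
     call, a run of two spaces ("consecutive space characters").  */
  inform (UNKNOWN_LOCATION,
          "plugin %qs will handle GCC include-file events\n"
          "with prefix %qs", plugin_name, prefix);
  inform (UNKNOWN_LOCATION,
          "plugin %qs  will handle GCC include-file events", plugin_name);

  /* One way to stay within the checker's rules: a single-line message,
     single spaces, %qs quoting, no trailing period.  */
  inform (UNKNOWN_LOCATION,
          "plugin %qs will handle GCC include-file events with prefix %qs",
          plugin_name, prefix);
}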
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/9/2021 3:21 AM, Aldy Hernandez wrote:

/* If this path does not thread through the loop latch, then we are using the FSM threader to find old style jump threads.  This is good, except the FSM threader does not re-use an existing threading path to reduce code duplication.  So for that case, drastically reduce the number of statements we are allowed to copy.  */

*blink*

Woah.  The backward threader has been using FSM threads indiscriminately as far as I can remember.  I wonder what would break if we "fixed it".

?!?  I'm not sure what you're suggesting here.  If you s/FSM threader/backwards threader/ in the comment does it make more sense?  The term FSM really should largely have been dropped as the backwards threader was improved to handle more cases.

so these cases should use the "old style" validity/costing metrics and thus classify threading opportunities in a different way?

Jeff, do you have any insight here?

This is precisely what you're cleaning up.  I think today "backwards" vs. "forwards" only refers to the way we find threading opportunities.  Yes, it's a mess.

I ran some experiments a while back, and my current work on the enhanced solver/threader can fold virtually everything the DOM/threader gets (even with its use of const_and_copies, avail_exprs, and evrp_range_analyzer), while getting 5% more DOM threads and 1% more overall threads.  That is, I've been testing whether the path solver can solve everything the DOM threader needs (the hybrid approach I mentioned).

Unfortunately, replacing the forward threader right now is not feasible for a few reasons:

Right.  But I thought the short term goal was to replace/remove the forward threading from VRP.  Dropping it from DOM is going to be tougher.

a) The const_and_copies/avail_exprs relation framework can do floats, and that's next year's ranger work.

Right.  I'd actually run into this as well when I wanted to drop all the range bits out of DOM and rely exclusively on EVRP.  It'd still be a step forward to rip out the EVRP engine from DOM and simplify all the code that derives one equivalence from another so that it's only working on FP values.

b) Even though we can seemingly fold everything DOM/threader does, in order to replace it with a backward threader instance we'd have to merge the cost/profitability code scattered throughout the forward threader, as well as the EDGE_FSM* / EDGE_COPY* business.

Right.  This is a prerequisite.  Though some of the costing will need to be conditional on the threader being used.  Refer back to the discussion around how the forward threader can commonize thread paths that lead to the same destination while the backwards threader can not.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

The only reason threading runs in parallel with DOM is so that it can use the context sensitive equivalences.  With the infrastructure you're building, there's a reasonable chance we can move to a model where we run DOM (and in the long term a simpler DOM) and threading as distinct, independent passes.

Jeff
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/9/2021 4:15 AM, Richard Biener wrote:

b) Even though we can seemingly fold everything DOM/threader does, in order to replace it with a backward threader instance we'd have to merge the cost/profitability code scattered throughout the forward threader, as well as the EDGE_FSM* / EDGE_COPY* business.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

Yeah, it does not actually process/simplify the blocks copied by threading.  In fact I think it only registers jump threading opportunities during the DOM walk and commits them only later.  But it of course uses its avail / copies stack to find them - that part you cannot easily divorce.

Well, divorcing from using the context sensitive avail/copies is part of what Aldy & Andrew have been working on.  All indications I've seen are they're on track to be able to do that.

And yes, it only registers the threads and waits until after DOM is done to transform the CFG.  That in and of itself introduces all kinds of complexity.  If we can get to the point where we don't need the context sensitive avail/copies, then we've got a real shot at untangling DOM and threading, which would be a huge maintainability win in my mind.

DOM is also yet another value-numbering framework - one could think of stripping it down from that side, keeping the threading bits only (but you'd still have the avail / copies bits).

Yes.  I think you and I touched on this a while back.  At a high level I'd prefer to have FRE rather than DOM doing the bulk of the redundant expression elimination.  The big blocker there was the tight integration of DOM and threading.  But if Aldy can untangle that we can then evaluate replacing DOM with FRE.

That said, it has one nice property it can leverage due to its incredibly simple memory redundancy handling, in that it usually performs way fewer alias queries than FRE (even when you run the latter in non-iterative mode).

DOM as an infrastructure for optimization is probably reaching the end of its useful life.  FRE has a lot more going for it.

But the same way DOM can register jump threading opportunities, FRE could do so as well.

I'd advise against that and instead look towards a model where no pass has integrated jump threading and the only jump threading module we have is the backwards threader.

jeff
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/10/21 5:43 PM, Jeff Law wrote:

On 9/9/2021 3:21 AM, Aldy Hernandez wrote:

/* If this path does not thread through the loop latch, then we are using the FSM threader to find old style jump threads.  This is good, except the FSM threader does not re-use an existing threading path to reduce code duplication.  So for that case, drastically reduce the number of statements we are allowed to copy.  */

*blink*

Woah.  The backward threader has been using FSM threads indiscriminately as far as I can remember.  I wonder what would break if we "fixed it".

?!?  I'm not sure what you're suggesting here.  If you s/FSM threader/backwards threader/ in the comment does it make more sense?  The term FSM really should largely have been dropped as the backwards threader was improved to handle more cases.

back_threader_registry::register_path() uses EDGE_FSM_THREAD as the thread type to register threads.  I was wondering if it should have been some combination of EDGE_START_JUMP_THREAD / EDGE_*_COPY_SRC_BLOCK, etc.  I (purposely) know nothing about the underlying threading types ;-).  But if the backwards threader has been improved, then perhaps we should just remove the confusing FSM references.

so these cases should use the "old style" validity/costing metrics and thus classify threading opportunities in a different way?

Jeff, do you have any insight here?

This is precisely what you're cleaning up.  I think today "backwards" vs. "forwards" only refers to the way we find threading opportunities.  Yes, it's a mess.

I ran some experiments a while back, and my current work on the enhanced solver/threader can fold virtually everything the DOM/threader gets (even with its use of const_and_copies, avail_exprs, and evrp_range_analyzer), while getting 5% more DOM threads and 1% more overall threads.  That is, I've been testing whether the path solver can solve everything the DOM threader needs (the hybrid approach I mentioned).

Unfortunately, replacing the forward threader right now is not feasible for a few reasons:

Right.  But I thought the short term goal was to replace/remove the forward threading from VRP.  Dropping it from DOM is going to be tougher.

My current thinking is that replacing the forward VRP threader with a hybrid one is a gentler approach to the longer term goal of replacing the forward threader altogether.  However, all the work I've been doing could go either way -- we could try the forward/VRP replacement or a hybrid approach.  It will all use the path solver underneath.

My main problem with replacing the forward/VRP with a backward client is that the cost models are so different that it was difficult to compare how we fared.  I constantly ran into threads the solver could handle just fine, but profitable_path_p was holding it back.

FWIW, we get virtually everything the forward threader gets, minus a very few things.  At least when I plug in the solver to the DOM/forward threader, it can solve everything it can (minus noise and floats).

If you prefer a backward threader instance to replace the VRP/forward threader, I'm game.  It's just harder to compare.  Either way (backward threader or a hybrid forward+solver) uses the same underlying solver, which is solid.

a) The const_and_copies/avail_exprs relation framework can do floats, and that's next year's ranger work.

Right.  I'd actually run into this as well when I wanted to drop all the range bits out of DOM and rely exclusively on EVRP.  It'd still be a step forward to rip out the EVRP engine from DOM and simplify all the code that derives one equivalence from another so that it's only working on FP values.

Sure.

b) Even though we can seemingly fold everything DOM/threader does, in order to replace it with a backward threader instance we'd have to merge the cost/profitability code scattered throughout the forward threader, as well as the EDGE_FSM* / EDGE_COPY* business.

Right.  This is a prerequisite.  Though some of the costing will need to be conditional on the threader being used.  Refer back to the discussion around how the forward threader can commonize thread paths that lead to the same destination while the backwards threader can not.

Yup, yup.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

The only reason threading runs in parallel with DOM is so that it can use the context sensitive equivalences.  With the infrastructure you're building, there's a reasonable chance we can move to a model where we run DOM (and in the long term a simpler DOM) and threading as distinct, independent passes.

Andrew mumbled something about replacing all of DOM eventually :-).  Well, except that value-numbering business I bet.

Aldy
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/10/21 5:51 PM, Jeff Law wrote:

On 9/9/2021 4:15 AM, Richard Biener wrote:

b) Even though we can seemingly fold everything DOM/threader does, in order to replace it with a backward threader instance we'd have to merge the cost/profitability code scattered throughout the forward threader, as well as the EDGE_FSM* / EDGE_COPY* business.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

Yeah, it does not actually process/simplify the blocks copied by threading.  In fact I think it only registers jump threading opportunities during the DOM walk and commits them only later.  But it of course uses its avail / copies stack to find them - that part you cannot easily divorce.

Well, divorcing from using the context sensitive avail/copies is part of what Aldy & Andrew have been working on.  All indications I've seen are they're on track to be able to do that.

And yes, it only registers the threads and waits until after DOM is done to transform the CFG.  That in and of itself introduces all kinds of complexity.  If we can get to the point where we don't need the context sensitive avail/copies, then we've got a real shot at untangling DOM and threading, which would be a huge maintainability win in my mind.

DOM is also yet another value-numbering framework - one could think of stripping it down from that side, keeping the threading bits only (but you'd still have the avail / copies bits).

Yes.  I think you and I touched on this a while back.  At a high level I'd prefer to have FRE rather than DOM doing the bulk of the redundant expression elimination.  The big blocker there was the tight integration of DOM and threading.  But if Aldy can untangle that we can then evaluate replacing DOM with FRE.

Once ranger does floats, I can't think of anything the forward threader could get that the backward threader couldn't.

That said, it has one nice property it can leverage due to its incredibly simple memory redundancy handling, in that it usually performs way fewer alias queries than FRE (even when you run the latter in non-iterative mode).

DOM as an infrastructure for optimization is probably reaching the end of its useful life.  FRE has a lot more going for it.

But the same way DOM can register jump threading opportunities, FRE could do so as well.

I'd advise against that and instead look towards a model where no pass has integrated jump threading and the only jump threading module we have is the backwards threader.

Yes, please.  We need to separate jump threading from all passes.  One thing, and do it well.

Aldy
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/10/2021 10:05 AM, Aldy Hernandez wrote:

On 9/10/21 5:43 PM, Jeff Law wrote:

On 9/9/2021 3:21 AM, Aldy Hernandez wrote:

/* If this path does not thread through the loop latch, then we are using the FSM threader to find old style jump threads.  This is good, except the FSM threader does not re-use an existing threading path to reduce code duplication.  So for that case, drastically reduce the number of statements we are allowed to copy.  */

*blink*

Woah.  The backward threader has been using FSM threads indiscriminately as far as I can remember.  I wonder what would break if we "fixed it".

?!?  I'm not sure what you're suggesting here.  If you s/FSM threader/backwards threader/ in the comment does it make more sense?  The term FSM really should largely have been dropped as the backwards threader was improved to handle more cases.

back_threader_registry::register_path() uses EDGE_FSM_THREAD as the thread type to register threads.  I was wondering if it should have been some combination of EDGE_START_JUMP_THREAD / EDGE_*_COPY_SRC_BLOCK, etc.  I (purposely) know nothing about the underlying threading types ;-).  But if the backwards threader has been improved, then perhaps we should just remove the confusing FSM references.

No, we shouldn't change it to any of the other types.  EDGE_FSM_THREAD means a thread found by the backwards threader, and it's the key used to determine which of the two CFG updating mechanisms should be used, the generic copier in the case of EDGE_FSM_THREAD.

Changing the name, yes, absolutely.  I probably should have done that when the original FSM threader was tweaked to handle generic threading.

As you've probably heard me mention before, all the EDGE_FSM_THREAD stuff in the registry really could be pulled out.  The registry's purpose is to deal with the two stage nature of jump threading in DOM (find threads, then optimize them later).  I don't think any of the backwards threading bits need that two stage handling.

My current thinking is that replacing the forward VRP threader with a hybrid one is a gentler approach to the longer term goal of replacing the forward threader altogether.  However, all the work I've been doing could go either way -- we could try the forward/VRP replacement or a hybrid approach.  It will all use the path solver underneath.

And that's probably a reasonable intermediate step on the way towards removing the VRP threading.

My main problem with replacing the forward/VRP with a backward client is that the cost models are so different that it was difficult to compare how we fared.  I constantly ran into threads the solver could handle just fine, but profitable_path_p was holding it back.

Yea.  Sorry about that tangle of insanity.

FWIW, we get virtually everything the forward threader gets, minus a very few things.  At least when I plug in the solver to the DOM/forward threader, it can solve everything it can (minus noise and floats).

So once you plug those bits in, we don't have to carry around the avail/copies tables for the threader anymore, right?  That's a nice cleanup in and of itself.

If you prefer a backward threader instance to replace the VRP/forward threader, I'm game.  It's just harder to compare.  Either way (backward threader or a hybrid forward+solver) uses the same underlying solver, which is solid.

I think we can go hybrid, then look at the next step, which could well be bringing some consistency into the costing models.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

The only reason threading runs in parallel with DOM is so that it can use the context sensitive equivalences.  With the infrastructure you're building, there's a reasonable chance we can move to a model where we run DOM (and in the long term a simpler DOM) and threading as distinct, independent passes.

Andrew mumbled something about replacing all of DOM eventually :-).  Well, except that value-numbering business I bet.

Essentially a realization of Bodik's work in GCC.  The nugget in there is it's a path sensitive optimizer.  That's kind of what I've envisioned DOM turning into.

1. We separate out jump threading from DOM.
2. We replace the bulk of DOM with FRE.
3. The remnants of DOM turn into a path sensitive optimizer (and for god's sake we don't want to call it DOM anymore :-)

Jeff
Re: More aggressive threading causing loop-interchange-9.c regression
On 9/10/21 6:21 PM, Jeff Law wrote:

On 9/10/2021 10:05 AM, Aldy Hernandez wrote:

On 9/10/21 5:43 PM, Jeff Law wrote:

On 9/9/2021 3:21 AM, Aldy Hernandez wrote:

/* If this path does not thread through the loop latch, then we are using the FSM threader to find old style jump threads.  This is good, except the FSM threader does not re-use an existing threading path to reduce code duplication.  So for that case, drastically reduce the number of statements we are allowed to copy.  */

*blink*

Woah.  The backward threader has been using FSM threads indiscriminately as far as I can remember.  I wonder what would break if we "fixed it".

?!?  I'm not sure what you're suggesting here.  If you s/FSM threader/backwards threader/ in the comment does it make more sense?  The term FSM really should largely have been dropped as the backwards threader was improved to handle more cases.

back_threader_registry::register_path() uses EDGE_FSM_THREAD as the thread type to register threads.  I was wondering if it should have been some combination of EDGE_START_JUMP_THREAD / EDGE_*_COPY_SRC_BLOCK, etc.  I (purposely) know nothing about the underlying threading types ;-).  But if the backwards threader has been improved, then perhaps we should just remove the confusing FSM references.

No, we shouldn't change it to any of the other types.  EDGE_FSM_THREAD means a thread found by the backwards threader, and it's the key used to determine which of the two CFG updating mechanisms should be used, the generic copier in the case of EDGE_FSM_THREAD.

Changing the name, yes, absolutely.  I probably should have done that when the original FSM threader was tweaked to handle generic threading.

I'll put that on my list.

As you've probably heard me mention before, all the EDGE_FSM_THREAD stuff in the registry really could be pulled out.  The registry's purpose is to deal with the two stage nature of jump threading in DOM (find threads, then optimize them later).  I don't think any of the backwards threading bits need that two stage handling.

Yeah yeah.  But you said that a year ago, and all I heard was *mumble mumble/complicated things*.  That was before I had much exposure to the code.  Now I feel a lot more comfortable ;-).

I'll also put this on my list, but it may not get done this release, because I'm running against the clock with the VRP/threader replacement, which is what's keeping us back from replacing VRP with an evrp instance right now :).

My current thinking is that replacing the forward VRP threader with a hybrid one is a gentler approach to the longer term goal of replacing the forward threader altogether.  However, all the work I've been doing could go either way -- we could try the forward/VRP replacement or a hybrid approach.  It will all use the path solver underneath.

And that's probably a reasonable intermediate step on the way towards removing the VRP threading.

My main problem with replacing the forward/VRP with a backward client is that the cost models are so different that it was difficult to compare how we fared.  I constantly ran into threads the solver could handle just fine, but profitable_path_p was holding it back.

Yea.  Sorry about that tangle of insanity.

FWIW, we get virtually everything the forward threader gets, minus a very few things.  At least when I plug in the solver to the DOM/forward threader, it can solve everything it can (minus noise and floats).

So once you plug those bits in, we don't have to carry around the avail/copies tables for the threader anymore, right?  That's a nice cleanup in and of itself.

Correct.  For the VRP/hybrid approach I'm working on, there are no copies/avails.  The solver can do everything they did.  After all, it's an easier target, since VRP threading only happens on ints and without the IL changing constantly.

If you prefer a backward threader instance to replace the VRP/forward threader, I'm game.  It's just harder to compare.  Either way (backward threader or a hybrid forward+solver) uses the same underlying solver, which is solid.

I think we can go hybrid, then look at the next step, which could well be bringing some consistency into the costing models.

c) DOM changes the IL as it goes.  Though we could conceivably divorce them and do the threading after DOM is done.

The only reason threading runs in parallel with DOM is so that it can use the context sensitive equivalences.  With the infrastructure you're building, there's a reasonable chance we can move to a model where we run DOM (and in the long term a simpler DOM) and threading as distinct, independent passes.

Andrew mumbled something about replacing all of DOM eventually :-).  Well, except that value-numbering business I bet.

Essentially a realization of Bodik's work in GCC.  The nugget in there is it's a path sensitive optimizer.  That's kind of what I've envisioned DOM turning into.

1. We separate out jump threading from DOM.
2. We replace the bulk of DOM with FRE.
3. The remnants of DOM turn into a path sensitive optimizer (and for god's sake we don't want to call it DOM anymore :-)
gcc-10-20210910 is now available
Snapshot gcc-10-20210910 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210910/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-10 revision 80b1492b2de153f4850a32cafcd8f4d37c2c84fc

You'll find:

 gcc-10-20210910.tar.xz       Complete GCC

  SHA256=88ff63a91315e5a1b5a9d7d7a09f6e4be41386dc54c4adf5f00f9280c1bfb3e4
  SHA1=e80d8e32283a4441878d14d241e88ddaf714da85

Diffs from 10-20210903 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10 link is updated and a message is sent to the gcc list.  Please do not use a snapshot before it has been announced that way.