Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Aldy Hernandez via Gcc
I think things are clear enough to propose a patch.  Thanks for 
everyone's input.


I have added some comments and tweaked Michael's patch to include, in 
the check, the final BB we're threading to.  Without this last check, 
we fail in tree-ssa/cunroll-15.c because the latch becomes polluted 
with PHI nodes.


OK for trunk?
Aldy

commit e735a2cd01485773635bd66c97af21059992d5a7 (HEAD -> pending/global-ranges)

Author: Aldy Hernandez 
Date:   Thu Sep 9 20:30:28 2021 +0200

Disable threading through latches until after loop optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.

I have added a bit in cfun to indicate when the "loopdone" optimization
has completed.  This helps the path threader determine when it's safe to
thread through loop structures.  I can adapt the patch if there is an
alternate way of determining this.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.


Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html
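To make the latch problem concrete, here is a hypothetical example (mine, not from the patch or its testcases) of a loop that offers such a thread:

```c
#include <assert.h>

/* Hypothetical testcase (not from the patch) showing the kind of path
   the patch now rejects before "loopdone": from the conditional, through
   the latch, back to the loop header.  Threading it duplicates the
   latch, leaving the loop with two latch edges and destroying the
   simple-latch form that later loop passes (e.g. loop interchange)
   rely on.  */
int
count_even (const int *a, int n)
{
  int c = 0;
  for (int i = 0; i < n; i++)
    if ((a[i] & 1) == 0)
      c++;
  return c;
}
```

Per the commit message, such paths are simply deferred, not lost: once the loopdone bit is set, the late threaders (thread[34]) are free to take them.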
From e735a2cd01485773635bd66c97af21059992d5a7 Mon Sep 17 00:00:00 2001
From: Aldy Hernandez 
Date: Thu, 9 Sep 2021 20:30:28 +0200
Subject: [PATCH] Disable threading through latches until after loop
 optimizations.

The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.

I have added a bit in cfun to indicate when the "loopdone" optimization
has completed.  This helps the path threader determine when it's safe to
thread through loop structures.  I can adapt the patch if there is an
alternate way of determining this.

As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]).  I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite.  They're mostly noise
now.

Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point.  I have added a reminder to the function to keep this
in mind.

Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader.  At some point I'd like to combine the cost models.

Tested on x86-64 Linux.

p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:

	https://gcc.gnu.org/pipermail/gcc/2021-September/237247.html

gcc/ChangeLog:

	* function.h (struct function): Add loop_optimizers_done.
	(set_loop_optimizers_done): New.
	(loop_optimizers_done_p): New.
	* gimple-range-path.cc (path_range_query::internal_range_of_expr):
	Intersect with global range.
	* tree-ssa-loop.c (tree_ssa_loop_done): Call set_loop_optimizers_done.
	* tree-ssa-threadbackward.c
	(back_threader_profitability::profitable_path_p): Disable
	threading through latches until after loop optimizations have run.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
	threading through latches.
	* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
	* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.

Co-authored-by: Michael Matz 
---
 gcc/function.h                            | 15 
 gcc/gimple-range-path.cc                  |  3 ++
 .../gcc.dg/tree-ssa/ssa-dom-thread-2b.c   |  4 +-
 .../gcc.dg/tree-ssa/ssa-dom-thread-6.c    | 37 +--
 .../gcc.dg/tree-ssa/ssa-dom-thread-7.c    | 17 +
 gcc/tree-ssa-loop.c                       |  1 +
 gcc/tree-ssa-threadbackward.c             | 28 +-
 7 files changed
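The ChangeLog entries for function.h suggest an interface along these lines.  This is a sketch under assumptions: the actual hunk is not quoted in this message, so the field layout and accessor bodies below are guesses based solely on the names in the ChangeLog.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the function.h additions named in the ChangeLog; the bodies
   are assumptions, not the real patch.  */
struct function
{
  /* ... existing fields elided ...  */

  /* Set once the loop optimizer pipeline ("loopdone") has finished;
     after that, threading through loop latches is allowed again.  */
  unsigned int loop_optimizers_done : 1;
};

static inline void
set_loop_optimizers_done (struct function *fn)
{
  fn->loop_optimizers_done = 1;
}

static inline bool
loop_optimizers_done_p (const struct function *fn)
{
  return fn->loop_optimizers_done;
}
```

In GCC proper the threader would presumably query this through the current function, e.g. something like loop_optimizers_done_p (cfun) inside profitable_path_p, with tree_ssa_loop_done calling the setter.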

[PING] Re: Fix 'hash_table::expand' to destruct stale Value objects

2021-09-10 Thread Thomas Schwinge
Hi!

On 2021-09-01T19:31:19-0600, Martin Sebor via Gcc-patches wrote:
> On 8/30/21 4:46 AM, Thomas Schwinge wrote:
>> Ping -- we still need to plug the memory leak; see patch attached, and/or
>> long discussion here:
>
> Thanks for answering my questions.  I have no concerns with going
> forward with the patch as is.

Thanks, Martin.  Ping for formal approval (and review for using proper
C++ terminology in the 'gcc/hash-table.h:hash_table::expand' source code
comment that I'm adding).  Patch again attached, for easy reference.


> Just a suggestion/request: unless
> this patch fixes all the outstanding problems you know of or suspect
> in this area (leaks/missing dtor calls) and unless you plan to work
> on those in the near future, please open a bug for them with a brain
> dump of what you learned.  That should save us time when the day
> comes to tackle those.

ACK.  I'm not aware of any additional known problems.  (In our email
discussion, we did have some "vague ideas" for opportunities of
clarification/clean-up, but these aren't worth filing PRs for; they need
someone to gain understanding and take a look.)


Grüße
 Thomas


>> On 2021-08-16T14:10:00-0600, Martin Sebor  wrote:
>>> On 8/16/21 6:44 AM, Thomas Schwinge wrote:
 On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc  wrote:
> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
>> So I'm trying to do some C++...  ;-)
>>
>> Given:
>>
>>/* A map from SSA names or var decls to record fields.  */
>>typedef hash_map field_map_t;
>>
>>/* For each propagation record type, this is a map from SSA names 
>> or var decls
>>   to propagate, to the field in the record type that should be 
>> used for
>>   transmission and reception.  */
>>typedef hash_map record_field_map_t;
>>
>> Thus, that's a 'hash_map>'.  (I may do that,
>> right?)  Looking through GCC implementation files, very most of all uses
>> of 'hash_map' boil down to pointer key ('tree', for example) and
>> pointer/integer value.
>
> Right.  Because most GCC containers rely exclusively on GCC's own
> uses for testing, if your use case is novel in some way, chances
> are it might not work as intended in all circumstances.
>
> I've wrestled with hash_map a number of times.  A use case that's
> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
> see class_to_loc_map_t.

 Indeed, at the time you sent this email, I already had started looking
 into that one!  (The Fortran test cases that I originally analyzed, which
 triggered other cases of non-POD/non-trivial destructor, all didn't
 result in a memory leak, because the non-trivial constructor doesn't
 actually allocate any resources dynamically -- that's indeed different in
 this case here.)  ..., and indeed:

> (I don't remember if I tested it for leaks
> though.  It's used to implement -Wmismatched-tags so compiling
> a few tests under Valgrind should show if it does leak.)

 ... it does leak memory at present.  :-| (See attached commit log for
 details for one example.)
>>
>> (Attached "Fix 'hash_table::expand' to destruct stale Value objects"
>> again.)
>>
 To that effect, to document the current behavior, I propose to
 "Add more self-tests for 'hash_map' with Value type with non-trivial
 constructor/destructor"
>>
>> (We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
>> "Add more self-tests for 'hash_map' with Value type with non-trivial
>> constructor/destructor", quickly followed by bug fix
>> commit bb04a03c6f9bacc890118b9e12b657503093c2f8
>> "Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
>> work on 32-bit architectures [PR101959]".
>>
 (Also cherry-pick into release branches, eventually?)
>>
>> Then:
>>
>>record_field_map_t field_map ([...]); // see below
>>for ([...])
>>  {
>>tree record_type = [...];
>>[...]
>>bool existed;
>>field_map_t &fields
>>  = field_map.get_or_insert (record_type, &existed);
>>gcc_checking_assert (!existed);
>>[...]
>>for ([...])
>>  fields.put ([...], [...]);
>>[...]
>>  }
>>[stuff that looks up elements from 'field_map']
>>field_map.empty ();
>>
>> This generally works.
>>
>> If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
>> If however I instantiate 'record_field_map_t field_map (13);' (where '13'
>> would be the default for 'hash_map'), Valgrind complains:
>>
>>2,080 bytes in 10 blocks are definitely lost in loss record 828 
>> of 876
>>   at 0x483DD99: calloc (vg_replace_malloc.c:762)
>>   by 0x175F010: xcal
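The blocks Valgrind reports are the stale Value objects that 'hash_table::expand' moves out of but never destroys when it rehashes into new storage.  A minimal stand-alone sketch of the fix (simplified; 'counted', 'expand', and 'leak_check' are illustrative names of mine, not GCC's hash-table code):

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// A value type with a non-trivial destructor, standing in for the
// hash_map values discussed above; 'live' tracks constructed objects.
struct counted
{
  static int live;
  std::vector<int> payload;
  counted () : payload (16) { ++live; }
  counted (counted &&other) noexcept
    : payload (std::move (other.payload)) { ++live; }
  ~counted () { --live; }
};
int counted::live = 0;

// The essence of the 'hash_table::expand' fix: after move-constructing
// each entry into the new storage, explicitly destroy the stale object
// left behind in the old storage -- the previously missing call.
static void
expand (counted *old_slots, std::size_t n, counted *new_slots)
{
  for (std::size_t i = 0; i < n; ++i)
    {
      new (&new_slots[i]) counted (std::move (old_slots[i]));
      old_slots[i].~counted ();  // without this, 'payload' leaks
    }
}

// Run one expand round-trip over raw storage; returns the number of
// still-live objects afterwards (0 means nothing leaked).
int
leak_check ()
{
  const std::size_t n = 10;
  counted *old_slots
    = static_cast<counted *> (operator new (n * sizeof (counted)));
  counted *new_slots
    = static_cast<counted *> (operator new (n * sizeof (counted)));
  for (std::size_t i = 0; i < n; ++i)
    new (&old_slots[i]) counted ();
  expand (old_slots, n, new_slots);
  for (std::size_t i = 0; i < n; ++i)
    new_slots[i].~counted ();
  operator delete (old_slots);
  operator delete (new_slots);
  return counted::live;
}
```

Dropping the explicit destructor call in expand() reproduces the leak: the moved-from vectors' allocations are never released, which is exactly what Valgrind flags above.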

GNU Tools @ LPC 2021: Program is published

2021-09-10 Thread Jeremy Bennett
Hi all,

The program for the GNU Tools Track at Linux Plumbers Conference is
published:

  https://linuxplumbersconf.org/event/11/sessions/109/

A total of 25 talks, lightning talks and BoFs plus our regular Q&A
session with the GCC steering committee and Glibc, GDB and binutils
stewards. The sessions run 07:00-11:00 Pacific Time from Monday 20
September - Thursday 23 September. This means there is no clash with the
LPC Toolchains and Kernel Microconference on Friday 24 September.

This is a virtual conference hosted using BigBlueButton. It isn't a free
conference - you'll need to sign up for a ticket (for all of LPC) to
participate. However the talks should also be live streamed free on YouTube.

  https://linuxplumbersconf.org/event/11/page/112-attend

Speakers should be contacted next week with details of their free
tickets. Contact me or sarah.c...@embecosm.com if you are a speaker and
don't get an email.

Best wishes,


Jeremy

-- 
Cell: +44 (7970) 676050
SkypeID: jeremybennett
Twitter: @jeremypbennett
Email: jeremy.benn...@embecosm.com
Web: www.embecosm.com
PGP key: 1024D/BEF58172FB4754E1 2009-03-20





Exact inform format escape sequence (GCC 10 or GCC 11)

2021-09-10 Thread Basile Starynkevitch

Hello all,

In the Bismon static source code analyzer at
https://github.com/bstarynk/bismon/ (commit ad8b6270691e, funded by
http://decoder-project.eu/ ), which contains some GPLv3+ GCC plugin code
under directory gccplugins/, I am getting the following warnings when
compiling it:


gcc10_metaplugin_BMGCC.cc: In function ‘int plugin_init(plugin_name_args*, plugin_gcc_version*)’:
gcc10_metaplugin_BMGCC.cc:165:85: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  165 | warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: datestamp difference for %s:\n"
      |                                                                                ^~~
  166 |          " plugin has %s, GCC had %s; this is risky.",
      |          ~~
gcc10_metaplugin_BMGCC.cc:169:84: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  169 | warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: devphase difference for %s:\n"
      |                                                                               ^~~
  170 |          " plugin has %s, GCC had %s; this is risky.",
      |          ~~
gcc10_metaplugin_BMGCC.cc:174:89: warning: unquoted whitespace character ‘\x0a’ in format [-Wformat-diag]
  174 | warning(UNKNOWN_LOCATION, "BISMON GCC10 METAPLUGIN: configuration difference for %s:\n"
      |                                                                                    ^~~
  175 |          " plugin has %s, GCC had %s; this is risky.",
      |          ~~
gcc10_metaplugin_BMGCC.cc: In function ‘void parse_plugin_arguments(const char*, plugin_name_args*)’:
gcc10_metaplugin_BMGCC.cc:405:53: warning: unquoted sequence of 2 consecutive space characters in format [-Wformat-diag]
  405 | inform (UNKNOWN_LOCATION, "Bismon plugin %qs (%s:%d) will handle GCC include-file events with prefix %qs",
      |                                                     ^~



Where can I read the complete specification of % escape sequences for 
inform?
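Independent of where the escape sequences are specified, note that the warnings quoted above are not about an unknown % escape: -Wformat-diag is objecting to the embedded '\x0a' (newline) and to the doubled space inside the format strings.  A sketch of one common rework, using printf-based stand-ins since the real plugin context cannot be reproduced here (warning_stub, inform_stub, and report_datestamp_mismatch are placeholder names of mine, not GCC's API):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* printf-based stand-ins for GCC's warning()/inform() so the sketch is
   self-contained; the real functions take a location argument and live
   in GCC's diagnostic machinery.  'diag_count' just counts emissions.  */
static int diag_count;

static void
warning_stub (const char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  vprintf (fmt, ap);
  va_end (ap);
  putchar ('\n');
  diag_count++;
}

#define inform_stub warning_stub

/* Instead of one format string with an embedded "\n" (which
   -Wformat-diag rejects), emit a single-line warning and carry the
   continuation in a separate note.  */
static void
report_datestamp_mismatch (const char *plugin, const char *plugin_stamp,
                           const char *gcc_stamp)
{
  warning_stub ("BISMON GCC10 METAPLUGIN: datestamp difference for %s:",
                plugin);
  inform_stub ("plugin has %s, GCC had %s; this is risky.",
               plugin_stamp, gcc_stamp);
}
```

The same split (warning plus follow-up note) would apply to the devphase and configuration cases, and collapsing the double space in the inform call at line 405 would silence the last warning.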



Thanks




Basile Starynkevitch  
(only my opinions / les opinions sont miennes uniquement)
92340 Bourg-la-Reine, France
web page: starynkevitch.net/Basile/



Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Jeff Law via Gcc




On 9/9/2021 3:21 AM, Aldy Hernandez wrote:




   /* If this path does not thread through the loop latch, then we are
  using the FSM threader to find old style jump threads. This
  is good, except the FSM threader does not re-use an existing
  threading path to reduce code duplication.

  So for that case, drastically reduce the number of statements
  we are allowed to copy.  */


*blink*

Woah.  The backward threader has been using FSM threads 
indiscriminately as far as I can remember.  I wonder what would break 
if we "fixed it".
?!?  I'm not sure what you're suggesting here.  If you s/FSM 
threader/backwards threader/ in the comment does it make more sense?   
The term FSM really should largely have been dropped as the backwards 
threader was improved to handle more cases.









so these cases should use the "old style" validity/costing metrics 
and thus

classify threading opportunities in a different way?


Jeff, do you have any insight here?

This is precisely what you're cleaning up.






I think today "backwards" vs. "forwards" only refers to the way we find
threading opportunities.


Yes, it's a mess.

I ran some experiments a while back, and my current work on the 
enhanced solver/threader, can fold virtually everything the 
DOM/threader gets (even with its use of const_and_copies, avail_exprs, 
and evrp_range_analyzer), while getting 5% more DOM threads and 1% 
more overall threads.  That is, I've been testing if the path solver 
can solve everything the DOM threader needs (the hybrid approach I 
mentioned).


Unfortunately, replacing the forward threader right now is not 
feasible for a few reasons:
Right.  But I thought the short term goal was to replace/remove the 
forward threading from VRP.   Dropping from DOM is going to be tougher.




a) The const_and_copies/avail_exprs relation framework can do floats, 
and that's next year's ranger work.
Right.  I'd actually run into this as well when I wanted to drop all the 
range bits out of DOM and rely exclusively on EVRP.   It'd still be a 
step forward to rip out the EVRP engine from DOM and simplify all the 
code that derives one equivalence from another so that it's only working 
on FP values.




b) Even though we can seemingly fold everything DOM/threader does, in 
order to replace it with a backward threader instance we'd have to 
merge the cost/profitability code scattered throughout the forward 
threader, as well as the EDGE_FSM* / EDGE_COPY* business.
Right.  This is a prerequisite.  Though some of the costing will need to 
be conditional on the threader being used.  Refer back to the discussion 
around how the forward threader can commonize thread paths that lead to 
the same destination while the backwards threader can not.




c) DOM changes the IL as it goes.  Though we could conceivably divorce 
the two and do the threading after DOM is done.
The only reason threading runs in parallel with DOM is so that it can 
use the context sensitive equivalences.  With the infrastructure you're 
building, there's a reasonable chance we can move to a model where we 
run DOM (and in the long term a simpler DOM) and threading as distinct, 
independent passes.


Jeff


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Jeff Law via Gcc




On 9/9/2021 4:15 AM, Richard Biener wrote:



b) Even though we can seemingly fold everything DOM/threader does, in
order to replace it with a backward threader instance we'd have to merge
the cost/profitability code scattered throughout the forward threader,
as well as the EDGE_FSM* / EDGE_COPY* business.

c) DOM changes the IL as it goes.  Though we could conceivably divorce
the two and do the threading after DOM is done.

Yeah, it does not actually process/simplify the blocks copied by threading.
In fact I think it only registers jump threading opportunities during the DOM
walk and commits them only later.  But it of course uses its avail / copies
stack to find them - that part you cannot easily divorce.
Well, divorcing from using the context sensitive avail/copies is part of 
what Aldy & Andrew have been working on.  All indications I've seen are 
they're on track to be able to do that.


And yes, it only registers the threads and waits until after DOM is done 
to transform the CFG.  That in and of itself introduces all kinds of 
complexity.  If we can get to the point where we don't need the context 
sensitive avail/copies, then we've got a real shot at untangling DOM and 
threading which would be a huge maintainability win in my mind.




DOM is also yet another value-numbering framework - one could think
of stripping it down from that side, keeping the threading bits only
(but you'd still have the avail / copies bits).
Yes.  I think you and I touched on this a while back.    At a high level 
I'd prefer to have FRE rather than DOM doing the bulk of the redundant 
expression elimination.  The big blocker there was the tight integration 
of DOM and threading.  But if Aldy can untangle that we can then 
evaluate replacing DOM with FRE.





That said, it has one nice property it can leverage due to its incredibly
simple memory redundancy handling, in that it usually performs way less
alias queries than FRE (even when you run the latter in non-iterative mode).
DOM as an infrastructure for optimization is probably reaching the end 
of its useful life.  FRE has a lot more going for it.




But the same way DOM can register jump threading opportunities FRE
could do as well.
I'd advise against that and instead look towards a model where no pass 
has integrated jump threading and the only jump threading module we have 
is the backwards threader.


jeff


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Aldy Hernandez via Gcc




On 9/10/21 5:43 PM, Jeff Law wrote:



On 9/9/2021 3:21 AM, Aldy Hernandez wrote:




   /* If this path does not thread through the loop latch, then we are
  using the FSM threader to find old style jump threads. This
  is good, except the FSM threader does not re-use an existing
  threading path to reduce code duplication.

  So for that case, drastically reduce the number of statements
  we are allowed to copy.  */


*blink*

Woah.  The backward threader has been using FSM threads 
indiscriminately as far as I can remember.  I wonder what would break 
if we "fixed it".
?!?  I'm not sure what you're suggesting here.  If you s/FSM 
threader/backwards threader/ in the comment does it make more sense? The 
term FSM really should largely have been dropped as the backwards 
threader was improved to handle more cases.


back_threader_registry::register_path() uses EDGE_FSM_THREAD as the 
thread type to register threads.  I was wondering if it should have been 
some combination of EDGE_START_JUMP_THREAD / EDGE_*_COPY_SRC_BLOCK, etc. 
 I (purposely) know nothing about the underlying threading types ;-). 
But if the backwards threader has been improved, then perhaps we should 
just remove the confusing FSM references.











so these cases should use the "old style" validity/costing metrics 
and thus

classify threading opportunities in a different way?


Jeff, do you have any insight here?

This is precisely what you're cleaning up.






I think today "backwards" vs. "forwards" only refers to the way we find
threading opportunities.


Yes, it's a mess.

I ran some experiments a while back, and my current work on the 
enhanced solver/threader, can fold virtually everything the 
DOM/threader gets (even with its use of const_and_copies, avail_exprs, 
and evrp_range_analyzer), while getting 5% more DOM threads and 1% 
more overall threads.  That is, I've been testing if the path solver 
can solve everything the DOM threader needs (the hybrid approach I 
mentioned).


Unfortunately, replacing the forward threader right now is not 
feasible for a few reasons:
Right.  But I thought the short term goal was to replace/remove the 
forward threading from VRP.   Dropping from DOM is going to be tougher.


My current thinking is that replacing the forward VRP threader with a 
hybrid one is a gentler approach to the longer term goal of replacing 
the forward threader altogether.  However, all the work I've been doing 
could go either way-- we could try the forward/VRP replacement or a 
hybrid approach.  It will all use the path solver underneath.


My main problem with replacing the forward/VRP with a backward client is 
that the cost models are so different that it was difficult to compare 
how we fared.  I constantly ran into threads the solver could handle 
just fine, but profitable_path_p was holding it back.


FWIW, we get virtually everything the forward threader gets, minus a 
very few things.  At least when I plug in the solver to the DOM/forward 
threader, it can solve everything the latter can (minus noise and 
floats).


If you prefer a backward threader instance to replace the VRP/forward 
threader, I'm game.  It's just harder to compare.  Either way (backward 
threader or a hybrid forward+solver) uses the same underlying solver 
which is solid.


a) The const_and_copies/avail_exprs relation framework can do floats, 
and that's next year's ranger work.
Right.  I'd actually run into this as well when I wanted to drop all the 
range bits out of DOM and rely exclusively on EVRP.   It'd still be a 
step forward to rip out the EVRP engine from DOM and simplify all the 
code that derives one equivalence from another so that it's only working 
on FP values.


Sure.





b) Even though we can seemingly fold everything DOM/threader does, in 
order to replace it with a backward threader instance we'd have to 
merge the cost/profitability code scattered throughout the forward 
threader, as well as the EDGE_FSM* / EDGE_COPY* business.
Right.  This is a prerequisite.  Though some of the costing will need to 
be conditional on the threader being used.  Refer back to the discussion 
around how the forward threader can commonize thread paths that lead to 
the same destination while the backwards threader can not.


Yup, yup.





c) DOM changes the IL as it goes.  Though we could conceivably divorce 
the two and do the threading after DOM is done.
The only reason threading runs in parallel with DOM is so that it can 
use the context sensitive equivalences.  With the infrastructure you're 
building, there's a reasonable chance we can move to a model where we 
run DOM (and in the long term a simpler DOM) and threading as distinct, 
independent passes.


Andrew mumbled something about replacing all of DOM eventually :-). 
Well, except that value-numbering business I bet.


Aldy



Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Aldy Hernandez via Gcc




On 9/10/21 5:51 PM, Jeff Law wrote:



On 9/9/2021 4:15 AM, Richard Biener wrote:



b) Even though we can seemingly fold everything DOM/threader does, in
order to replace it with a backward threader instance we'd have to merge
the cost/profitability code scattered throughout the forward threader,
as well as the EDGE_FSM* / EDGE_COPY* business.

c) DOM changes the IL as it goes.  Though we could conceivably divorce
the two and do the threading after DOM is done.
Yeah, it does not actually process/simplify the blocks copied by 
threading.
In fact I think it only registers jump threading opportunities during 
the DOM
walk and commits them only later.  But it of course uses its avail / 
copies

stack to find them - that part you cannot easily divorce.
Well, divorcing from using the context sensitive avail/copies is part of 
what Aldy & Andrew have been working on.  All indications I've seen are 
they're on track to be able to do that.


And yes, it only registers the threads and waits until after DOM is done 
to transform the CFG.  That in and of itself introduces all kinds of 
complexity.  If we can get to the point where we don't need the context 
sensitive avail/copies, then we've got a real shot at untangling DOM and 
threading which would be a huge maintainability win in my mind.




DOM is also yet another value-numbering framework - one could think
of stripping it down from that side, keeping the threading bits only
(but you'd still have the avail / copies bits).
Yes.  I think you and I touched on this a while back.    At a high level 
I'd prefer to have FRE rather than DOM doing the bulk of the redundant 
expression elimination.  The big blocker there was the tight integration 
of DOM and threading.  But if Aldy can untangle that we can then 
evaluate replacing DOM with FRE.


Once ranger does floats, I can't think of anything the forward threader 
could get that the backward threader couldn't.







That said, it has one nice property it can leverage due to its incredibly
simple memory redundancy handling, in that it usually performs way less
alias queries than FRE (even when you run the latter in non-iterative 
mode).
DOM as an infrastructure for optimization is probably reaching the end 
of its useful life.  FRE has a lot more going for it.




But the same way DOM can register jump threading opportunities FRE
could do as well.
I'd advise against that and instead look towards a model where no pass 
has integrated jump threading and the only jump threading module we have 
is the backwards threader.


Yes, please.  We need to separate jump threading from all passes.  One 
thing, and do it well.


Aldy



Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Jeff Law via Gcc




On 9/10/2021 10:05 AM, Aldy Hernandez wrote:



On 9/10/21 5:43 PM, Jeff Law wrote:



On 9/9/2021 3:21 AM, Aldy Hernandez wrote:




   /* If this path does not thread through the loop latch, then we are
  using the FSM threader to find old style jump threads. This
  is good, except the FSM threader does not re-use an existing
  threading path to reduce code duplication.

  So for that case, drastically reduce the number of statements
  we are allowed to copy.  */


*blink*

Woah.  The backward threader has been using FSM threads 
indiscriminately as far as I can remember.  I wonder what would 
break if we "fixed it".
?!?  I'm not sure what you're suggesting here.  If you s/FSM 
threader/backwards threader/ in the comment does it make more sense? 
The term FSM really should largely have been dropped as the backwards 
threader was improved to handle more cases.


back_threader_registry::register_path() uses EDGE_FSM_THREAD as the 
thread type to register threads.  I was wondering if it should have 
been some combination of EDGE_START_JUMP_THREAD / 
EDGE_*_COPY_SRC_BLOCK, etc.  I (purposely) know nothing about the 
underlying threading types ;-). But if the backwards threader has been 
improved, then perhaps we should just remove the confusing FSM 
references.
No we shouldn't change it to any of the other types. EDGE_FSM_THREAD 
means a thread found by the backwards threader and it's the key used to 
determine which of the two CFG updating mechanisms should be used, the 
generic copier in the case of EDGE_FSM_THREAD.



Changing the name, yes, absolutely.  I probably should have done that 
when the original FSM threader was tweaked to handle generic threading.


As you've probably heard me mention before, all the EDGE_FSM_THREAD 
stuff in the registry really could be pulled out.   The registry's 
purpose is to deal with the two stage nature of jump threading in DOM 
(find threads, then optimize them later).  I don't think any of the 
backwards threading bits need that two stage handling.


My current thinking is that replacing the forward VRP threader with a 
hybrid one is a gentler approach to the longer term goal of replacing 
the forward threader altogether.  However, all the work I've been 
doing could go either way-- we could try the forward/VRP replacement 
or a hybrid approach.  It will all use the path solver underneath.
And that's probably a reasonable intermediate step on the way towards 
removing the VRP threading.




My main problem with replacing the forward/VRP with a backward client 
is that the cost models are so different that it was difficult to 
compare how we fared.  I constantly ran into threads the solver could 
handle just fine, but profitable_path_p was holding it back.

Yea.  Sorry about that tangle of insanity



FWIW, we get virtually everything the forward threader gets, minus a 
very few things.  At least when I plug in the solver to the DOM/forward 
threader, it can solve everything the latter can (minus noise and 
floats).
So once you plug those bits in, we don't have to carry around the 
avail/copies tables for the threader anymore, right?  That's a nice 
cleanup in and of itself.




If you prefer a backward threader instance to replace the VRP/forward 
threader, I'm game.  It's just harder to compare. Either way (backward 
threader or a hybrid forward+solver) uses the same underlying solver 
which is solid.
I think we can go hybrid, then look at the next step, which could well 
be bringing some consistency into the costing models.




c) DOM changes the IL as it goes.  Though we could conceivably divorce 
the two and do the threading after DOM is done.
The only reason threading runs in parallel with DOM is so that it can 
use the context sensitive equivalences.  With the infrastructure 
you're building, there's a reasonable chance we can move to a model 
where we run DOM (and in the long term a simpler DOM) and threading 
as distinct, independent passes.


Andrew mumbled something about replacing all of DOM eventually :-). 
Well, except that value-numbering business I bet.
Essentially a realization of Bodik's work in GCC.  The nugget in there 
is that it's a path sensitive optimizer.  That's kind of what I've 
envisioned DOM turning into.


1. We separate out jump threading from DOM.
2. We replace the bulk of DOM with FRE
3. The remnants of DOM turn into a path sensitive optimizer (and for 
god's sake we don't want to call it DOM anymore :-)




Jeff


Re: More aggressive threading causing loop-interchange-9.c regression

2021-09-10 Thread Aldy Hernandez via Gcc




On 9/10/21 6:21 PM, Jeff Law wrote:



On 9/10/2021 10:05 AM, Aldy Hernandez wrote:



On 9/10/21 5:43 PM, Jeff Law wrote:



On 9/9/2021 3:21 AM, Aldy Hernandez wrote:




   /* If this path does not thread through the loop latch, then we are
  using the FSM threader to find old style jump threads. This
  is good, except the FSM threader does not re-use an existing
  threading path to reduce code duplication.

  So for that case, drastically reduce the number of statements
  we are allowed to copy.  */


*blink*

Woah.  The backward threader has been using FSM threads 
indiscriminately as far as I can remember.  I wonder what would 
break if we "fixed it".
?!?  I'm not sure what you're suggesting here.  If you s/FSM 
threader/backwards threader/ in the comment does it make more sense? 
The term FSM really should largely have been dropped as the backwards 
threader was improved to handle more cases.


back_threader_registry::register_path() uses EDGE_FSM_THREAD as the 
thread type to register threads.  I was wondering if it should have 
been some combination of EDGE_START_JUMP_THREAD / 
EDGE_*_COPY_SRC_BLOCK, etc.  I (purposely) know nothing about the 
underlying threading types ;-). But if the backwards threader has been 
improved, then perhaps we should just remove the confusing FSM 
references.
No, we shouldn't change it to any of the other types. EDGE_FSM_THREAD 
means a thread found by the backwards threader and it's the key used to 
determine which of the two CFG updating mechanisms should be used, the 
generic copier in the case of EDGE_FSM_THREAD.



Changing the name, yes, absolutely.  I probably should have done that 
when the original FSM threader was tweaked to handle generic threading.


I'll put that on my list.



As you've probably heard me mention before, all the EDGE_FSM_THREAD 
stuff in the registry really could be pulled out.   The registry's 
purpose is to deal with the two stage nature of jump threading in DOM 
(find threads, then optimize them later).  I don't think any of the 
backwards threading bits need that two stage handling.


Yeah yeah.  But you said that a year ago, and all I heard was *mumble 
mumble/complicated things*.  That was before I had much exposure to the 
code.  Now I feel a lot more comfortable ;-).


I'll also put this on my list, but it may not get done this release, 
cause I'm running against the clock with the VRP/threader replacement, 
which is what's keeping us back from replacing VRP with an evrp instance 
right now :).




My current thinking is that replacing the forward VRP threader with a 
hybrid one is a gentler approach to the longer term goal of replacing 
the forward threader altogether.  However, all the work I've been 
doing could go either way -- we could try the forward/VRP replacement 
or a hybrid approach.  It will all use the path solver underneath.
And that's probably a reasonable intermediate step on the way towards 
removing the VRP threading.




My main problem with replacing the forward/VRP with a backward client 
is that the cost models are so different that it was difficult to 
compare how we fared.  I constantly ran into threads the solver could 
handle just fine, but profitable_path_p was holding it back.

Yea.  Sorry about that tangle of insanity.



FWIW, we get virtually everything the forward threader gets, minus a 
very few things.  At least when I plug the solver into the DOM/forward 
threader, it can solve everything the forward threader can (minus noise 
and floats).
So once you plug those bits in, we don't have to carry around the 
avail/copies tables for the threader anymore, right?  That's a nice 
cleanup in and of itself.


Correct.  For the VRP/hybrid approach I'm working on, there are no 
copies/avails.  The solver can do everything they did.  After all, it's 
an easier target, since VRP threading only happens on ints and without 
the IL changing constantly.






If you prefer a backward threader instance to replace the VRP/forward 
threader, I'm game.  It's just harder to compare. Either way (backward 
threader or a hybrid forward+solver) uses the same underlying solver, 
which is solid.
I think we can go hybrid, then look at the next step, which could well 
be bringing some consistency into the costing models.




c) DOM changes the IL as it goes.  Though we could conceivably 
divorce the two and do the threading after DOM is done.
The only reason threading runs in parallel with DOM is so that it can 
use the context sensitive equivalences.  With the infrastructure 
you're building, there's a reasonable chance we can move to a model 
where we run DOM (and in the long term a simpler DOM) and threading 
as distinct, independent passes.
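As a hypothetical illustration (not from this thread) of the context-sensitive equivalences mentioned above, consider:

```c
/* Hypothetical sketch: the kind of context-sensitive equivalence DOM
   records as it walks the dominator tree.  Neither a nor b is known
   globally, but on the edge where the first test is true the
   equivalence a == b holds, so the nested test folds on that path and
   the branch can be threaded.  */
int
g (int a, int b)
{
  if (a == b)
    {
      if (a != b)   /* Contradicts the recorded equivalence: folds to false.  */
	return -1;  /* Dead on this path.  */
      return 1;
    }
  return 0;
}
```

An independent threading pass would need its own source of such path-sensitive facts (e.g. the path solver) rather than piggybacking on DOM's tables.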


gcc-10-20210910 is now available

2021-09-10 Thread GCC Administrator via Gcc
Snapshot gcc-10-20210910 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20210910/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 80b1492b2de153f4850a32cafcd8f4d37c2c84fc

You'll find:

 gcc-10-20210910.tar.xz   Complete GCC

  SHA256=88ff63a91315e5a1b5a9d7d7a09f6e4be41386dc54c4adf5f00f9280c1bfb3e4
  SHA1=e80d8e32283a4441878d14d241e88ddaf714da85

Diffs from 10-20210903 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.