Fix 'hash_table::expand' to destruct stale Value objects (was: 'hash_map<tree, hash_map<tree, tree>>')

2021-08-30 Thread Thomas Schwinge
Hi!

Ping -- we still need to plug the memory leak; see patch attached, and/or
long discussion here:

On 2021-08-16T14:10:00-0600, Martin Sebor wrote:
> On 8/16/21 6:44 AM, Thomas Schwinge wrote:
>> On 2021-08-12T17:15:44-0600, Martin Sebor via Gcc wrote:
>>> On 8/6/21 10:57 AM, Thomas Schwinge wrote:
 So I'm trying to do some C++...  ;-)

 Given:

   /* A map from SSA names or var decls to record fields.  */
   typedef hash_map<tree, tree> field_map_t;

   /* For each propagation record type, this is a map from SSA names or
  var decls to propagate, to the field in the record type that should
  be used for transmission and reception.  */
   typedef hash_map<tree, field_map_t> record_field_map_t;

 Thus, that's a 'hash_map<tree, hash_map<tree, tree>>'.  (I may do that,
 right?)  Looking through GCC implementation files, the vast majority of
 uses of 'hash_map' boil down to a pointer key ('tree', for example) and
 a pointer/integer value.
>>>
>>> Right.  Because most GCC containers rely exclusively on GCC's own
>>> uses for testing, if your use case is novel in some way, chances
>>> are it might not work as intended in all circumstances.
>>>
>>> I've wrestled with hash_map a number of times.  A use case that's
>>> close to yours (i.e., a non-trivial value type) is in cp/parser.c:
>>> see class_to_loc_map_t.
>>
>> Indeed, at the time you sent this email, I already had started looking
>> into that one!  (The Fortran test cases that I originally analyzed, which
>> triggered other instances of non-POD/non-trivial-destructor Value types,
>> didn't result in a memory leak, because the non-trivial constructor
>> doesn't actually allocate any resources dynamically -- that's indeed
>> different in this case here.)  ..., and indeed:
>>
>>> (I don't remember if I tested it for leaks
>>> though.  It's used to implement -Wmismatched-tags so compiling
>>> a few tests under Valgrind should show if it does leak.)
>>
>> ... it does leak memory at present.  :-| (See attached commit log for
>> details for one example.)

(Attached "Fix 'hash_table::expand' to destruct stale Value objects"
again.)

>> To that effect, to document the current behavior, I propose to
>> "Add more self-tests for 'hash_map' with Value type with non-trivial
>> constructor/destructor"

(We've done that in commit e4f16e9f357a38ec702fb69a0ffab9d292a6af9b
"Add more self-tests for 'hash_map' with Value type with non-trivial
constructor/destructor", quickly followed by bug fix
commit bb04a03c6f9bacc890118b9e12b657503093c2f8
"Make 'gcc/hash-map-tests.c:test_map_of_type_with_ctor_and_dtor_expand'
work on 32-bit architectures [PR101959]".)

>> (Also cherry-pick into release branches, eventually?)

 Then:

   record_field_map_t field_map ([...]); // see below
   for ([...])
     {
       tree record_type = [...];
       [...]
       bool existed;
       field_map_t &fields
         = field_map.get_or_insert (record_type, &existed);
       gcc_checking_assert (!existed);
       [...]
       for ([...])
         fields.put ([...], [...]);
       [...]
     }
   [stuff that looks up elements from 'field_map']
   field_map.empty ();

 This generally works.

 If I instantiate 'record_field_map_t field_map (40);', Valgrind is happy.
 If however I instantiate 'record_field_map_t field_map (13);' (where '13'
 would be the default for 'hash_map'), Valgrind complains:

   2,080 bytes in 10 blocks are definitely lost in loss record 828 of 876
      at 0x483DD99: calloc (vg_replace_malloc.c:762)
      by 0x175F010: xcalloc (xmalloc.c:162)
      by 0xAF4A2C: hash_table<hash_map<tree_node*, hash_map<tree_node*, tree_node*, simple_hashmap_traits<default_hash_traits<tree_node*>, tree_node*> > >::hash_entry, false, xcallocator>::hash_table(unsigned long, bool, bool, bool, mem_alloc_origin) (hash-table.h:275)
      by 0x15E0120: hash_map<tree_node*, hash_map<tree_node*, tree_node*, simple_hashmap_traits<default_hash_traits<tree_node*>, tree_node*> > >::hash_map(unsigned long, bool, bool, bool) (hash-map.h:143)
      by 0x15DEE87: hash_map<tree_node*, hash_map<tree_node*, tree_node*, simple_hashmap_traits<default_hash_traits<tree_node*>, tree_node*> >, simple_hashmap_traits<default_hash_traits<tree_node*>, hash_map<tree_node*, tree_node*, simple_hashmap_traits<default_hash_traits<tree_node*>, tree_node*> > > >::get_or_insert(tree_node* const&, bool*) (hash-map.h:205)
      by 0x15DD52C: execute_omp_oacc_neuter_broadcast() (omp-oacc-neuter-broadcast.cc:1371)
      [...]

 (That's with '#pragma GCC optimize "O0"' at the top of the 'gcc/*.cc'
 file.)

 My suspicion was that it is due to the 'field_map' getting resized as it
 incrementally grows (and '40' being big enough for that never to happen),
 and somehow the non-POD (?) value objects not being properly handled
 during that.  Working my way a bit through 'gcc/hash-map.*' and
 'gcc/hash-table.*' (but not claiming that I understand all that, off
 hand), it seems as if my theory is right.

Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Florian Weimer via Gcc
There has been a discussion, both off-list and on the gcc-help mailing
list (“Why vectorization didn't turn on by -O2”, spread across several
months), about enabling the auto-vectorizer at -O2, similar to what
Clang does.

I think the review concluded that the very cheap cost model should be
used for that.

Are there any remaining blockers?

Thanks,
Florian



Re: Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Bill Schmidt via Gcc

On 8/30/21 8:04 AM, Florian Weimer wrote:

There has been a discussion, both off-list and on the gcc-help mailing
list (“Why vectorization didn't turn on by -O2”, spread across several
months), about enabling the auto-vectorizer at -O2, similar to what
Clang does.

I think the review concluded that the very cheap cost model should be
used for that.

Are there any remaining blockers?


Hi Florian,

I don't think I'd characterize it as having blockers, but we are 
continuing to investigate small performance issues that arise with 
very-cheap, including some things that regressed in GCC 12.  Kewen Lin 
is leading that effort.  Kewen, do you feel we have any major remaining 
concerns with this plan?


Thanks,
Bill






Re: Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Kewen.Lin via Gcc
on 2021/8/30 10:11 PM, Bill Schmidt wrote:
> On 8/30/21 8:04 AM, Florian Weimer wrote:
>> There has been a discussion, both off-list and on the gcc-help mailing
>> list (“Why vectorization didn't turn on by -O2”, spread across several
>> months), about enabling the auto-vectorizer at -O2, similar to what
>> Clang does.
>>
>> I think the review concluded that the very cheap cost model should be
>> used for that.
>>
>> Are there any remaining blockers?
> 
> Hi Florian,
> 
> I don't think I'd characterize it as having blockers, but we are continuing 
> to investigate small performance issues that arise with very-cheap, including 
> some things that regressed in GCC 12.  Kewen Lin is leading that effort.  
> Kewen, do you feel we have any major remaining concerns with this plan?
> 

Hi Florian & Bill,

There are some small performance issues like PR101944 and PR102054, and
still two degraded benchmarks (P9 520.omnetpp_r -2.41% and P8 526.blender_r
-1.31%) to be investigated/clarified, but since their performance numbers
with the separate loop and SLP vectorization options look neutral, they
are very likely noise.  IMHO I don't think they are/will be blockers.

So I think it's good to turn this on by default for Power.

BR,
Kewen
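For concreteness, the neutrality check described above can be driven from
the command line roughly like this (a sketch: 'test.c' is a placeholder,
and the 'very-cheap' value of '-fvect-cost-model=' is only accepted by
GCC builds that already implement that cost model):

```shell
# The proposed -O2 behavior, spelled out with explicit flags:
gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap -S test.c

# Neutrality check: disable loop and SLP vectorization separately, to see
# whether a benchmark delta tracks the vectorizer at all.
gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap \
    -fno-tree-loop-vectorize -S test.c
gcc -O2 -ftree-vectorize -fvect-cost-model=very-cheap \
    -fno-tree-slp-vectorize -S test.c
```

If a regression persists with both vectorizers disabled individually, it
is likely measurement noise rather than a vectorization cost-model issue.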


Re: Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Hongtao Liu via Gcc
On Tue, Aug 31, 2021 at 11:11 AM Kewen.Lin via Gcc wrote:
>
> on 2021/8/30 10:11 PM, Bill Schmidt wrote:
> > On 8/30/21 8:04 AM, Florian Weimer wrote:
> >> There has been a discussion, both off-list and on the gcc-help mailing
> >> list (“Why vectorization didn't turn on by -O2”, spread across several
> >> months), about enabling the auto-vectorizer at -O2, similar to what
> >> Clang does.
> >>
> >> I think the review concluded that the very cheap cost model should be
> >> used for that.
> >>
> >> Are there any remaining blockers?
> >
> > Hi Florian,
> >
> > I don't think I'd characterize it as having blockers, but we are continuing 
> > to investigate small performance issues that arise with very-cheap, 
> > including some things that regressed in GCC 12.  Kewen Lin is leading that 
> > effort.  Kewen, do you feel we have any major remaining concerns with this 
> > plan?
> >
>
> Hi Florian & Bill,
>
> There are some small performance issues like PR101944 and PR102054, and
> still two degraded benchmarks (P9 520.omnetpp_r -2.41% and P8 526.blender_r
> -1.31%) to be investigated/clarified, but since their performance numbers
> with the separate loop and SLP vectorization options look neutral, they
> are very likely noise.  IMHO I don't think they are/will be blockers.
>
> So I think it's good to turn this on by default for Power.
The Intel side is also willing to enable -O2 vectorization, after
measuring the performance impact on SPEC2017 and EEMBC.
Meanwhile we are investigating PR101908/PR101909/PR101910/PR92740,
where -O2 vectorization is reported to regress further benchmarks on
znver and kabylake.
>
> BR,
> Kewen



-- 
BR,
Hongtao


Re: Enable the vectorizer at -O2 for GCC 12

2021-08-30 Thread Jeff Law




On 8/30/2021 9:30 PM, Hongtao Liu via Gcc wrote:

On Tue, Aug 31, 2021 at 11:11 AM Kewen.Lin via Gcc  wrote:

on 2021/8/30 10:11 PM, Bill Schmidt wrote:

On 8/30/21 8:04 AM, Florian Weimer wrote:

There has been a discussion, both off-list and on the gcc-help mailing
list (“Why vectorization didn't turn on by -O2”, spread across several
months), about enabling the auto-vectorizer at -O2, similar to what
Clang does.

I think the review concluded that the very cheap cost model should be
used for that.

Are there any remaining blockers?

Hi Florian,

I don't think I'd characterize it as having blockers, but we are continuing to 
investigate small performance issues that arise with very-cheap, including some 
things that regressed in GCC 12.  Kewen Lin is leading that effort.  Kewen, do 
you feel we have any major remaining concerns with this plan?


Hi Florian & Bill,

There are some small performance issues like PR101944 and PR102054, and
still two degraded benchmarks (P9 520.omnetpp_r -2.41% and P8 526.blender_r
-1.31%) to be investigated/clarified, but since their performance numbers
with the separate loop and SLP vectorization options look neutral, they
are very likely noise.  IMHO I don't think they are/will be blockers.

So I think it's good to turn this on by default for Power.

The Intel side is also willing to enable -O2 vectorization, after
measuring the performance impact on SPEC2017 and EEMBC.
Meanwhile we are investigating PR101908/PR101909/PR101910/PR92740,
where -O2 vectorization is reported to regress further benchmarks on
znver and kabylake.
We'd like to see it on for our processor as well, though I don't have
numbers I can share at this time.


jeff