On Tue, Nov 19, 2024 at 9:59 AM Richard Biener
<richard.guent...@gmail.com> wrote:
>
> On Sun, Nov 17, 2024 at 4:28 AM Lewis Hyatt <lhy...@gmail.com> wrote:
> >
> > Currently, when we allocate a gphi object, we round up the capacity for the
> > trailing arguments array such that it will make full use of the page size
> > that ggc will allocate. While there is also an explicit minimum of 2
> > arguments, in practice after rounding to the ggc page size there is always
> > room for at least 4.
> >
> > It seems we have some code that has come to depend on there being this much
> > room before reallocation of a PHI is required. For example, the function
> > loop_version () used during loop optimization will make sure there is room
> > for an additional edge on each PHI that it processes. But there are call
> > sites which cache a PHI pointer prior to calling loop_version () and assume
> > it remains valid afterward, thus implicitly assuming that the PHI will have
> > spare capacity. Examples include split_loop () and gen_parallel_loop ().
> >
> > This works fine now, but if the size of a gphi becomes larger, e.g. due to
> > configuring location_t to be a 64-bit type, then on 32-bit platforms it ends
> > up being possible to get a gphi with only 2 arguments of capacity, causing
> > the above call sites of loop_version () to fail. (They store a pointer to a
> > gphi object that no longer has the same meaning it did before it got
> > reallocated.) The testcases gcc.dg/torture/pr113707-2.c and
> > gcc.dg/graphite/pr81945.c exhibit that failure mode.
> >
> > It may be necessary to adjust those call sites to make this more robust, but
> > in the meantime, changing the minimum from 2 to 4 does no harm given the
> > minimum is practically 4 anyway, and it resolves the issue for 32-bit
> > platforms.
>
> We need to fix the users.  Note ideal_phi_node_len rounds up to a power of two
> but extra_order_size_table also has MAX_ALIGNMENT * n with n from 1 to 16
> buckets, so such extensive rounding up is not needed.
>
> The cache is also quite useless this way (I didn't fix this when last working
> there).
>

Adjusting the call sites definitely sounds right, but I worry it's
potentially a big change?

So one of the call sites that caused problems here was around line 620
in tree-ssa-loop-split.cc:

 /* Find a loop PHI node that defines guard_iv directly,
           or create one doing that.  */
        gphi *phi = find_or_create_guard_phi (loop1, guard_iv, &iv);
        if (!phi)

It remembers "phi" and reuses it later when it might have been
invalidated. That one is easy to fix, find_or_create_guard_phi() can
just be called again later.

But the other one I ran into with testing was in tree-parloops.cc:

/* Element of the hashtable, representing a
   reduction in the current loop.  */
struct reduction_info
{
  gimple *reduc_stmt;           /* reduction statement.  */
  gimple *reduc_phi;            /* The phi node defining the reduction.  */
  enum tree_code reduction_code;/* code for the reduction operation.  */
  unsigned reduc_version;       /* SSA_NAME_VERSION of original reduc_phi
                                   result.  */
  gphi *keep_res;               /* The PHI_RESULT of this phi is the
resulting value
                                   of the reduction variable when
existing the loop. */
  tree initial_value;           /* The initial value of the reduction
var before entering the loop.  */
  tree field;                   /*  the name of the field in the
parloop data structure intended for reduction.  */
  tree reduc_addr;              /* The address of the reduction variable for
                                   openacc reductions.  */
  tree init;                    /* reduction initialization value.  */
  gphi *new_phi;                /* (helper field) Newly created phi
node whose result
                                   will be passed to the atomic
operation.  Represents
                                   the local result each thread
computed for the reduction
                                   operation.  */
};

It keeps a hash map of these throughout the pass and the whole design
is predicated on those pointers always remaining valid, so I think
this would need extensive redesign? I did not look into all the
details there though.

Once I saw that, I tried the approach shown here of "just" changing
the minimum number to 4, given that on most platforms there are always
4 anyway, and that fixed both of these on sparc as well as other
issues that I hadn't looked into yet, so there are going to be more as
well and I'm not sure how pervasive it is.

Does "fix the users" mean, change all call sites to understand that a
phi pointer could be invalidated and handle it, or should it mean,
change all call sites to arrange that none of the phi pointers they
use will be invalidated? I am not very familiar with these internals
yet, although I am happy to look into it. Do you think it's a
prerequisite for the location_t change or it could be a followup
later, given it doesn't have any effect on 64-bit platforms anyway?
Thanks...

-Lewis

Reply via email to