On Fri, Sep 16, 2016 at 9:05 AM, Thomas Schwinge
<tho...@codesourcery.com> wrote:
> Hi!
>
> (CCing Bernd and Jakub -- for your information, or: "amusement" -- as
> you've discussed offloading preload_common_nodes issues before...)
>
> Got to look into this some more, yesterday:
>
> On Thu, 08 Sep 2016 13:43:30 +0200, I wrote:
>> On Wed, 7 Sep 2016 14:23:18 +0200, Richard Biener 
>> <richard.guent...@gmail.com> wrote:
>> > On Wed, Sep 7, 2016 at 1:52 PM, Thomas Schwinge <tho...@codesourcery.com> 
>> > wrote:
>> > > On Fri, 19 Aug 2016 11:05:59 +0000, Joseph Myers 
>> > > <jos...@codesourcery.com> wrote:
>> > >> On Fri, 19 Aug 2016, Richard Biener wrote:
>> > >> > Can you quickly verify if LTO works with the new types?  I don't see 
>> > >> > anything
>> > >> > that would prevent it but having new global trees and backends 
>> > >> > initializing them
>> > >> > might come up with surprises (see 
>> > >> > tree-streamer.c:preload_common_nodes)
>> > >>
>> > >> Well, the execution tests are in gcc.dg/torture, which is run with 
>> > >> various
>> > >> options including -flto (and I've checked the testsuite logs to confirm
>> > >> these tests are indeed run with such options).  Is there something else
>> > >> you think should be tested?
>> > >
>> > > As I noted in <https://gcc.gnu.org/PR77458>:
>> > >
>> > >     As of the PR32187 commit r239625 "Implement C _FloatN, _FloatNx 
>> > > types", nvptx
>> > >     offloading is broken, ICEs in LTO stream-in.  Probably some kind of 
>> > > data-type
>> > >     mismatch that is not visible with Intel MIC offloading (using the 
>> > > same data
>> > >     types) but explodes with nvptx.  I'm having a look.
>
>> > [...] preload_common_nodes.  This is carefully crafted to _not_ diverge by
>> > frontend (!) it wasn't even designed to cope with global trees being 
>> > present
>> > or not dependent on target (well, because the target is always the
>> > same! mind you!)
>>
>> Scary.  ;-/
>>
>> > Now -- in theory it should deal with NULLs just fine (resulting in
>> > error_mark_node), but it can diverge when there are additional
>> > compount types (like vectors, complex
>> > or array or record types) whose element types are not in the set of
>> > global trees.
>> > The complex _FloatN types would be such a case given they appear before 
>> > their
>> > components.  That mixes up the ordering at least.
>>
>> ACK, but that's also an issue for "regular" float/complex float, which
>> also is in "reverse" order.  But that's "fixed" by the recursion in
>> gcc/tree-streamer.c:record_common_node for "TREE_CODE (node) ==
>> COMPLEX_TYPE".  This likewise seems to work for the _FloatN types.
>
> So, that mechanism does work, but what's going wrong is the following:
> with differing target vs. offload target, we potentially (and actually
> do, in the case of x86_64 vs. nvptx) have different sets of _FloatN and
> _FloatNx types.  That is, for nvptx, a few of these don't exist (so,
> NULL_TREE), and so it follows their complex variants also don't exist,
> and thus the recursion that I just mentioned for complex types is no
> longer done in lockstep in the x86_64 cc1 vs. the nvptx lto1, hence we
> get an offset (of two, in this specific case), and consequently streaming
> explodes, for example, as soon as it hits a forward-reference (due to
> looking for tree 185 (x86_64 cc1 view; as encoded in the stream) when it
> should be looking for tree 183 (nvptx lto1 view).
>
>> (I've
>> put "fixed" in quotes -- doesn't that recursion mean that we're thus
>> putting "complex float", "float", [...], "float" (again) into the cache?
>> Anyway that's for later...)
>
> Maybe it would make sense to do this tree streaming in two passes: first
> build a set of what we actually need, and then stream that, without
> duplicates.  (Or, is also for these "pickled" trees their order relevant,
> so that one tree may only refer to other earlier but not later ones?
> Anyway, we could still remember the set of trees already streamed, and
> avoid the double streaming I described?)
>
> So I now understand that due to the stream format, the integer tree IDs
> (cache->next_id) have to match in all cc1/lto1s (etc.), so we'll have to
> make that work for x86_64 target vs. nvptx offload target being
> different.  (I'm pondering some ideas about how to rework that integer
> tree ID generation.)
>
> (I have not digested completely yet what the implications are for the
> skipping we're doing for some trees in preload_common_nodes), but here is
> a first patch that at least gets us rid of the immediate problem.  (I'll
> fix the TODO by adding a "#define TI_COMPLEX_FLOATN_NX_TYPE_LAST" to
> gcc/tree-core.h, OK?)  Will such a patch be OK for trunk, at least for
> now?

Humm ... do we anywhere compare to those global trees by pointer equivalence?
If so then it breaks LTO support for those types.

I think forcing proper ordering so that we can assert that at the
point we'd like
to recurse the nodes we recurse for are already in the cache would be a better
fix.  This might need some additional global trees in case components do not
explicitely exist and it might need re-ordering of a few globals.

Can you try if all hell breaks lose if you change the recursions to instead do

   gcc_assert (streamer_tree_cache_lookup (cache, <node-to-recurse-on>, &tem));

?

Thanks,
Richard.

> commit d2045bd028104be2ede5ae1d5d7b9395e67a8180
> Author: Thomas Schwinge <tho...@codesourcery.com>
> Date:   Thu Sep 15 21:56:16 2016 +0200
>
>     [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx 
> types
>
>         gcc/
>         PR lto/77458
>         * tree-streamer.c (preload_common_nodes): Skip _FloatN and
>         _FloatNx types if offloading.
> ---
>  gcc/tree-streamer.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git gcc/tree-streamer.c gcc/tree-streamer.c
> index 7ea7096..061d831 100644
> --- gcc/tree-streamer.c
> +++ gcc/tree-streamer.c
> @@ -333,7 +333,12 @@ preload_common_nodes (struct streamer_tree_cache_d 
> *cache)
>         && (!lto_stream_offload_p
>             || (i != TI_VA_LIST_TYPE
>                 && i != TI_VA_LIST_GPR_COUNTER_FIELD
> -               && i != TI_VA_LIST_FPR_COUNTER_FIELD)))
> +               && i != TI_VA_LIST_FPR_COUNTER_FIELD
> +               /* Likewise for the _FloatN and _FloatNx types.  */
> +               && (i < TI_FLOATN_NX_TYPE_FIRST
> +                   || i > TI_FLOATN_NX_TYPE_LAST)
> +               && (i < TI_COMPLEX_FLOATN_NX_TYPE_FIRST
> +                   || i > /* TODO */ TI_COMPLEX_FLOAT128X_TYPE))))
>        record_common_node (cache, global_trees[i]);
>  }
>
>
>
> Grüße
>  Thomas

Reply via email to