[Bug tree-optimization/49471] New: ICE when -ftree-parallelize-loops is enabled together with -m32 on power7

razya at il dot ibm.com Mon, 20 Jun 2011 01:57:23 -0700

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49471


           Summary: ICE when -ftree-parallelize-loops is enabled together
                    with -m32 on power7
           Product: gcc
           Version: 4.7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassig...@gcc.gnu.org
        ReportedBy: ra...@il.ibm.com


When building cactusADM on igoo, I get an ICE when autopar is enabled together
with -m32 (it does not happen with -m64).

This is the command line which is executed:

> /home/razya/gcc-bin/bin/gcc -c -o PUGHReduce/ReductionNormInf.o -DSPEC_CPU 
> -DNDEBUG  -Iinclude -I../include -DCCODE -m32  -ffast-math -O3  
> -fno-tree-vectorize -fno-vect-cost-model -ftree-parallelize-loops=6         
> PUGHReduce/ReductionNormInf.c


The failure originates in the code generated to expand the omp_for pragma (at
the tree level).
The expansion of omp_for generates prolog code which calculates the particular
interval of the loop's iterations that the thread executing this code should
run.
This prolog calculation involves some arithmetic operations, in particular the
MULT and DIV statements which cause the failures.

For example, this is the tree-level code generated for the prolog; the div and
mult operations are the statements defining D.7320_12, D.7321_13 and
ivtmp.575_16:


  D.7313_5 = MEM[(struct  *).paral_data_param_1(D)].D.7288; /* Number of loop iterations.  */
  D.7316_8 = __builtin_omp_get_num_threads ();
  D.7317_9 = (<unnamed-unsigned:128>) D.7316_8;
  D.7318_10 = __builtin_omp_get_thread_num ();
  D.7319_11 = (<unnamed-unsigned:128>) D.7318_10;
  D.7320_12 = D.7313_5 / D.7317_9;
  D.7321_13 = D.7320_12 * D.7317_9;
  D.7322_14 = D.7321_13 != D.7313_5;
  D.7323_15 = D.7322_14 + D.7320_12;
  ivtmp.575_16 = D.7323_15 * D.7319_11;
  D.7325_17 = ivtmp.575_16 + D.7323_15;
  D.7326_18 = MIN_EXPR <D.7325_17, D.7313_5>;
  if (ivtmp.575_16 >= D.7326_18)
    goto <bb 3>;
  else
    goto <bb 4>;
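
Read as ordinary C, the prolog above amounts to roughly the following. This is only a sketch to make the arithmetic easier to follow: the function and struct names are made up for illustration, and unsigned long long stands in for the 128-bit type so that the code compiles on any target.

```c
/* Sketch of the per-thread interval computation from the GIMPLE dump.
   thread_chunk, struct chunk and iter_t are hypothetical names; iter_t
   is an assumed stand-in for <unnamed-unsigned:128>.  */
typedef unsigned long long iter_t;

struct chunk { iter_t start, end; };   /* half-open interval [start, end) */

struct chunk thread_chunk(iter_t n, iter_t nthreads, iter_t tid)
{
    struct chunk c;
    iter_t q = n / nthreads;           /* D.7320_12: the DIV statement */
    iter_t t = q * nthreads;           /* D.7321_13: the first MULT statement */
    iter_t size = q + (t != n);        /* D.7323_15: round the chunk size up */

    c.start = size * tid;              /* ivtmp.575_16: the second MULT statement */
    c.end = c.start + size;            /* D.7325_17 */
    if (c.end > n)                     /* D.7326_18: MIN_EXPR */
        c.end = n;
    return c;                          /* start >= end means no work for this thread */
}
```

For example, with 10 iterations and 6 threads the chunk size is 2, so thread 5 gets the empty interval [10, 10) and takes the "goto <bb 3>" branch.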



When the div expr is being expanded to RTL code, the following call to
sign_expand_binop fails:
expmed.c:

            quotient = sign_expand_binop (compute_mode,
                                              udiv_optab, sdiv_optab,
                                              op0, op1, target,
                                              unsignedp, OPTAB_LIB_WIDEN);

The call to this function returns NULL (which it shouldn't, as far as I
understand).

When I tried removing the div instruction, the mult expr caused an assert
failure in expand_mult() because the following call returned NULL:

 expmed.c:

  /* This used to use umul_optab if unsigned, but for non-widening multiply
     there is no difference between signed and unsigned.  */
  op0 = expand_binop (mode,
                      ! unsignedp
                      && flag_trapv && (GET_MODE_CLASS(mode) == MODE_INT)
                      ? smulv_optab : smul_optab,
                      op0, op1, target, unsignedp, OPTAB_LIB_WIDEN);
  gcc_assert (op0);

-------------------------------------------------------------------------------

I found that the variables being divided/multiplied are of 128-bit types.
They are created when canonicalize_loop_ivs is called:

tree
canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
{
  unsigned precision = TYPE_PRECISION (TREE_TYPE (*nit));   // precision of number of iterations

  for (psi = gsi_start_phis (loop->header);         
       !gsi_end_p (psi); gsi_next (&psi))
    {
      gimple phi = gsi_stmt (psi);
      tree res = PHI_RESULT (phi);

      if (is_gimple_reg (res) && TYPE_PRECISION (TREE_TYPE (res)) > precision)
        precision = TYPE_PRECISION (TREE_TYPE (res));
    }

  type = lang_hooks.types.type_for_size (precision, 1);    // here precision is 128

....
}
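
The widening step in the excerpt above can be modelled in isolation as follows. This is a simplified sketch, not GCC code: widest_precision is a hypothetical helper, the PHI results are reduced to a plain array of precisions, and the is_gimple_reg filter is left out.

```c
#include <stddef.h>

/* Hypothetical model of the loop in canonicalize_loop_ivs: start from the
   precision of the iteration count and widen it to the largest precision
   found among the loop-header PHI results.  */
unsigned widest_precision(unsigned nit_precision,
                          const unsigned *phi_precisions, size_t n_phis)
{
    unsigned precision = nit_precision;
    size_t i;

    for (i = 0; i < n_phis; i++)
        if (phi_precisions[i] > precision)
            precision = phi_precisions[i];
    return precision;
}
```

A single 128-bit PHI result in the loop header is therefore enough to make the canonical IV type 128 bits wide, whatever the precision of the iteration count.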


Note that the precision is also 128 when -m64 is enabled.
The difference is that the type created by lang_hooks for the -m32 case is
<unnamed-unsigned:128>, while for -m64 it is __int128 unsigned, whose arithmetic
operations are apparently handled correctly by the compiler.
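
The same difference is visible from user code. GCC defines the __SIZEOF_INT128__ macro only on targets where __int128 is available, so the sketch below can be compiled with both -m32 and -m64 to compare the two. The function names are made up for illustration, and the assumption that the macro is absent under -m32 on power7 follows from the type names above rather than from a test on that machine.

```c
/* Returns 1 if the target provides __int128, 0 otherwise.  */
int int128_supported(void)
{
#ifdef __SIZEOF_INT128__
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if a 128-bit unsigned div/mult round trip gives the expected
   result, or -1 when the target has no __int128 at all.  */
int int128_div_mul_ok(void)
{
#ifdef __SIZEOF_INT128__
    unsigned __int128 n = ((unsigned __int128) 1 << 100) + 7;
    unsigned __int128 t = 6;
    unsigned __int128 q = n / t;

    return q * t + n % t == n;
#else
    return -1;
#endif
}
```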
