On Mon, Nov 25, 2013 at 2:08 PM, Richard Sandiford
<rsand...@linux.vnet.ibm.com> wrote:
> Richard Biener <richard.guent...@gmail.com> writes:
>> @@ -958,6 +961,12 @@ streamer_write_tree_header (struct output_block
>> *ob, tree expr)
>>      streamer_write_uhwi (ob, BINFO_N_BASE_BINFOS (expr));
>>    else if (TREE_CODE (expr) == CALL_EXPR)
>>      streamer_write_uhwi (ob, call_expr_nargs (expr));
>> +  else if (CODE_CONTAINS_STRUCT (code, TS_INT_CST))
>> +    {
>> +      gcc_checking_assert (TREE_INT_CST_NUNITS (expr));
>> +      streamer_write_uhwi (ob, TREE_INT_CST_NUNITS (expr));
>> +      streamer_write_uhwi (ob, TREE_INT_CST_EXT_NUNITS (expr));
>> +    }
>>
>> isn't ext_nunits always either nunits or nunits + 1?  So it should be
>> sufficient to write a single bit in the tree_base bitpack and only
>> write the nunits that are required for the actual allocation in
>> write_tree_header, not both.
>
> ext_nunits can be > nunits + 1.  E.g. imagine a 256-bit all-1s unsigned
> integer.  As a 256-bit number it's simply { -1 }, with all other bits
> being sign-extensions of the -1 HWI.  As a >256-bit number it's
> { -1, -1, -1, -1, 0 }, assuming 64-bit HWIs.

Ah, ok.
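For concreteness, here's a hedged C sketch (not GCC's actual code; the function names are made up) of the arithmetic behind that example: with 64-bit HWIs, an all-ones 256-bit unsigned value needs one sign-extended element within its own precision, but ceil(256/64) + 1 = 5 elements once it has to be readable in a wider precision, since a trailing 0 is needed to stop the sign extension.

```c
#include <stdint.h>

/* Hypothetical sketch: element counts for an all-ones unsigned
   value of PRECISION bits, stored as sign-extended 64-bit HWIs.  */

static unsigned int
all_ones_nunits (unsigned int precision)
{
  /* Within its own precision the value is just { -1 }: the single
     HWI -1 sign-extends to every remaining bit.  */
  (void) precision;
  return 1;
}

static unsigned int
all_ones_ext_nunits (unsigned int precision)
{
  /* Read in a wider precision the value must be zero-extended, so
     we need ceil (precision / 64) blocks of -1 plus one 0 block to
     stop the sign extension.  */
  return (precision + 63) / 64 + 1;
}
```

So for precision 256 this gives nunits == 1 but ext_nunits == 5, i.e. ext_nunits can indeed exceed nunits + 1.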

>> index 4a24c66..f3e0ffe 100644
>> --- a/gcc/tree-vrp.c
>> +++ b/gcc/tree-vrp.c
>> @@ -1141,15 +1142,7 @@ operand_less_p (tree val, tree val2)
>>  {
>>    /* LT is folded faster than GE and others.  Inline the common case.  */
>>    if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
>> -    {
>> -      if (TYPE_UNSIGNED (TREE_TYPE (val)))
>> -       return INT_CST_LT_UNSIGNED (val, val2);
>> -      else
>> -       {
>> -         if (INT_CST_LT (val, val2))
>> -           return 1;
>> -       }
>> -    }
>> +    return INT_CST_LT (val, val2);
>>    else
>>      {
>>        tree tcmp;
>>
>> Note that INT_CST_LT and INT_CST_LT_UNSIGNED were supposed to
>> be very fast.  Now you made them
>>
>> #define INT_CST_LT(A, B) (wi::lts_p (wi::to_widest (A), wi::to_widest (B)))
>>
>> which a) doesn't look the same to me and b) hides complexity.
>
> Not sure what complexity you mean here: to_widest just reads the
> constant in its "infinite-precision" form.  I'm not sure:
>
>> So why not write
>>
>>    if (TREE_CODE (val) == INTEGER_CST && TREE_CODE (val2) == INTEGER_CST)
>>      if (TYPE_UNSIGNED (...))
>>        return wi::ltu_p (val, val2);
>>      else
>>        return wi::lts_p (val, val2);
>>
>> ?
>
> ...this would be faster in practice, since the fast path in lts_p is
> length-based and can handle small integers quickly whatever their precision.
>
> The second form means that VAL1 and VAL2 must have the same precision.
> I don't know whether that's a good thing or not in this instance.
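To illustrate the length-based fast path mentioned above, here's a hedged C sketch (not the wide-int implementation; the name and the fallback loop are my own) of a signed less-than over values stored as little-endian arrays of sign-extended 64-bit elements. When both operands have len == 1, the comparison is a single HWI compare regardless of the operands' precisions:

```c
#include <stdint.h>

/* Hypothetical sketch of a length-based signed less-than on
   "infinite precision" values held as sign-extended int64_t
   arrays, least significant element first.  */
static int
lts_p_sketch (const int64_t *x, unsigned int xlen,
              const int64_t *y, unsigned int ylen)
{
  /* Fast path: a one-element operand is just its sign-extended
     low HWI, whatever its precision.  */
  if (xlen == 1 && ylen == 1)
    return x[0] < y[0];

  /* Slow path: compare from the most significant element down,
     treating elements above an operand's length as sign
     extensions of its top element.  */
  unsigned int len = xlen > ylen ? xlen : ylen;
  for (unsigned int i = len; i-- > 0;)
    {
      int64_t xi = i < xlen ? x[i] : (x[xlen - 1] < 0 ? -1 : 0);
      int64_t yi = i < ylen ? y[i] : (y[ylen - 1] < 0 ? -1 : 0);
      if (xi != yi)
        /* Only the topmost element carries the sign; lower
           elements compare as unsigned digits.  */
        return i == len - 1 ? xi < yi
                            : (uint64_t) xi < (uint64_t) yi;
    }
  return 0;
}
```

Under this representation UHWI_MAX widened is { -1, 0 } (len 2), so comparing it against { 0 } takes the slow path and correctly reports "not less".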

Ok, I'll take your word for it.  Still, with INT_CST_LT now being
exactly the same as tree_int_cst_lt, I'd prefer to eliminate the former
in favor of the latter.

Btw, trying to look at code generation via -fdump-tree-all blows up
memory - you have a leak somewhere I guess :/  I'm trying to compile
tree.ii with -O2 -fdump-tree-all.

>> @@ -6966,12 +6975,7 @@ tree_int_cst_lt (const_tree t1, const_tree t2)
>>  int
>>  tree_int_cst_compare (const_tree t1, const_tree t2)
>>  {
>> -  if (tree_int_cst_lt (t1, t2))
>> -    return -1;
>> -  else if (tree_int_cst_lt (t2, t1))
>> -    return 1;
>> -  else
>> -    return 0;
>> +  return wi::cmps (wi::to_widest (t1), wi::to_widest (t2));
>>  }
>>
>>  /* Return true if T is an INTEGER_CST whose numerical value (extended
>>
>> How does this work for comparing UHWI_MAX to 0 for example?  Looking
>> at
>>
>> template <typename T1, typename T2>
>> inline int
>> wi::cmps (const T1 &x, const T2 &y)
>> {
>>   unsigned int precision = get_binary_precision (x, y);
>>   WIDE_INT_REF_FOR (T1) xi (x, precision);
>>   WIDE_INT_REF_FOR (T2) yi (y, precision);
>>   if (precision <= HOST_BITS_PER_WIDE_INT)
>>     {
>>       HOST_WIDE_INT xl = xi.to_shwi ();
>>       HOST_WIDE_INT yl = yi.to_shwi ();
>>       if (xl < yl)
>>         return -1;
>>
>> this seems to happily return -1 instead of 1?  Similar issues elsewhere
>> where you change compares to unconditionally perform signed compares.
>> (it's at least non-obvious to me).
>>
>> Ah, the to_widest () will make it XImode * 4 extended and thus
>> get_precision returns 2048 (or so) so we don't take the shortcut
>> (which means it's slow).
>>
>> Shouldn't the shortcuts be taken on 'len' == 1 rather than
>> precision <= HWI?
>
> I suppose we could use the same approach as for lts_p, if that seems
> OK to you.

Yeah, that looks good to me.
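For the record, a hedged C sketch (again not the real wi::cmps; names and layout are made up) of what taking the shortcut on len == 1 rather than on precision <= HWI could look like for the three-way compare:

```c
#include <stdint.h>

/* Hypothetical sketch of a three-way signed compare on values
   held as little-endian arrays of sign-extended int64_t, with a
   len-based fast path instead of a precision-based one.  */
static int
cmps_sketch (const int64_t *x, unsigned int xlen,
             const int64_t *y, unsigned int ylen)
{
  /* len == 1 shortcut: small values compare quickly whatever
     their nominal precision.  */
  if (xlen == 1 && ylen == 1)
    return x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : 0;

  unsigned int len = xlen > ylen ? xlen : ylen;
  for (unsigned int i = len; i-- > 0;)
    {
      /* Elements above an operand's length are sign extensions
         of its top element.  */
      int64_t xi = i < xlen ? x[i] : (x[xlen - 1] < 0 ? -1 : 0);
      int64_t yi = i < ylen ? y[i] : (y[ylen - 1] < 0 ? -1 : 0);
      if (xi != yi)
        {
          /* Only the most significant element compares signed.  */
          if (i == len - 1)
            return xi < yi ? -1 : 1;
          return (uint64_t) xi < (uint64_t) yi ? -1 : 1;
        }
    }
  return 0;
}
```

With this shape the UHWI_MAX-vs-0 case above comes out right: to_widest gives UHWI_MAX as { -1, 0 } (len 2), which the slow path compares as a positive two-element value, returning 1 rather than -1.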

>> Side-note: we have too many ways to compare trees / integers
>
> Do you mean in wide-int, or in the tree routines?  At the wide-int
> level there are only really two ways: compare trees as sequences of bits
> (ignoring their sign) and compare trees as "infinite-precision" numbers.

In the tree routines.

Richard.

> Thanks,
> Richard
>
