On 11/10/20 10:21 AM, Stefan Kanthak wrote:

>> So with all that in mind, I installed everything except the bits which
>> have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. 
>> If you could remove the ifdefs on the abs/mult changes and resubmit them
>> it'd be appreciated.
> Done.
THanks.  I'm doing some testing on the abs changes right now.  They look
pretty reasonable, though they will tend to generate worse code on
targets that don't handle overflow arithmetic and testing all that well.

H8 would be an example that generates more efficient code with the
original implementation.  But I don't think the H8 is terribly
representative of embedded ports these days, though it's likely to get
better at precisely these kinds of scenarios in the near future as
certain modernization patches continue landing.

Something like visium is much better for evaluation as it's got a modern
structure which includes reasonable exposure of carry/overflow.  Not
surprisingly the visium port generates better code with your improved
abs routines.





> libgcc2.patch
>
> --- -/libgcc/libgcc2.c
> +++ +/libgcc/libgcc2.c
>  
>  #ifdef L_mulvdi3
>  DWtype
> -__mulvDI3 (DWtype u, DWtype v)
> +__mulvDI3 (DWtype a, DWtype b)
[ ... ]
You've essentially collapsed the function into:

 DWtype
__mulvDI3 (DWtype a, DWtype b)
{
  DWtype t;
  const DWunion u = {.ll = a};
  const DWunion v = {.ll = b};
  DWunion w = {.ll = __umulsidi3 (u.s.low, v.s.low)};

  if (__builtin_mul_overflow (u.s.low, v.s.high, &t)
   || __builtin_add_overflow (t, w.s.high, &w.s.high)
   || __builtin_mul_overflow (u.s.high, v.s.low, &t)
   || __builtin_add_overflow (t, w.s.high, &w.s.high))
    abort ();

  return w.ll
}

The first thing to note is that I believe that using "DWtype" for "t" is
going to inhibit the overflow check for the __builtin_mul_overflow calls
that store their result into "t".  Your inputs are both 32bits and the
output is 64 bits.   Multiplying 2 32bit numbers will not overflow a
64bit result.  As a result those overflow checks are actually removed in
the code we generate for mulvDI3, defeating the entire purpose of using
mulvdi3 instead of muldi3.

Second while the result of multiplying u.s.high with v.s.high can never
change the result in w.ll, it does affect the overall overflow status
that we need to compute.

I think (but have not yet verified) that to fix these problems we need
to change the type of "t" to SItype and add a check like:

  if (u.s.high && v.s.high)
    abort ();

Also note that your approach always does 3 multiplies, which can be very
expensive on some architectures.  The existing version in libgcc2.c will
often just do one or two multiplies.  So while your implementation looks
a lot simpler, I suspect its often much slower.  And on targets without
32bit multiplication support, it's probably horribly bad.

My inclination is to leave the overflow checking double-word multiplier
as-is.  Though I guess you could keep the same structure as the existing
implementation which tries to avoid unnecessary multiplies and still use
the __builtin_{add,mul}_overflow to simplify the code a bit less
aggressively.




Jeff

Reply via email to