Jeff Law <l...@redhat.com> wrote:

> On 10/5/20 10:49 AM, Stefan Kanthak wrote:
>> The implementation of the functions __absv?i2(), __addv?i3() etc. for
>> trapping integer overflow provided in libgcc2.c is rather bad.
>> Same for __cmp?i2() and __ucmp?i2()
>>
>> At least for AMD64 and i386 processors GCC creates awful to horrible
>> code for them: see <https://skanthak.homepage.t-online.de/gcc.html>
>> for some examples as well as the expected assembly.
>>
>> The attached diff/patch provides better implementations.
> 
> So you didn't indicate how this was tested.

I inspected the assembly generated for i386 and AMD64 and found it to be
correct. I also ran both the current and the patched functions in parallel
and compared their results, without detecting any deviation (except for
the greater speed of the patched functions).

> The addition, subtraction, negation and mulvsi3 are reasonable. The
> mulvsi3 change introduced a failure on m32r, but I'm highly confident
> it's an pre-existing simulator bug.

I only have AMD64 processors, so my tests are limited to these platforms.

> But the abs changes and the other other multiply changes have undesired
> LIBGCC2_BAD_CODE ifdefs that you need to clean up before we could accept
> those changes.

I included these alternative #ifdef'ed implementations only to demonstrate
that GCC generates bad^Wnot the best code there, especially for __mulvDI3(),
which I implemented using __umulsidi3() plus 2 __builtin_mul_overflow() and
2 __builtin_add_overflow() for Wtype instead of just a single
__builtin_mul_overflow() for DWtype. 

> What I don't understand is why the [u]cmpdi/cmpdi2 changes don't cause
> infinite recursion. I would expect the comparisons you're using to
> compute the return value to themselves trigger calls to [u]cmpdi2.

Why should the compiler generate calls to [u]cmpdi2() there?
It doesn't generate calls to the missing functions [u]adddi2() or
[u]subdi2() for addition and subtraction of DWtypes either!

JFTR: this also holds for the __ashlDI3(),  __ashrDI3() and __lshrDI3()
      functions, which should better be written as

__ashldi3 (DWtype u, shift_count_type b)
{
  return u << b;
}

__ashrdi3 (DWtype u, shift_count_type b)
{
  return u >> b;
}

__lshrdi3 (UDWtype u, shift_count_type b)
{
  return u >> b;
}

> I can speculate that the expanders are open-coding the comparison, but
> in that case I'd expect the libgcc2 routines to never be used, so
> optimizing them is pointless :-)

Both are documented, so users might call them directly.
Besides that, their source smells -- nobody with a sane mind would NOT
write (a > b) - (a < b) + 1, what every self-respecting compiler/optimiser
should translate into branch-free code.

> But I put them through my tester and couldn't see any evidence that
> they're causing problems though and reviewed the generated code on m32r
> since it was convenient to do so.
> 
> 
> Also note the ucmpdi changes require updating the prototype in libgcc2.h,
> otherwise you get build failures building gcc itself. I fixed that minor
> omission.

Argh, I missed that indeed.

> So with all that in mind, I installed everything except the bits which
> have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. 
> If you could remove the ifdefs on the abs/mult changes and resubmit them
> it'd be appreciated.

Done.

regards
Stefan

Attachment: libgcc2.patch
Description: Binary data

Reply via email to