Jeff Law <l...@redhat.com> wrote: > On 10/5/20 10:49 AM, Stefan Kanthak wrote: >> The implementation of the functions __absv?i2(), __addv?i3() etc. for >> trapping integer overflow provided in libgcc2.c is rather bad. >> Same for __cmp?i2() and __ucmp?i2() >> >> At least for AMD64 and i386 processors GCC creates awful to horrible >> code for them: see <https://skanthak.homepage.t-online.de/gcc.html> >> for some examples as well as the expected assembly. >> >> The attached diff/patch provides better implementations. > > So you didn't indicate how this was tested.
I inspected the assembly generated for i386 and AMD64 and found it to be correct. I also ran both the current and the patched functions in parallel and compared their results, without detecting any deviation (except for the greater speed of the patched functions). > The addition, subtraction, negation and mulvsi3 are reasonable. The > mulvsi3 change introduced a failure on m32r, but I'm highly confident > it's an pre-existing simulator bug. I only have AMD64 processors, so my tests are limited to these platforms. > But the abs changes and the other other multiply changes have undesired > LIBGCC2_BAD_CODE ifdefs that you need to clean up before we could accept > those changes. I included these alternative #ifdef'ed implementations only to demonstrate that GCC generates bad^Wnot the best code there, especially for __mulvDI3(), which I implemented using __umulsidi3() plus 2 __builtin_mul_overflow() and 2 __builtin_add_overflow() for Wtype instead of just a single __builtin_mul_overflow() for DWtype. > What I don't understand is why the [u]cmpdi/cmpdi2 changes don't cause > infinite recursion. I would expect the comparisons you're using to > compute the return value to themselves trigger calls to [u]cmpdi2. Why should the compiler generate calls to [u]cmpdi2() there? It doesn't generate calls to the missing functions [u]adddi2() or [u]subdi2() for addition and subtraction of DWtypes either! JFTR: this also holds for the __ashlDI3(), __ashrDI3() and __lshrDI3() functions, which should better be written as __ashldi3 (DWtype u, shift_count_type b) { return u << b; } __ashrdi3 (DWtype u, shift_count_type b) { return u >> b; } __lshrdi3 (UDWtype u, shift_count_type b) { return u >> b; } > I can speculate that the expanders are open-coding the comparison, but > in that case I'd expect the libgcc2 routines to never be used, so > optimizing them is pointless :-) Both are documented, so users might call them directly. Besides that, their source smells -- nobody with a sane mind would NOT write (a > b) - (a < b) + 1, what every self-respecting compiler/optimiser should translate into branch-free code. > But I put them through my tester and couldn't see any evidence that > they're causing problems though and reviewed the generated code on m32r > since it was convenient to do so. > > > Also note the ucmpdi changes require updating the prototype in libgcc2.h, > otherwise you get build failures building gcc itself. I fixed that minor > omission. Argh, I missed that indeed. > So with all that in mind, I installed everything except the bits which > have the LIBGCC2_BAD_CODE ifdefs after testing on the various crosses. > If you could remove the ifdefs on the abs/mult changes and resubmit them > it'd be appreciated. Done. regards Stefan
libgcc2.patch
Description: Binary data