https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97437

--- Comment #4 from fdlbxtqi <euloanty at live dot com> ---
(In reply to Jakub Jelinek from comment #1)
> I don't see anything undesirable on that.  The 0 aka %rax is used in 7
> different instructions later on besides the move, so either we just clear
> %ecx (can't use xorl for that as the flags register needs to be live), or
> clear %eax and copy to %ecx (3 bytes more), but then gain 7 bytes back
> because we don't really use an immediate form.

Hello Jakub.

std::uint64_t a{}; // must be initialized to 0, otherwise reading it is UB
sub_borrow(borrow,a,a,a);

sbbq %any_new_register,%any_new_register

means
std::uint64_t a;
if(borrow)
    a=std::numeric_limits<std::uint64_t>::max();
else
    a=0;

but it is faster, uses fewer instructions, and has no branch (which helps
prevent side-channel attacks).
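
To make the equivalence concrete, here is a minimal portable sketch. The names
borrow_mask, sub_borrow_model and check_sbb_idiom are hypothetical and exist
only for this illustration, and the argument order (borrow in, a, b, out) is
an assumption taken from the call above. It models the arithmetic only and
says nothing about which instructions a compiler will emit.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <limits>

// Hypothetical helper: expands a borrow flag into an all-ones or all-zeros
// 64-bit value without a branch in the source.
constexpr std::uint64_t borrow_mask(bool borrow) noexcept
{
    // Unsigned arithmetic wraps modulo 2^64, so 0 - 1 yields all ones.
    return std::uint64_t{0} - static_cast<std::uint64_t>(borrow);
}

// Hypothetical portable model of subtract-with-borrow.
// Assumed semantics: out = a - b - borrow_in, return value is the borrow out.
constexpr bool sub_borrow_model(bool borrow_in, std::uint64_t a,
                                std::uint64_t b, std::uint64_t& out) noexcept
{
    std::uint64_t const d = a - b;
    std::uint64_t const bin = static_cast<std::uint64_t>(borrow_in);
    out = d - bin;
    // A borrow occurs if a - b wrapped, or if subtracting borrow_in wrapped.
    return (a < b) || (d < bin);
}

static_assert(borrow_mask(false) == 0, "no borrow gives 0");
static_assert(borrow_mask(true) == std::numeric_limits<std::uint64_t>::max(),
              "borrow gives all ones");

// Checks that sub_borrow(borrow, a, a, a) with a == 0 produces the same value
// as the if/else above, and that the borrow flag is passed through unchanged.
inline void check_sbb_idiom()
{
    for (bool borrow : {false, true}) {
        std::uint64_t a{};
        bool const borrow_out = sub_borrow_model(borrow, a, a, a);
        assert(a == borrow_mask(borrow));
        assert(borrow_out == borrow);
    }
}
```

Calling check_sbb_idiom() exercises both borrow values; the result of the
model matches the if/else version in each case.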

And all those later calls have this form:

sub_borrow(borrow,f[2],zero,f[3]);

So you do not need the %rax register for the calculations at all. %rax
would only be used for RVO.

clang does not do the right thing either unless I write

constexpr unsigned_type zero{};

to tell the compiler that the value is zero. Without that, it does not know
either.

I think that because _addcarry_ and _subborrow_ have not been used frequently
with GCC, many optimization opportunities are missing here. That is probably
why GMP, OpenSSL, etc. so often rewrite this kind of code in assembly: the
compiler does not guarantee that the generated instructions are optimal. I
know this is probably the cost of any abstraction, particularly since here I
combine OOP, concepts, templates, if constexpr, std::is_constant_evaluated(),
RVO and intrinsics, which probably interferes with the compiler's
optimizations. There is no guarantee that the generated code is as good as
hand-written assembly.
