On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:
> Hello,
>
> when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
> __divdi3 is always output, even though it seems the use of the idiv
> instruction could be faster.
>
> This seems to remain even under -Ofast and other available options.
>
> To illustrate, this godbolt link: https://godbolt.org/z/hq4GKb
>
> With code
>
> #include <stdint.h>
> int32_t d(int64_t a, int32_t b) {
>     return a / b;
> }
>
> Compiles to
>
> d(long long, int):
>         sub     esp, 12
>         mov     eax, DWORD PTR [esp+24]
>         cdq
>         push    edx
>         push    eax
>         push    DWORD PTR [esp+28]
>         push    DWORD PTR [esp+28]
>         call    __divdi3
>         add     esp, 28
>         ret
>
> Why is this?

C evaluation rules for this are such that first 'b' is extended to int64_t,
the division is done in int64_t, and its result is truncated to int32_t in an
implementation-defined manner. Thus, it must always produce a value, except
if (b == 0 || b == -1 && a == INT64_MIN), in which case division causes
undefined behavior.

The x86 'idiv' instruction, however, will raise a divide error if the result
does not fit in a register, so e.g. dividing INT64_MAX by 1 would trap.

Alexander
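The trap condition described above can be modelled in portable C. This is a minimal sketch, not anything GCC actually emits: the helper name `idiv32_would_trap` is hypothetical, and it assumes the 32-bit form of `idiv` (EDX:EAX divided by a 32-bit operand), which raises a divide error when the divisor is zero or when the quotient, truncated toward zero, does not fit in a signed 32-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if a 32-bit x86 `idiv`, dividing the
   64-bit value EDX:EAX (here `a`) by the 32-bit divisor `b`, would raise a
   divide error instead of producing a quotient. */
static bool idiv32_would_trap(int64_t a, int32_t b)
{
    /* Division by zero always traps. */
    if (b == 0)
        return true;

    /* INT64_MIN / -1 overflows even in int64_t (undefined behavior in C),
       and its true value 2^63 certainly does not fit in 32 bits. */
    if (b == -1 && a == INT64_MIN)
        return true;

    /* C division truncates toward zero, the same rounding idiv uses. */
    int64_t q = a / (int64_t)b;

    /* idiv traps when the quotient is outside the signed 32-bit range. */
    return q < INT32_MIN || q > INT32_MAX;
}
```

For example, `idiv32_would_trap(INT64_MAX, 1)` returns true even though the C expression `(int32_t)(INT64_MAX / 1)` is well defined, while `idiv32_would_trap(100, 7)` returns false. A compiler that wanted to use `idiv` here would therefore need a range check like this one plus a fallback to the full 64-bit division, rather than emitting `idiv` unconditionally.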