Hi,
I've been looking at GCC's use of sign-extensions when dealing with integers smaller than a machine word size. It looks like there is room for improvement.
Consider this C function:
short g(short x) { short i; for (i = 0; i < 10; i++) { x += i; } return x; }
On x86, using a GCC 4.0.0 20050130, with -O2 I get this code:
g: pushl %ebp xorl %edx, %edx movl %esp, %ebp movswl 8(%ebp),%ecx .p2align 4,,15 .L2: leal (%ecx,%edx), %eax movswl %ax,%ecx # 1 leal 1(%edx), %eax movzwl %ax, %eax # 2 cmpw $10, %ax movswl %ax,%edx # 3 jne .L2
popl %ebp movl %ecx, %eax ret .size g, .-g .p2align 4,,15
The three extensions (#1, #2, #3) here are unnecessarily conservative. This would be better:
g: pushl %ebp xorl %edx, %edx movl %esp, %ebp movswl 8(%ebp),%ecx .p2align 4,,15 .L2: leal (%ecx,%edx), %ecx # x += i leal 1(%edx), %edx # i++ cmpw $10, %dx # i < 10 ? jne .L2
popl %ebp movswl %cx, %eax ret
GCC's approach seems to be *eager*, in that sign-extensions are done immediately after sub-word operations. This ensures that the high bits of a register holding a sub-word value are always valid.
An alternative is to allow the high bits of registers holding sub-word values to be "junk", and do sign-extensions *lazily*, only before operations in which any "junk" high bits could adversely affect the result. For example, if you do a right shift on a value with "junk" high bits you have to sign/zero-extend it first, because high bits in the operands can affect low bits in the result. The same is true of division.
In contrast, an addition of two 16-bit values with "junk" high bits is ok if the result is also a 16-bit value. The same is true of subtraction, multiplication and logical ops. The reason is that for these operations, the low 16 bits of the result do not depend on the high 16 bits of the operands.
Although you can construct examples where the eager approach gives better code, in general I think the lazy approach results in better code, such as in the above example. Is there a particular reason why GCC uses the eager approach? Maybe it has to do with the form of GCC's intermediate representation? Or are there are some subtleties to do with the lazy approach that I have overlooked? Or maybe I've misunderstood GCC's approach.
Any comments are appreciated. Thanks for your help.
Nick