On Fri, Mar 07, 2008 at 01:05:03PM +0100, Philipp Marek wrote:
> When wouldn't that be possible? My script currently splits on an
> instruction level -- although I would see no problem if some branch
> jumps into a "half" opcode of another branch, if the byte sequence
> matches.
Consider:

00000000 <bar>:
   0:	b8 a4 00 00 00       	mov    $0xa4,%eax
   5:	ba fc 04 00 00       	mov    $0x4fc,%edx
   a:	f7 e2                	mul    %edx
   c:	05 d2 04 00 00       	add    $0x4d2,%eax
  11:	c3                   	ret
	...
00020012 <foo>:
   20012:	39 d2                	cmp    %edx,%edx
   20014:	75 07                	jne    2001d <foo+0xb>
   20016:	ba fc 04 00 00       	mov    $0x4fc,%edx
   2001b:	f7 e2                	mul    %edx
   2001d:	05 d2 04 00 00       	add    $0x4d2,%eax
   20022:	c3                   	ret

If you merge the mov/mul/add/ret sequences by replacing the foo tail
sequence with jmp bar+5, then the jne will branch to the wrong place;
and if you try to adjust it, the target is too far away to reach with
its 8-bit displacement.

> > but even jmp argument is relative, not absolute.
> That's why I take only jumps with 32bit arguments - these are absolute.

No, they are relative:

00000000 <bar>:
   0:	e9 05 00 02 00       	jmp    2000a <foo>
   5:	e9 00 00 02 00       	jmp    2000a <foo>
	...
0002000a <foo>:
   2000a:	90                   	nop

See how they are encoded: both jumps land on 2000a, yet their 32-bit
operands differ (0x20005 vs. 0x20000), because each operand is a
displacement from the end of its own 5-byte instruction.

> Yes ... I think doing a second compile pass might be the easiest way, and
> not much slower than other solutions.
> (We could always remember which object files could be optimized, and
> only recompile *those*. After all, it's just an additional
> optimization.)

BTW, have you tried to compile the whole kernel with --combine, or at
least e.g. each kernel directory with --combine? I guess that will give
you bigger savings than 30K.

Also, stop defining inline to inline __attribute__((always_inline));
I think Ingo also added such a patch recently and it saved 120K.

	Jakub
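
To put numbers on the merge problem: a minimal C sketch, assuming the
addresses from the first disassembly above, that computes the
displacement the jne would need once foo's tail is replaced by
jmp bar+5 and checks whether it still fits the 8-bit operand of the
existing 2-byte jne:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Addresses taken from the disassembly above: the 2-byte jne sits
     * at 0x20014 inside foo; after the merge its target must become
     * the add at bar+0xc, i.e. absolute address 0xc. */
    int64_t jne_addr   = 0x20014;
    int64_t new_target = 0xc;

    /* A rel8 displacement is counted from the end of the 2-byte jne. */
    int64_t disp = new_target - (jne_addr + 2);

    printf("needed displacement: %lld\n", (long long)disp);
    printf("representable as rel8 (-128..127): %s\n",
           disp >= -128 && disp <= 127 ? "yes" : "no");
    return 0;
}

The displacement comes out around -131082, far outside the -128..127
range a short conditional jump can encode.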
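Likewise for the encoding question: a small C sketch that decodes the
two e9 jumps above by adding the 32-bit operand to the address of the
following instruction; both resolve to 2000a even though their operands
differ, which is only possible if the operand is relative:

#include <stdio.h>
#include <stdint.h>

/* An e9 jump's 32-bit operand is a displacement relative to the
 * address of the *next* instruction (opcode + 4 operand bytes = 5). */
static uint64_t jmp_target(uint64_t insn_addr, int32_t rel32)
{
    return insn_addr + 5 + (int64_t)rel32;
}

int main(void)
{
    /* The two jumps from the disassembly above. */
    printf("jmp at 0: %#llx\n",
           (unsigned long long)jmp_target(0x0, 0x00020005));
    printf("jmp at 5: %#llx\n",
           (unsigned long long)jmp_target(0x5, 0x00020000));
    return 0;
}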
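As for the inline define: in the kernel's compiler headers of the time
it reads roughly as sketched below (the exact file and surrounding
conditionals may differ by version). Dropping the always_inline
attribute hands the inlining decision back to gcc's heuristics, which
is where savings of the quoted order come from.

/* Roughly how the kernel's compiler headers spell the define Jakub
 * refers to (exact location and guards may differ): */
#define inline inline __attribute__((always_inline))

/* With this define every "inline" function is force-inlined; removing
 * the attribute lets gcc decide per call site instead. */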