On Sat, Sep 01, 2012 at 12:48:50AM +0200, Jilles Tjoelker wrote:
> On Tue, Aug 28, 2012 at 02:03:22PM +0300, Konstantin Belousov wrote:
> > On Sat, Aug 25, 2012 at 12:16:55AM +0200, Jilles Tjoelker wrote:
> > > Not exporting .cerror causes it to be jumped to directly instead of via
> > > the PLT.
> 
> > > The below patch is for i386 only and also takes advantage of .cerror's
> > > new status by not saving and loading %ebx before jumping to it.
> > > (Therefore, .cerror now saves and loads %ebx itself.) Where there was a
> > > conditional jump to a jump to .cerror, the conditional jump has been
> > > changed to jump to .cerror directly (many modern CPUs don't do static
> > > prediction and in any case it is not much of a benefit anyway).
> 
> > Why do you need to save/restore %ebx at all? %ebx ==
> > &__GLOBAL_OFFSET_TABLE__ is only needed when you access the GOT, but .cerror
> > only works with the PLT, which is addressed using an instruction capable of
> > relative addressing. The old .cerror does not need it either, but it is
> > just engraved in the function ABI.
> 
> On i386, a shared object's PLT entry needs %ebx set up to work properly.
> This is because such a PLT entry needs to access the GOT to find the
> address to jump to (the first instruction is jmp *d32(%ebx)).
> 
> An executable's PLT entry accesses the GOT via absolute addressing and
> therefore does not need %ebx.
Doh, right.

Still, these manipulations can be removed; we just need to resolve
__error in some libc constructor. It is not very important after your
patch, because the ABI is now much more regular, but I think removing the
additional stack operations is still beneficial.

> 
> > > The patch decreases the size of libc.so.7 by a few kilobytes.
> 
> > > Similar changes could be made to other architectures, and there may be
> > > more symbols that are exported but need not be.
> > Sure, would you handle at least amd64 too ?
> 
> The below patch handles amd64.
> 
> I'm a bit annoyed that most of the syscall stubs are 17 bytes long now
> and have the maximum 15 bytes of padding. This means that the patch
> provides virtually no gain in code size.
The stubs could be converted to only load the syscall number into %rax
and jump unconditionally to common code, which would perform the kernel
call and do the post-syscall bookkeeping to update errno.
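
As a rough illustration of that layout, here is a small C model. It is
only a sketch, not the actual libc assembly: the kernel is simulated by
a plain function, and every name in it (fake_kernel, common_syscall,
model_getpid, model_close, the MODEL_SYS_* numbers) is hypothetical.
The carry field stands in for the carry flag the kernel sets on failure.

```c
#include <errno.h>

/* Hypothetical syscall numbers, used by this model only. */
enum { MODEL_SYS_CLOSE = 6, MODEL_SYS_GETPID = 20 };

/* Models the register state after the kernel returns: carry set means
 * failure, and value then holds the error number instead of a result. */
struct kernel_result {
    long value;
    int carry;
};

/* Simulated kernel entry (assumption: stands in for the real trap). */
static struct kernel_result fake_kernel(long number, long arg0)
{
    struct kernel_result r = { 0, 0 };

    switch (number) {
    case MODEL_SYS_GETPID:
        r.value = 1234;
        break;
    case MODEL_SYS_CLOSE:
        if (arg0 < 0) {
            r.value = EBADF;
            r.carry = 1;
        }
        break;
    default:
        r.value = ENOSYS;
        r.carry = 1;
        break;
    }
    return r;
}

/* Common code shared by all stubs: performs the kernel call and does
 * the post-syscall bookkeeping (sets errno and returns -1 on failure). */
static long common_syscall(long number, long arg0)
{
    struct kernel_result r = fake_kernel(number, arg0);

    if (r.carry) {
        errno = (int)r.value;
        return -1;
    }
    return r.value;
}

/* Each stub only supplies its syscall number and hands off. */
long model_getpid(void)
{
    return common_syscall(MODEL_SYS_GETPID, 0);
}

long model_close(long fd)
{
    return common_syscall(MODEL_SYS_CLOSE, fd);
}
```

In this model each stub carries nothing but its syscall number; the
error path lives once in the common code instead of in every stub.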

Otherwise, looks good.
