Here is a proposal to use 32-byte PLT to preserve bound registers. Any comments?
BTW, we are working on another proposal to use a second PLT section
with 8-byte or 16-byte memory overhead, instead of 24-byte overhead.

--
H.J.
---
Intel MPX:

http://software.intel.com/sites/default/files/319433-015.pdf

introduces 4 bound registers, which will be used for parameter passing
in x86-64.  Bound registers are cleared by branch instructions.  Branch
instructions with BND prefix will keep bound register contents.  This
leads to 2 requirements for the 64-bit MPX run-time:

1. Dynamic linker (ld.so) should save and restore bound registers during
symbol lookup.
2. Change the current 16-byte PLT0:

  ff 35 08 00 00 00     pushq  GOT+8(%rip)
  ff 25 00 10 00        jmpq   *GOT+16(%rip)
  0f 1f 40 00           nopl   0x0(%rax)

and 16-byte PLT1:

  ff 25 00 00 00 00     jmpq   *name@GOTPCREL(%rip)
  68 00 00 00 00        pushq  $index
  e9 00 00 00 00        jmpq   PLT0

which clear bound registers, to preserve bound registers.

We use 2 new relocations:

#define R_X86_64_PC32_BND  39 /* PC relative 32 bit signed with BND prefix */
#define R_X86_64_PLT32_BND 40 /* 32 bit PLT address with BND prefix */

to mark branch instructions with BND prefix.

When linker sees any R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
it switches to a different PLT0:

  ff 35 08 00 00 00     pushq  GOT+8(%rip)
  f2 ff 25 00 10 00     bnd jmpq *GOT+16(%rip)
  0f 1f 00              nopl   (%rax)

to preserve bound registers for symbol lookup.  For a symbol with
R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations, linker will use a
32-byte PLT1:

  f2 ff 25 00 00 00 00  bnd jmpq *name@GOTPCREL(%rip)
  68 00 00 00 00        pushq  $index
  f2 e9 00 00 00 00     bnd jmpq PLT0
  0f 1f 80 00 00 00 00  nopl   0(%rax)
  0f 1f 80 00 00 00 00  nopl   0(%rax)

Prelink stores the offset of pushq of PLT1 (plt_base + 0x16) in GOT[1] and
GOT[1] is stored in GOT[3].  We can undo prelink in GOT by computing
the corresponding pushq offset with

GOT[1] + (GOT offset - &GOT[3]) * 2

This depends on each pushq being 16 bytes apart and each GOT entry being
8 bytes.  To support prelink, each 16-byte block in PLT must have an
8-byte entry in GOT.
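
To make the 16-byte case concrete, here is a minimal sketch of the
formula above in C.  The function name and the idea of passing plain
integer addresses are assumptions made for illustration; this is not
the actual prelink or ld.so code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from prelink or ld.so): recompute the
 * address of the pushq that a GOT slot pointed to before prelink.
 *
 *   got1      - value in GOT[1], the address of the first PLT1's pushq
 *               (plt_base + 0x16)
 *   slot_addr - address of the GOT slot being restored
 *   got3_addr - address of GOT[3], the slot of the first PLT1
 */
static uint64_t undo_prelink_slot(uint64_t got1, uint64_t slot_addr,
                                  uint64_t got3_addr)
{
    /* The slot must be GOT[3] or later and 8-byte aligned relative to it. */
    assert(slot_addr >= got3_addr && (slot_addr - got3_addr) % 8 == 0);

    /* GOT entries are 8 bytes apart and pushq instructions 16 bytes
       apart, so the byte distance in GOT is doubled. */
    return got1 + (slot_addr - got3_addr) * 2;
}
```

With made-up addresses, GOT[1] = 0x400416 and GOT[3] at 0x601018, the
slot at 0x601028 (GOT[5]) maps to 0x400416 + 0x10 * 2 = 0x400436.
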
Linker allocates 2 8-byte entries in GOT for each 32-byte PLT1.  Then we
can undo prelink by computing the corresponding pushq offset with

pushq_offset = GOT[1] + (GOT offset - &GOT[3]) * 2
pushq_offset += ((unsigned char *) pushq_offset)[6] == 0xf2 ? 1 : 0

For each symbol with R_X86_64_PC32_BND or R_X86_64_PLT32_BND relocations,
this approach increases PLT size by 16 bytes and GOT size by 8 bytes.
That is 24 bytes in total.

Pros:
        No additional sections are needed.
Cons:
        24-byte memory overhead for each symbol with BND relocation.
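
The two-step computation for the 32-byte PLT1 can be sketched in C as
follows.  This is only an illustration: it works on offsets into a byte
image of the PLT instead of real addresses, and the function name, the
example image and the slot numbering (slot 0 is GOT[3]) are assumptions
of the sketch, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

enum { GOT_ENTRY_SIZE = 8, BND_PREFIX = 0xf2 };

/*
 * Hypothetical helper: offset of the pushq inside a PLT image for the
 * GOT slot `slot` (slot 0 corresponds to GOT[3]).  `got1_off` is the
 * offset of the first PLT1's pushq, i.e. 0x16.
 */
static size_t pushq_offset_for_slot(const unsigned char *plt, size_t plt_size,
                                    size_t got1_off, size_t slot)
{
    /* Step 1: GOT[1] + (GOT offset - &GOT[3]) * 2 */
    size_t off = got1_off + slot * GOT_ENTRY_SIZE * 2;

    assert(off + 6 < plt_size);

    /* Step 2: in a 32-byte PLT1 the byte 6 past the guess is the f2
       prefix of "bnd jmpq PLT0", and the pushq sits one byte later. */
    if (plt[off + 6] == BND_PREFIX)
        off += 1;
    return off;
}

/* Illustrative image: PLT0, one 16-byte PLT1, one 32-byte PLT1.  Only
   the bytes relevant to the check are filled in; the rest stay zero. */
static const unsigned char example_plt[64] = {
    [0x16] = 0x68,  /* pushq of the 16-byte PLT1 (entry at 16, +6) */
    [0x1c] = 0xe0,  /* low displacement byte of its "jmpq PLT0" */
    [0x27] = 0x68,  /* pushq of the 32-byte PLT1 (entry at 32, +7) */
    [0x2c] = 0xf2,  /* BND prefix of its "bnd jmpq PLT0" */
};
```

For this image, slot 0 stays at offset 0x16, while slot 1 starts from
the guess 0x26 and is bumped to 0x27 because the byte at 0x2c is 0xf2.
In a 16-byte PLT1 the byte tested is the low displacement byte of
"jmpq PLT0"; since PLT entries are 16 bytes apart from PLT0 its low
nibble is 0, so it cannot be mistaken for 0xf2.
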