On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>
>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>> by checking
>>>>>>
>>>>>>        movq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> and
>>>>>>
>>>>>>        addq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>> instruction.  With 32bit Pmode, we may not have the REX prefix and
>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>> may look like a REX prefix.  IE->LE optimization will generate corrupted
>>>>>> binary.  This patch makes sure we always output a REX prefix for
>>>>>> UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>
>>>>> Actually, linker has:
>>>>>
>>>>>     case R_X86_64_GOTTPOFF:
>>>>>       /* Check transition from IE access model:
>>>>>                 mov foo@gottpoff(%rip), %reg
>>>>>                 add foo@gottpoff(%rip), %reg
>>>>>        */
>>>>>
>>>>>       /* Check REX prefix first. */
>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>         {
>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>             {
>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix. */
>>>>>               if (ABI_64_P (abfd))
>>>>>                 return FALSE;
>>>>>             }
>>>>>         }
>>>>>       else
>>>>>         {
>>>>>           /* X32 may not have any REX prefix. */
>>>>>           if (ABI_64_P (abfd))
>>>>>             return FALSE;
>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>             return FALSE;
>>>>>         }
>>>>>
>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>> this is a bug in binutils.
>>>>>
>>>>
>>>> The last byte of the displacement in the previous instruction
>>>> may happen to look like a REX byte.  In that case, linker
>>>> will overwrite the last byte of the previous instruction and
>>>> generate the wrong instruction sequence.
>>>>
>>>> I need to update linker to enforce the REX byte check.
>>>
>>> One important observation: if we want to follow the x86_64 TLS spec
>>> strictly, we have to use existing DImode patterns only. This also
>>> means that we should NOT convert other TLS patterns to Pmode, since
>>> they explicitly state movq and addq. If this is not the case, then we
>>> need new TLS specification for X32.
>>
>> Here is a patch to properly generate X32 IE sequence.
>>
>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>
>> x86-64                                        x32
>> GD
>> .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>> .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>
>> GD->IE optimization
>> movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>
>> GD->LE optimization
>> movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>
>> LD
>> leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>> call __tls_get_addr@plt                       call __tls_get_addr@plt
>>
>> LD->LE optimization
>> .word 0x6666; .byte 0x66; movq %fs:0, %rax    nopl 0x0(%rax); movl %fs:0, %eax
>>
>> IE
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>
>> or
>>                                               Not supported if Pmode == SImode
>> movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>> movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>
>> IE->LE optimization
>>
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>
>> to
>>
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>
>> or
>>
>> movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>> movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>
>> to
>>
>> movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>> movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>
>> LE
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
>>
>> or
>>
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> addq $x@tpoff,%reg64                          addl $x@tpoff,%reg32
>>
>> or
>>
>> movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>> movl x@tpoff(%reg64),%reg32                   movl x@tpoff(%reg32),%reg32
>>
>> or
>>
>> movl %fs:x@tpoff,%reg32                       movl %fs:x@tpoff,%reg32
>>
>>
>> X32 TLS implementation is straightforward, except for IE:
>>
>> 1. Since address override works only on the (reg32) part in fs:(reg32),
>> we can't use it as memory operand.  This patch changes ix86_decompose_address
>> to disallow fs:(reg) if Pmode != word_mode.
>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>> any instructions between MOV and ADD, which may interfere linker
>> IE->LE optimization, since the last byte of the previous instruction
>> before ADD may look like a REX prefix.  This patch adds tls_initial_exec_x32
>> to make sure that we always have
>>
>>        movl %fs:0, %reg32
>>        addl x@gottpoff(%rip), %reg32
>>
>> so that the last byte of the previous instruction before ADD will
>> never be a REX byte.  Tested on Linux/x32.
>>
>> 2012-03-09  H.J. Lu  <hongjiu...@intel.com>
>>
>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>        if Pmode != word_mode.
>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>        Pmode == SImode for x32.
>>
>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>        (tls_initial_exec_x32): Likewise.
>
> Nice solution!
>
> OK for mainline.

Done.

> BTW: Did you investigate the issue with memory aliasing?
>

It isn't a problem, since it is wrapped in UNSPEC_TLS_IE_X32, which loads
the address of the TLS symbol.

Thanks.

-- 
H.J.