On Sun, Mar 11, 2012 at 6:11 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>>>>>> X86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to TLS_MODEL_LOCAL_EXEC
>>>>>>>> by checking
>>>>>>>>
>>>>>>>>         movq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> and
>>>>>>>>
>>>>>>>>         addq foo@gottpoff(%rip), %reg
>>>>>>>>
>>>>>>>> It uses the REX prefix to avoid the last byte of the previous
>>>>>>>> instruction. With 32bit Pmode, we may not have the REX prefix and
>>>>>>>> the last byte of the previous instruction may be an offset, which
>>>>>>>> may look like a REX prefix. IE->LE optimization will generate
>>>>>>>> corrupted binary. This patch makes sure we always output a REX
>>>>>>>> prefix for UNSPEC_GOTNTPOFF. OK for trunk?
>>>>>>>
>>>>>>> Actually, linker has:
>>>>>>>
>>>>>>>     case R_X86_64_GOTTPOFF:
>>>>>>>       /* Check transition from IE access model:
>>>>>>>                mov foo@gottpoff(%rip), %reg
>>>>>>>                add foo@gottpoff(%rip), %reg
>>>>>>>        */
>>>>>>>
>>>>>>>       /* Check REX prefix first.  */
>>>>>>>       if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>>>         {
>>>>>>>           val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>>>           if (val != 0x48 && val != 0x4c)
>>>>>>>             {
>>>>>>>               /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>>>               if (ABI_64_P (abfd))
>>>>>>>                 return FALSE;
>>>>>>>             }
>>>>>>>         }
>>>>>>>       else
>>>>>>>         {
>>>>>>>           /* X32 may not have any REX prefix.  */
>>>>>>>           if (ABI_64_P (abfd))
>>>>>>>             return FALSE;
>>>>>>>           if (offset < 2 || (offset + 3) > sec->size)
>>>>>>>             return FALSE;
>>>>>>>         }
>>>>>>>
>>>>>>> So, it should handle the case without REX just OK. If it doesn't, then
>>>>>>> this is a bug in binutils.
>>>>>>>
>>>>>>
>>>>>> The last byte of the displacement in the previous instruction
>>>>>> may happen to look like a REX byte. In that case, linker
>>>>>> will overwrite the last byte of the previous instruction and
>>>>>> generate the wrong instruction sequence.
>>>>>>
>>>>>> I need to update linker to enforce the REX byte check.
>>>>>
>>>>> One important observation: if we want to follow the x86_64 TLS spec
>>>>> strictly, we have to use existing DImode patterns only. This also
>>>>> means that we should NOT convert other TLS patterns to Pmode, since
>>>>> they explicitly state movq and addq. If this is not the case, then we
>>>>> need new TLS specification for X32.
>>>>
>>>> Here is a patch to properly generate X32 IE sequence.
>>>>
>>>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>>>
>>>>   x86-64                                        x32
>>>>
>>>> GD
>>>>   .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>>>   .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>>>
>>>> GD->IE optimization
>>>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax   movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>>>
>>>> GD->LE optimization
>>>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax      movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>>>
>>>> LD
>>>>   leaq foo@tlsld(%rip),%rdi;                    leaq foo@tlsld(%rip),%rdi;
>>>>   call __tls_get_addr@plt                       call __tls_get_addr@plt
>>>>
>>>> LD->LE optimization
>>>>   .word 0x6666; .byte 0x66; movq %fs:0, %rax    nopl 0x0(%rax); movl %fs:0, %eax
>>>>
>>>> IE
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>
>>>> or
>>>>                                                 Not supported if Pmode == SImode
>>>>   movq x@gottpoff(%rip),%reg64;                 movq x@gottpoff(%rip),%reg64;
>>>>   movq %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>
>>>> IE->LE optimization
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq x@gottpoff(%rip),%reg64                  addl x@gottpoff(%rip),%reg32
>>>>
>>>> to
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq foo@tpoff, %reg64                        addl foo@tpoff, %reg32
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   leaq foo@tpoff(%reg64), %reg64                leal foo@tpoff(%reg32), %reg32
>>>>
>>>> or
>>>>
>>>>   movq x@gottpoff(%rip),%reg64                  movq x@gottpoff(%rip),%reg64;
>>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>
>>>> to
>>>>
>>>>   movq foo@tpoff, %reg64                        movq foo@tpoff, %reg64
>>>>   movl %fs:(%reg64),%reg32                      movl %fs:(%reg64), %reg32
>>>>
>>>> LE
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   leaq x@tpoff(%reg64),%reg32                   leal x@tpoff(%reg32),%reg32
>>>>
>>>> or
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   addq $x@tpoff,%reg64                          addl $x@tpoff,%reg32
>>>>
>>>> or
>>>>
>>>>   movq %fs:0,%reg64;                            movl %fs:0,%reg32;
>>>>   movl x@tpoff(%reg64),%reg32                   movl x@tpoff(%reg32),%reg32
>>>>
>>>> or
>>>>
>>>>   movl %fs:x@tpoff,%reg32                       movl %fs:x@tpoff,%reg32
>>>>
>>>>
>>>> X32 TLS implementation is straightforward, except for IE:
>>>>
>>>> 1. Since address override works only on the (reg32) part in fs:(reg32),
>>>>    we can't use it as memory operand. This patch changes
>>>>    ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.
>>>> 2. When Pmode == SImode, there may be no REX prefix for ADD. Avoid
>>>>    any instructions between MOV and ADD, which may interfere with linker
>>>>    IE->LE optimization, since the last byte of the previous instruction
>>>>    before ADD may look like a REX prefix. This patch adds
>>>>    tls_initial_exec_x32 to make sure that we always have
>>>>
>>>>         movl %fs:0, %reg32
>>>>         addl x@gottpoff(%rip), %reg32
>>>>
>>>>    so that the last byte of the previous instruction before ADD will
>>>>    never be a REX byte. Tested on Linux/x32.
>>>>
>>>> 2012-03-09  H.J. Lu  <hongjiu...@intel.com>
>>>>
>>>>         * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>>>         if Pmode != word_mode.
>>>>         (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>>>         Pmode == SImode for x32.
>>>>
>>>>         * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>>>         (tls_initial_exec_x32): Likewise.
>>>
>>> Nice solution!
>>>
>>> OK for mainline.
>>
>> Done.
>>
>>> BTW: Did you investigate the issue with memory aliasing?
>>>
>>
>> It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32
>> which loads address of the TLS symbol.
>>
>> Thanks.
>>
>
> Since we must use reg64 in %fs:(%reg) memory operand like
>
>         movq x@gottpoff(%rip),%reg64;
>         mov %fs:(%reg64),%reg
>
> this patch optimizes x32 TLS IE load and store by wrapping
> %reg64 inside of UNSPEC when Pmode == SImode. OK for
> trunk?

I think we should just scrap all these complications and go with the
idea of clearing MASK_TLS_DIRECT_SEG_REFS.

Uros.
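
For reference, the REX-byte ambiguity discussed in the quoted messages can be sketched in a few lines of C. This is only an illustration, not binutils code: rex_check_passes and the three byte arrays are hypothetical names, and the encodings are hand-assembled on the assumption of the usual opcodes (8b for MOV, 03 for ADD, ModRM 0x05 for a RIP-relative operand). The helper mirrors just the "byte at offset - 3 is 0x48 or 0x4c" test from the quoted linker snippet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if the byte three positions before the
   relocated 32-bit field is 0x48 or 0x4c, i.e. the two REX values the
   quoted linker check accepts.  OFFSET is the offset of the field that
   the R_X86_64_GOTTPOFF relocation applies to.  */
int rex_check_passes(const uint8_t *contents, size_t size, size_t offset)
{
    assert(contents != NULL);
    if (offset < 3 || offset + 4 > size)
        return 0;
    uint8_t val = contents[offset - 3];
    return val == 0x48 || val == 0x4c;
}

/* movq %fs:0,%rax; addq x@gottpoff(%rip),%rax
   The relocated field starts at offset 12; the byte at 9 is a real REX.  */
const uint8_t ie_x86_64[] = {
    0x64, 0x48, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00,
    0x48, 0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* movl 0x48000000(%rax),%ecx; addl x@gottpoff(%rip),%eax (no REX)
   The relocated field starts at offset 8; the byte at 5 is 0x48, but it
   is the last displacement byte of the previous instruction.  */
const uint8_t ambiguous[] = {
    0x8b, 0x88, 0x00, 0x00, 0x00, 0x48,
    0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};

/* movl %fs:0,%eax; addl x@gottpoff(%rip),%eax (no REX)
   The relocated field starts at offset 10; the byte at 7 is the last
   byte of the zero displacement of the MOV, so it cannot look like REX.  */
const uint8_t ie_x32[] = {
    0x64, 0x8b, 0x04, 0x25, 0x00, 0x00, 0x00, 0x00,
    0x03, 0x05, 0x00, 0x00, 0x00, 0x00
};
```

With these inputs, rex_check_passes returns 1 for ie_x86_64 at offset 12 and also 1 for ambiguous at offset 8, which is the false match that would let the linker overwrite the previous instruction. It returns 0 for ie_x32 at offset 10, which is why keeping MOV immediately before ADD avoids the problem.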