On Sat, Mar 10, 2012 at 5:09 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Fri, Mar 9, 2012 at 11:26 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>> On Mon, Mar 5, 2012 at 9:25 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
>>> On Mon, Mar 5, 2012 at 6:03 PM, H.J. Lu <hjl.to...@gmail.com> wrote:
>>>
>>>>>> The x86-64 linker optimizes TLS_MODEL_INITIAL_EXEC to
>>>>>> TLS_MODEL_LOCAL_EXEC by checking
>>>>>>
>>>>>>        movq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> and
>>>>>>
>>>>>>        addq foo@gottpoff(%rip), %reg
>>>>>>
>>>>>> It relies on the REX prefix to avoid misinterpreting the last byte of
>>>>>> the previous instruction.  With 32-bit Pmode, we may not have the REX
>>>>>> prefix, and the last byte of the previous instruction may be an offset,
>>>>>> which may look like a REX prefix.  The IE->LE optimization will then
>>>>>> generate a corrupted binary.  This patch makes sure we always output a
>>>>>> REX prefix for UNSPEC_GOTNTPOFF.  OK for trunk?
>>>>>
>>>>> Actually, the linker has:
>>>>>
>>>>>    case R_X86_64_GOTTPOFF:
>>>>>      /* Check transition from IE access model:
>>>>>                mov foo@gottpoff(%rip), %reg
>>>>>                add foo@gottpoff(%rip), %reg
>>>>>       */
>>>>>
>>>>>      /* Check REX prefix first.  */
>>>>>      if (offset >= 3 && (offset + 4) <= sec->size)
>>>>>        {
>>>>>          val = bfd_get_8 (abfd, contents + offset - 3);
>>>>>          if (val != 0x48 && val != 0x4c)
>>>>>            {
>>>>>              /* X32 may have 0x44 REX prefix or no REX prefix.  */
>>>>>              if (ABI_64_P (abfd))
>>>>>                return FALSE;
>>>>>            }
>>>>>        }
>>>>>      else
>>>>>        {
>>>>>          /* X32 may not have any REX prefix.  */
>>>>>          if (ABI_64_P (abfd))
>>>>>            return FALSE;
>>>>>          if (offset < 2 || (offset + 3) > sec->size)
>>>>>            return FALSE;
>>>>>        }
>>>>>
>>>>> So, it should handle the case without REX just fine. If it doesn't,
>>>>> then this is a bug in binutils.
>>>>>
>>>>
>>>> The last byte of the displacement in the previous instruction
>>>> may happen to look like a REX byte. In that case, linker
>>>> will overwrite the last byte of the previous instruction and
>>>> generate the wrong instruction sequence.
>>>>
>>>> I need to update the linker to enforce the REX byte check.
>>>
>>> One important observation: if we want to follow the x86_64 TLS spec
>>> strictly, we have to use existing DImode patterns only. This also
>>> means that we should NOT convert other TLS patterns to Pmode, since
>>> they explicitly state movq and addq. If this is not the case, then we
>>> need a new TLS specification for X32.
>>
>> Here is a patch to properly generate X32 IE sequence.
>>
>> This is the summary of differences between x86-64 TLS and x32 TLS:
>>
>>                     x86-64                               x32
>> GD
>>    .byte 0x66; leaq foo@tlsgd(%rip),%rdi;        leaq foo@tlsgd(%rip),%rdi;
>>    .word 0x6666; rex64; call __tls_get_addr@plt  .word 0x6666; rex64; call __tls_get_addr@plt
>>
>> GD->IE optimization
>>   movq %fs:0,%rax; addq x@gottpoff(%rip),%rax    movl %fs:0,%eax; addq x@gottpoff(%rip),%rax
>>
>> GD->LE optimization
>>   movq %fs:0,%rax; leaq x@tpoff(%rax),%rax       movl %fs:0,%eax; leaq x@tpoff(%rax),%rax
>>
>> LD
>>  leaq foo@tlsld(%rip),%rdi;                      leaq foo@tlsld(%rip),%rdi;
>>  call __tls_get_addr@plt                         call __tls_get_addr@plt
>>
>> LD->LE optimization
>>  .word 0x6666; .byte 0x66; movq %fs:0, %rax      nopl 0x0(%rax); movl %fs:0, %eax
>>
>> IE
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>
>>   or
>>                                                  Not supported if Pmode == SImode
>>   movq x@gottpoff(%rip),%reg64;                  movq x@gottpoff(%rip),%reg64;
>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>
>> IE->LE optimization
>>
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   addq x@gottpoff(%rip),%reg64                   addl x@gottpoff(%rip),%reg32
>>
>>   to
>>
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   addq $foo@tpoff, %reg64                        addl $foo@tpoff, %reg32
>>
>>   or
>>
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   leaq foo@tpoff(%reg64), %reg64                 leal foo@tpoff(%reg32), %reg32
>>
>>   or
>>
>>   movq x@gottpoff(%rip),%reg64                   movq x@gottpoff(%rip),%reg64;
>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>
>>   to
>>
>>   movq $foo@tpoff, %reg64                        movq $foo@tpoff, %reg64
>>   movl %fs:(%reg64),%reg32                       movl %fs:(%reg64), %reg32
>>
>> LE
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   leaq x@tpoff(%reg64),%reg64                    leal x@tpoff(%reg32),%reg32
>>
>>   or
>>
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   addq $x@tpoff,%reg64                           addl $x@tpoff,%reg32
>>
>>   or
>>
>>   movq %fs:0,%reg64;                             movl %fs:0,%reg32;
>>   movl x@tpoff(%reg64),%reg32                    movl x@tpoff(%reg32),%reg32
>>
>>   or
>>
>>   movl %fs:x@tpoff,%reg32                        movl %fs:x@tpoff,%reg32
>>
>>
>> The X32 TLS implementation is straightforward, except for IE:
>>
>> 1. Since the address override works only on the (reg32) part of
>> fs:(reg32), we can't use fs:(reg32) as a memory operand.  This patch
>> changes ix86_decompose_address to disallow fs:(reg) if Pmode != word_mode.
>> 2. When Pmode == SImode, there may be no REX prefix for ADD.  Avoid
>> any instructions between MOV and ADD, which may interfere with the linker
>> IE->LE optimization, since the last byte of the previous instruction
>> before ADD may look like a REX prefix.  This patch adds tls_initial_exec_x32
>> to make sure that we always have
>>
>> movl %fs:0, %reg32
>> addl x@gottpoff(%rip), %reg32
>>
>> so that the last byte of the previous instruction before ADD will
>> never be a REX byte.  Tested on Linux/x32.
>>
>> 2012-03-09  H.J. Lu  <hongjiu...@intel.com>
>>
>>        * config/i386/i386.c (ix86_decompose_address): Disallow fs:(reg)
>>        if Pmode != word_mode.
>>        (legitimize_tls_address): Call gen_tls_initial_exec_x32 if
>>        Pmode == SImode for x32.
>>
>>        * config/i386/i386.md (UNSPEC_TLS_IE_X32): New.
>>        (tls_initial_exec_x32): Likewise.
>
> Nice solution!
>
> OK for mainline.

Done.

> BTW: Did you investigate the issue with memory aliasing?
>

It isn't a problem since it is wrapped in UNSPEC_TLS_IE_X32,
which loads the address of the TLS symbol.

Thanks.

-- 
H.J.
