On Tue, Jun 9, 2015 at 6:21 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote:
>> > something?  Would it be acceptable to just guard the changes in the patch
>> > with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
>> > when to ZERO_EXTEND addr (if needed at all), etc.
>>
>> If you wish, I can take your patch and take it further. -mx32 is a
>> delicate beast...
>
> If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff
> now.
> Note that for -m64/-mx32 it will be much harder to create a reproducer,
> because to trigger the bug one has to convince the register allocator
> to allocate the lhs of the load in certain registers (not that hard),
> but also the index register (to be scaled, also not that hard) and
> also the register holding the tls symbol immediate.  Wonder if one has to
> keep all but the two registers live across the load or something similar.

Please find attached a patch that takes your idea slightly further. We
find the (possibly zero-extended) UNSPEC_TP and copy it for further use. In
its place, we simply slap const0_rtx. We know that the address of a
multi-word value has to be offsettable, which in the case of x32 means
that it is NOT a zero-extended address.
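
For illustration only, here is a minimal sketch of that walk on a toy
expression tree. It is not GCC code: struct node, the enum values, is_tp,
strip_tp and demo are hypothetical stand-ins for rtx, the RTL codes and the
GET_CODE/XEXP accessors used in the patch, and the tree shape in demo is just
an assumed example address.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for RTL.  These names are hypothetical, not GCC's rtx API.  */
enum code { REG, CONST0, TP, ZEXT, PLUS };

struct node
{
  enum code code;
  struct node *op[2];
};

/* Plays the role of const0_rtx.  */
static struct node const0 = { CONST0, { NULL, NULL } };

/* True for a thread-pointer node, bare or wrapped in a zero-extend.  */
static int
is_tp (const struct node *n)
{
  return n->code == TP || (n->code == ZEXT && n->op[0]->code == TP);
}

/* Walk down the chain of PLUS nodes through operand 0.  At each level,
   check both operands; if one is the thread pointer, replace it with
   const0 and return the original node.  Returns NULL if none is found.  */
static struct node *
strip_tp (struct node **x)
{
  while ((*x)->code == PLUS)
    {
      for (int i = 0; i < 2; i++)
        if (is_tp ((*x)->op[i]))
          {
            struct node *tp = (*x)->op[i];
            (*x)->op[i] = &const0;
            return tp;
          }
      x = &(*x)->op[0];
    }
  return NULL;
}

/* Build (plus (plus index (zext tp)) sym), strip the thread pointer, and
   report whether the zero-extend node was returned and replaced by const0.  */
static int
demo (void)
{
  struct node tp = { TP, { NULL, NULL } };
  struct node zext = { ZEXT, { &tp, NULL } };
  struct node index = { REG, { NULL, NULL } };
  struct node sym = { REG, { NULL, NULL } };
  struct node inner = { PLUS, { &index, &zext } };
  struct node outer = { PLUS, { &inner, &sym } };
  struct node *addr = &outer;

  struct node *tls_base = strip_tp (&addr);
  return tls_base == &zext && inner.op[1] == &const0;
}
```

In the patch itself, the saved node is added back onto the lea result so
that each individual load still carries the segment reference.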

Uros.
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 224292)
+++ config/i386/i386.c  (working copy)
@@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
         Do an lea to the last part and use only one colliding move.  */
       else if (collisions > 1)
        {
-         rtx base;
+         rtx base, addr, tls_base = NULL_RTX;
 
          collisions = 1;
 
@@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[])
          if (GET_MODE (base) != Pmode)
            base = gen_rtx_REG (Pmode, REGNO (base));
 
-         emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+         addr = XEXP (part[1][0], 0);
+         if (TARGET_TLS_DIRECT_SEG_REFS)
+           {
+             struct ix86_address parts;
+             int ok = ix86_decompose_address (addr, &parts);
+             gcc_assert (ok);
+             if (parts.seg != SEG_DEFAULT)
+               {
+                 /* It is not valid to use %gs: or %fs: in
+                    lea though, so we need to remove it from the
+                    address used for lea and add it to each individual
+                    memory loads instead.  */
+                 rtx *x = &addr;
+                  while (GET_CODE (*x) == PLUS)
+                    {
+                      for (i = 0; i < 2; i++)
+                       {
+                         rtx op = XEXP (*x, i);
+                         if ((GET_CODE (op) == UNSPEC
+                            && XINT (op, 1) == UNSPEC_TP)
+                           || (GET_CODE (op) == ZERO_EXTEND
+                               && GET_CODE (XEXP (op, 0)) == UNSPEC
+                               && (XINT (XEXP (op, 0), 1)
+                                   == UNSPEC_TP)))
+                         {
+                           tls_base = XEXP (*x, i);
+                           XEXP (*x, i) = const0_rtx;
+                           break;
+                         }
+                       }
+
+                     if (tls_base)
+                       break;
+                     x = &XEXP (*x, 0);
+                   }
+                 gcc_assert (tls_base);
+               }
+           }
+         emit_insn (gen_rtx_SET (base, addr));
+         if (tls_base)
+           base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
          part[1][0] = replace_equiv_address (part[1][0], base);
          for (i = 1; i < nparts; i++)
            {
+             if (tls_base)
+               base = copy_rtx (base);
              tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
              part[1][i] = replace_equiv_address (part[1][i], tmp);
            }
