On Tue, Jun 9, 2015 at 6:21 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Jun 09, 2015 at 06:16:32PM +0200, Uros Bizjak wrote:
>> > something?  Would it be acceptable to just guard the changes in the patch
>> > with !TARGET_X32 and let H.J. deal with that target?  I'm afraid I'm lost
>> > when to ZERO_EXTEND addr (if needed at all), etc.
>>
>> If you wish, I can take your patch and take if further. -mx32 is a
>> delicate beast...
>
> If you could, it would be appreciated, I'm quite busy with OpenMP 4.1 stuff
> now.
> Note that for -m64/-mx32 it will be much harder to create a reproducer,
> because to trigger the bug one has to convince the register allocator
> to allocate the lhs of the load in certain registers (not that hard),
> but also the index register (to be scaled, also not that hard) and
> also the register holding the tls symbol immediate.  Wonder if one has to
> keep all but the two registers live across the load or something similar.

Please find attached a patch that takes your idea slightly further. We
look for the (possibly zero-extended) UNSPEC_TP and save it for later
use. In its place, we simply put const0_rtx. We know that the address of
a multi-word value has to be offsettable, which in the case of x32 means
that it is NOT a zero-extended address.

Uros.

Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c (revision 224292)
+++ config/i386/i386.c (working copy)
@@ -22858,7 +22858,7 @@ ix86_split_long_move (rtx operands[])
      Do an lea to the last part and use only one colliding move.  */
   else if (collisions > 1)
     {
-      rtx base;
+      rtx base, addr, tls_base = NULL_RTX;
 
       collisions = 1;
 
@@ -22869,10 +22869,52 @@ ix86_split_long_move (rtx operands[])
       if (GET_MODE (base) != Pmode)
         base = gen_rtx_REG (Pmode, REGNO (base));
 
-      emit_insn (gen_rtx_SET (base, XEXP (part[1][0], 0)));
+      addr = XEXP (part[1][0], 0);
+      if (TARGET_TLS_DIRECT_SEG_REFS)
+        {
+          struct ix86_address parts;
+          int ok = ix86_decompose_address (addr, &parts);
+          gcc_assert (ok);
+          if (parts.seg != SEG_DEFAULT)
+            {
+              /* It is not valid to use %gs: or %fs: in
+                 lea though, so we need to remove it from the
+                 address used for lea and add it to each individual
+                 memory loads instead.  */
+              rtx *x = &addr;
+              while (GET_CODE (*x) == PLUS)
+                {
+                  for (i = 0; i < 2; i++)
+                    {
+                      rtx op = XEXP (*x, i);
+                      if ((GET_CODE (op) == UNSPEC
+                           && XINT (op, 1) == UNSPEC_TP)
+                          || (GET_CODE (op) == ZERO_EXTEND
+                              && GET_CODE (XEXP (op, 0)) == UNSPEC
+                              && (XINT (XEXP (op, 0), 1)
+                                  == UNSPEC_TP)))
+                        {
+                          tls_base = XEXP (*x, i);
+                          XEXP (*x, i) = const0_rtx;
+                          break;
+                        }
+                    }
+
+                  if (tls_base)
+                    break;
+                  x = &XEXP (*x, 0);
+                }
+              gcc_assert (tls_base);
+            }
+        }
+      emit_insn (gen_rtx_SET (base, addr));
+      if (tls_base)
+        base = gen_rtx_PLUS (GET_MODE (base), base, tls_base);
       part[1][0] = replace_equiv_address (part[1][0], base);
       for (i = 1; i < nparts; i++)
         {
+          if (tls_base)
+            base = copy_rtx (base);
           tmp = plus_constant (Pmode, base, UNITS_PER_WORD * i);
           part[1][i] = replace_equiv_address (part[1][i], tmp);
         }
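
For readers without a GCC tree at hand, the search-and-replace step in the patch can be sketched outside the compiler as follows. This is only a toy model, not GCC code: `struct node`, `enum code`, `mk`, `is_tp` and `strip_tp` are hypothetical stand-ins for rtx, the rtx codes and the loop in ix86_split_long_move, and the zero constant stands in for const0_rtx. As in the patch, it checks both operands of each PLUS, then descends only into operand 0.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for rtx codes; these are NOT the GCC types.  */
enum code { CONST, REG, TP, ZEXT, PLUS };

struct node
{
  enum code code;
  long value;           /* CONST value or REG number.  */
  struct node *op[2];   /* Operands of PLUS; op[0] of ZEXT.  */
};

/* Allocate one node of the toy expression tree.  */
static struct node *
mk (enum code code, long value, struct node *a, struct node *b)
{
  struct node *n = malloc (sizeof *n);
  assert (n != NULL);
  n->code = code;
  n->value = value;
  n->op[0] = a;
  n->op[1] = b;
  return n;
}

/* True for a thread pointer, bare or wrapped in a zero extension.  */
static int
is_tp (const struct node *n)
{
  return n->code == TP || (n->code == ZEXT && n->op[0]->code == TP);
}

/* Walk the PLUS chain rooted at *ADDR.  If one operand is a (possibly
   zero-extended) thread pointer, replace it in place with constant 0
   and return the removed operand; otherwise return NULL.  */
static struct node *
strip_tp (struct node **addr)
{
  struct node **x = addr;

  while ((*x)->code == PLUS)
    {
      for (int i = 0; i < 2; i++)
        if (is_tp ((*x)->op[i]))
          {
            struct node *tp = (*x)->op[i];
            (*x)->op[i] = mk (CONST, 0, NULL, NULL);
            return tp;
          }
      x = &(*x)->op[0];
    }
  return NULL;
}
```

With a toy address of the shape (plus (plus (reg 1) (zext (tp))) (const 8)), strip_tp returns the zext node and leaves (plus (plus (reg 1) (const 0)) (const 8)) behind, which is the part that would be safe to feed to lea; the returned node is what gets added back to each individual memory reference.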