On Wed, Jan 27, 2016 at 11:49 AM, Uros Bizjak <ubiz...@gmail.com> wrote:
> On Wed, Jan 27, 2016 at 8:25 PM, H.J. Lu <hongjiu...@intel.com> wrote:
>>
>> __tls_get_addr must be called with 16-byte aligned stack, which is
>> guaranteed by setting preferred_stack_boundary to 128 bits.  There
>> is no need to change stack_alignment_needed for __tls_get_addr.
>>
>> Tested on x86-64.  OK for trunk?
>
> You know the purpose of these flags better than I, so - OK.
>
> Thanks,
> Uros.
>
>>
>> H.J.
>> --
>>         PR target/68986
>>         * config/i386/i386.c (ix86_update_stack_boundary): Don't
>>         change stack_alignment_needed for __tls_get_addr call.
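
For reference, the "128 bits" above and the -mpreferred-stack-boundary=N values used in the new tests describe the same quantity in different units: the option takes a power-of-two exponent in bytes, while crtl->preferred_stack_boundary is kept in bits.  A minimal sketch of the conversion, assuming 8-bit bytes; the helper names are hypothetical and are not GCC functions:

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not part of GCC.  */

/* -mpreferred-stack-boundary=N asks for a 2**N byte boundary.  */
static unsigned int
boundary_bytes (unsigned int n)
{
  /* Keep the shift (and the later multiply by 8) well inside
     the range of unsigned int.  */
  assert (n < 28);
  return 1u << n;
}

/* The same boundary expressed in bits, assuming 8-bit bytes
   (BITS_PER_UNIT is 8 on x86).  */
static unsigned int
boundary_bits (unsigned int n)
{
  return boundary_bytes (n) * 8;
}
```

So N=4 is the 16-byte (128-bit) boundary that __tls_get_addr needs, and N=2 and N=3 in the tests start from boundaries below that.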

Here is the backport for GCC 5.  OK for gcc-5-branch?

-- 
H.J.
From 3c93d02be1e4b41d9116da6262cf083a65439280 Mon Sep 17 00:00:00 2001
From: hjl <hjl@138bc75d-0d04-0410-961f-82ee72b054a4>
Date: Tue, 26 Jan 2016 12:51:07 +0000
Subject: [PATCH] Update preferred stack boundary in ix86_update_stack_boundary

__tls_get_addr must be called with 16-byte aligned stack, which is
guaranteed by setting preferred_stack_boundary to 128 bits.  Preferred
stack boundary adjustment for __tls_get_addr should be done in
ix86_update_stack_boundary, not ix86_compute_frame_layout.  Also, there
is no need to over-align stack for __tls_get_addr, and a function that
calls __tls_get_addr isn't a leaf function.

gcc/

	Backport from mainline
	PR target/68986
	* config/i386/i386.c (ix86_compute_frame_layout): Move stack
	alignment adjustment to ...
	(ix86_update_stack_boundary): Here.  Don't over-align stack nor
	change stack_alignment_needed for __tls_get_addr.
	(ix86_finalize_stack_realign_flags): Use stack_alignment_needed
	if __tls_get_addr is called.

gcc/testsuite/

	Backport from mainline
	PR target/68986
	* gcc.target/i386/pr68986-1.c: New test.
	* gcc.target/i386/pr68986-2.c: Likewise.
	* gcc.target/i386/pr68986-3.c: Likewise.
---
 gcc/config/i386/i386.c                    | 26 ++++++++++----------------
 gcc/testsuite/gcc.target/i386/pr68986-1.c | 11 +++++++++++
 gcc/testsuite/gcc.target/i386/pr68986-2.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr68986-3.c | 13 +++++++++++++
 4 files changed, 47 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr68986-3.c

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 504e8b8..a99d53b 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -10131,18 +10131,6 @@ ix86_compute_frame_layout (struct ix86_frame *frame)
       crtl->preferred_stack_boundary = 128;
       crtl->stack_alignment_needed = 128;
     }
-  /* preferred_stack_boundary is never updated for call
-     expanded from tls descriptor.  Update it here.  We don't update it in
-     expand stage because according to the comments before
-     ix86_current_function_calls_tls_descriptor, tls calls may be optimized
-     away.  */
-  else if (ix86_current_function_calls_tls_descriptor
-	   && crtl->preferred_stack_boundary < PREFERRED_STACK_BOUNDARY)
-    {
-      crtl->preferred_stack_boundary = PREFERRED_STACK_BOUNDARY;
-      if (crtl->stack_alignment_needed < PREFERRED_STACK_BOUNDARY)
-	crtl->stack_alignment_needed = PREFERRED_STACK_BOUNDARY;
-    }
 
   stack_alignment_needed = crtl->stack_alignment_needed / BITS_PER_UNIT;
   preferred_alignment = crtl->preferred_stack_boundary / BITS_PER_UNIT;
@@ -10816,6 +10804,11 @@ ix86_update_stack_boundary (void)
       && cfun->stdarg
       && crtl->stack_alignment_estimated < 128)
     crtl->stack_alignment_estimated = 128;
+
+  /* __tls_get_addr needs to be called with 16-byte aligned stack.  */
+  if (ix86_tls_descriptor_calls_expanded_in_cfun
+      && crtl->preferred_stack_boundary < 128)
+    crtl->preferred_stack_boundary = 128;
 }
 
 /* Handle the TARGET_GET_DRAP_RTX hook.  Return NULL if no DRAP is
@@ -11275,10 +11268,11 @@ ix86_finalize_stack_realign_flags (void)
   unsigned int incoming_stack_boundary
     = (crtl->parm_stack_boundary > ix86_incoming_stack_boundary
        ? crtl->parm_stack_boundary : ix86_incoming_stack_boundary);
-  unsigned int stack_realign = (incoming_stack_boundary
-				< (crtl->is_leaf
-				   ? crtl->max_used_stack_slot_alignment
-				   : crtl->stack_alignment_needed));
+  unsigned int stack_realign
+    = (incoming_stack_boundary
+       < (crtl->is_leaf && !ix86_current_function_calls_tls_descriptor
+	  ? crtl->max_used_stack_slot_alignment
+	  : crtl->stack_alignment_needed));
 
   if (crtl->stack_realign_finalized)
     {
diff --git a/gcc/testsuite/gcc.target/i386/pr68986-1.c b/gcc/testsuite/gcc.target/i386/pr68986-1.c
new file mode 100644
index 0000000..998f34f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68986-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=5 -mincoming-stack-boundary=4" } */
+
+extern __thread int msgdata;
+int
+foo ()
+{
+  return msgdata;
+}
diff --git a/gcc/testsuite/gcc.target/i386/pr68986-2.c b/gcc/testsuite/gcc.target/i386/pr68986-2.c
new file mode 100644
index 0000000..c3a366c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68986-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-fPIC -mno-accumulate-outgoing-args -mpreferred-stack-boundary=2" } */
+
+extern __thread int msgdata;
+int
+foo ()
+{
+  return msgdata;
+}
+
+/* { dg-final { scan-assembler "andl\[\\t \]*\\$-16,\[\\t \]*%esp" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr68986-3.c b/gcc/testsuite/gcc.target/i386/pr68986-3.c
new file mode 100644
index 0000000..5744cf2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr68986-3.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target tls_native } */
+/* { dg-require-effective-target fpic } */
+/* { dg-options "-fPIC -mno-sse -mpreferred-stack-boundary=3 -mincoming-stack-boundary=3" } */
+
+extern __thread int msgdata;
+int
+foo ()
+{
+  return msgdata;
+}
+
+/* { dg-final { scan-assembler "and\[lq\]\[\\t \]*\\$-16,\[\\t \]*%\[re\]?sp" } } */
-- 
2.5.0
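
To make the intent of the i386.c hunks easier to review, here is a rough standalone model of the two pieces of logic the patch adds.  It is only a sketch: struct frame_model and both functions are hypothetical names, not GCC code, and the model folds ix86_tls_descriptor_calls_expanded_in_cfun and ix86_current_function_calls_tls_descriptor into a single flag, which the real compiler does not do.  All alignments are in bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the crtl fields the patch reads or
   writes.  This is not GCC's real data structure.  */
struct frame_model
{
  unsigned int incoming_stack_boundary;
  unsigned int preferred_stack_boundary;
  unsigned int stack_alignment_needed;
  unsigned int max_used_stack_slot_alignment;
  bool is_leaf;
  /* Simplification: one flag for "this function calls
     __tls_get_addr".  */
  bool calls_tls_descriptor;
};

/* Models the new block in ix86_update_stack_boundary: raise only
   preferred_stack_boundary, leave stack_alignment_needed alone.  */
static void
model_update_stack_boundary (struct frame_model *f)
{
  assert (f);
  if (f->calls_tls_descriptor && f->preferred_stack_boundary < 128)
    f->preferred_stack_boundary = 128;
}

/* Models the new stack_realign expression in
   ix86_finalize_stack_realign_flags: a function that calls
   __tls_get_addr is not treated as a leaf.  */
static bool
model_needs_realign (const struct frame_model *f)
{
  unsigned int required
    = (f->is_leaf && !f->calls_tls_descriptor
       ? f->max_used_stack_slot_alignment
       : f->stack_alignment_needed);
  return f->incoming_stack_boundary < required;
}
```

In this model, a function marked as leaf with a 32-bit incoming boundary is reported as needing realignment once it calls __tls_get_addr, assuming stack_alignment_needed has reached 128 by that point.  That is the situation pr68986-2.c checks for with its andl $-16, %esp pattern.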