On 05/20/16 11:09, Alexander Monakov wrote:
This patch implements '-msoft-stack' code generation variant for NVPTX. The
goal is to avoid relying on '.local' memory space for placement of automatic
data, and instead have an explicitely-maintained stack pointer (which can be
set up to point to preallocated global memory space). This allows to have
stack data accessible from all threads and modifiable with atomic
instructions. This also allows to implement variable-length stack allocation
(for 'alloca' and C99 VLAs).
Each warp has its own 'soft stack' pointer. It lives in shared memory array
called __nvptx_stacks at index %tid.y (like OpenACC, OpenMP is offloading is
going to use launch geometry such that %tid.y gives the warp index). It is
retrieved in function prologue (if the function needs a stack frame) and may
also be written there (if the function is non-leaf, so that its callees see
the updated stack pointer), and restored prior to returning.
Startup code is responsible for setting up the initial soft-stack pointer. For
-mmainkernel testing it is libgcc's __main, for OpenMP offloading it's the
kernel region entry code.
ah, that's much more understandable, thanks. Presumably this doesn't support
worker-single mode (in OpenACC parlance, I don't know what the OpenMP version of
that is?) And neither would it support calls from vector-partitioned code (I
think that's SIMD in OpenMP-land?). It seems like we should reject the
combination of -msoft-stack -fopenacc?
2016-05-19 Alexander Monakov <amona...@ispras.ru>
2016-05-19 Alexander Monakov <amona...@ispras.ru>
2016-05-19 Alexander Monakov <amona...@ispras.ru>
2016-05-19 Alexander Monakov <amona...@ispras.ru>
2016-03-15 Alexander Monakov <amona...@ispras.ru>
2015-12-14 Alexander Monakov <amona...@ispras.ru>
2015-12-09 Alexander Monakov <amona...@ispras.ru>
why so many changelogs? The on-branch development history is irrelvant for
trunk -- the usual single changelog style should be followed.
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2d4dad1..700c4b0 100644
@@ -992,16 +1091,56 @@ nvptx_declare_function_name (FILE *file, const char
*name, const_tree decl)
+ else if (need_frameptr || cfun->machine->has_varadic || cfun->calls_alloca)
+ {
+ /* Maintain 64-bit stack alignment. */
This block needs a more descriptive comment -- it appears to be doing a great
deal more than maintaining 64-bit stack alignment!
+ int keep_align = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
+ sz = ROUND_UP (sz, keep_align);
+ int bits = POINTER_SIZE;
+ fprintf (file, "\t.reg.u%d %%frame;\n", bits);
+ fprintf (file, "\t.reg.u32 %%fstmp0;\n");
+ fprintf (file, "\t.reg.u%d %%fstmp1;\n", bits);
+ fprintf (file, "\t.reg.u%d %%fstmp2;\n", bits);
Some of these register names appear to be long lived -- and referenced in other
functions. It would be better to give those more descriptive names, or even
give them hard-regs. You should certainly do so for those that are already
hard regs (%frame & %stack) -- is it more feasible to augment init_frame to
initialize them?
@@ -1037,6 +1178,10 @@ nvptx_output_return (void)
{
machine_mode mode = (machine_mode)cfun->machine->return_mode;
+ if (cfun->machine->using_softstack)
+ fprintf (asm_out_file, "\tst.shared.u%d [%%fstmp2], %%fstmp1;\n",
See note above about obscure reg names.
Since ptx is a virtual target, we just define a few
hard registers for special purposes and leave pseudos unallocated.
@@ -200,6 +205,7 @@ struct GTY(()) machine_function
bool is_varadic; /* This call is varadic */
bool has_varadic; /* Current function has a varadic call. */
bool has_chain; /* Current function has outgoing static chain. */
+ bool using_softstack; /* Need to restore __nvptx_stacks[tid.y]. */
Comment should describe what the attribute is, not what it implies. In this
case I think it's /* Current function has a soft stack frame. */
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 33a4862..e5650b6 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
+(define_insn "set_softstack_insn"
+ [(unspec [(match_operand 0 "nvptx_register_operand" "R")] UNSPEC_ALLOCA)]
+ "TARGET_SOFT_STACK"
+{
+ return (cfun->machine->using_softstack
+ ? "%.\\tst.shared%t0\\t[%%fstmp2], %0;"
+ : "");
+})
Is this alloca related (UNSPEC_ALLOCA) or restore related (invoked in
restore_stack_block), or stack setting (as insn name suggests). Things seem
inconsistently named. Comments would be good.
(define_expand "restore_stack_block"
[(match_operand 0 "register_operand" "")
(match_operand 1 "register_operand" "")]
""
{
+ if (TARGET_SOFT_STACK)
+ {
+ emit_move_insn (operands[0], operands[1]);
+ emit_insn (gen_set_softstack_insn (operands[0]));
Is it necessary to store the soft stack here? Only restore_stack_block seems
defined, and not save_stack_block. Why the asymmetry?
diff --git a/gcc/testsuite/gcc.target/nvptx/softstack.c
b/gcc/testsuite/gcc.target/nvptx/softstack.c
new file mode 100644
index 0000000..73e60f2
--- /dev/null
This test is ok, but probably wise to check varadic generates the expected code?
Are changes needed in nvptx's crt0.s (or an auxilary file in libc or libgcc) to
make this work for testing? Have you tested an nvptx-elf target with this flag on?
nathan