On 2022-12-01 06:50 Palmer Dabbelt <pal...@dabbelt.com> wrote: > >On Wed, 30 Nov 2022 00:37:17 PST (-0800), gao...@eswincomputing.com wrote: >> The stack that save-restore reserves is not well accumulated in stack >> allocation and deallocation. >> This patch allows less instructions to be used in stack allocation and >> deallocation if save-restore enabled, >> and also a much clear logic for save-restore stack manipulation. >> >> before patch: >> bar: >> call t0,__riscv_save_4 >> addi sp,sp,-64 >> ... >> li t0,-12288 >> addi t0,t0,-1968 # optimized out after patch >> add sp,sp,t0 # prologue >> ... >> li t0,12288 # epilogue >> addi t0,t0,2000 # optimized out after patch >> add sp,sp,t0 >> ... >> addi sp,sp,32 >> tail __riscv_restore_4 >> >> after patch: >> bar: >> call t0,__riscv_save_4 >> addi sp,sp,-2032 >> ... >> li t0,-12288 >> add sp,sp,t0 # prologue >> ... >> li t0,12288 # epilogue >> add sp,sp,t0 >> ... >> addi sp,sp,2032 >> tail __riscv_restore_4 >> >> gcc/ChangeLog: >> >> * config/riscv/riscv.cc (riscv_first_stack_step): add a new function >>parameter remaining_size. >> (riscv_compute_frame_info): adapt new riscv_first_stack_step >>interface. >> (riscv_expand_prologue): consider save-restore in stack allocation. >> (riscv_expand_epilogue): consider save-restore in stack deallocation. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/riscv/stack_save_restore.c: New test. >> --- >> gcc/config/riscv/riscv.cc | 58 ++++++++++--------- >> .../gcc.target/riscv/stack_save_restore.c | 40 +++++++++++++ >> 2 files changed, 70 insertions(+), 28 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/riscv/stack_save_restore.c > >I guess with the RISC-V backend still being open for things as big as >the V port we should probably be taking code like this as well? I >wouldn't be opposed to making an exception for the V code and holding >everything else back, though. > >> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc >> index 05bdba5ab4d..9e92e729a5f 100644 >> --- a/gcc/config/riscv/riscv.cc >> +++ b/gcc/config/riscv/riscv.cc >> @@ -4634,7 +4634,7 @@ riscv_save_libcall_count (unsigned mask) >> They decrease stack_pointer_rtx but leave frame_pointer_rtx and >> hard_frame_pointer_rtx unchanged. */ >> >> -static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info >> *frame); >> +static HOST_WIDE_INT riscv_first_stack_step (struct riscv_frame_info >> *frame, poly_int64 remaining_size); >> >> /* Handle stack align for poly_int. */ >> static poly_int64 >> @@ -4663,7 +4663,7 @@ riscv_compute_frame_info (void) >> save/restore t0. We check for this before clearing the frame struct. >>*/ >> if (cfun->machine->interrupt_handler_p) >> { >> - HOST_WIDE_INT step1 = riscv_first_stack_step (frame); >> + HOST_WIDE_INT step1 = riscv_first_stack_step (frame, >> frame->total_size); >> if (! POLY_SMALL_OPERAND_P ((frame->total_size - step1))) >> interrupt_save_prologue_temp = true; >> } >> @@ -4913,31 +4913,31 @@ riscv_restore_reg (rtx reg, rtx mem) >> without adding extra instructions. */ >> >> static HOST_WIDE_INT >> -riscv_first_stack_step (struct riscv_frame_info *frame) >> +riscv_first_stack_step (struct riscv_frame_info *frame, poly_int64 >> remaining_size) >> { >> - HOST_WIDE_INT frame_total_constant_size; >> - if (!frame->total_size.is_constant ()) >> - frame_total_constant_size >> - = riscv_stack_align (frame->total_size.coeffs[0]) >> - - riscv_stack_align (frame->total_size.coeffs[1]); >> + HOST_WIDE_INT remaining_const_size; >> + if (!remaining_size.is_constant ()) >> + remaining_const_size >> + = riscv_stack_align (remaining_size.coeffs[0]) >> + - riscv_stack_align (remaining_size.coeffs[1]); > >The alignment looks off here, at least in the email. Worth fixing it up >if you're touching the lines anyway.
Sure, i will change RISCV_STACK_ALIGN into riscv_stack_align. > >> else >> - frame_total_constant_size = frame->total_size.to_constant (); >> + remaining_const_size = remaining_size.to_constant (); >> >> - if (SMALL_OPERAND (frame_total_constant_size)) >> - return frame_total_constant_size; >> + if (SMALL_OPERAND (remaining_const_size)) >> + return remaining_const_size; >> >> HOST_WIDE_INT min_first_step = >> - RISCV_STACK_ALIGN ((frame->total_size - >> frame->frame_pointer_offset).to_constant()); >> + RISCV_STACK_ALIGN ((remaining_size - >> frame->frame_pointer_offset).to_constant()); >> HOST_WIDE_INT max_first_step = IMM_REACH / 2 - PREFERRED_STACK_BOUNDARY / >>8; >> - HOST_WIDE_INT min_second_step = frame_total_constant_size - >> max_first_step; >> + HOST_WIDE_INT min_second_step = remaining_const_size - max_first_step; >> gcc_assert (min_first_step <= max_first_step); >> >> /* As an optimization, use the least-significant bits of the total frame >> size, so that the second adjustment step is just LUI + ADD. */ >> if (!SMALL_OPERAND (min_second_step) >> - && frame_total_constant_size % IMM_REACH < IMM_REACH / 2 >> - && frame_total_constant_size % IMM_REACH >= min_first_step) >> - return frame_total_constant_size % IMM_REACH; >> + && remaining_const_size % IMM_REACH < IMM_REACH / 2 >> + && remaining_const_size % IMM_REACH >= min_first_step) >> + return remaining_const_size % IMM_REACH; > >Looks like this entire frame->total_size -> remaining_size conversion >could be done as an independent patch that would change no >functionality, that's always a nice way to do things as it makes the >code easier to read. Sure, i will split this patch. > >I spent a bit poking around here and nothing wrong is jumping out, but >trying to keep all these offset differences in my head is a bit tricky. >If you have the time to refactor this to be easier to read that'd be >great, otherwise hopefully I (or someone else) will have the time to >take a look -- probably not today on my end, though, as I've got some >Linux backlog to look at. I'll try it. Also i propose to add step0 for pre-allocated stack cases like save-restore lib call, and the future Zcmp in Zc* extension if needed. So we have a clear logic of manipulate stack step by step. > >Thanks! > >> if (TARGET_RVC) >> { >> @@ -5037,9 +5037,7 @@ riscv_expand_prologue (void) >> /* Save the registers. */ >> if ((frame->mask | frame->fmask) != 0) >> { >> - HOST_WIDE_INT step1 = riscv_first_stack_step (frame); >> - if (size.is_constant ()) >> - step1 = MIN (size.to_constant(), step1); >> + HOST_WIDE_INT step1 = riscv_first_stack_step (frame, size); >> >> insn = gen_add3_insn (stack_pointer_rtx, >> stack_pointer_rtx, >> @@ -5142,6 +5140,8 @@ riscv_expand_epilogue (int style) >> HOST_WIDE_INT step2 = 0; >> bool use_restore_libcall = ((style == NORMAL_RETURN) >> && riscv_use_save_libcall (frame)); >> + unsigned libcall_size = use_restore_libcall ? >> + frame->save_libcall_adjustment : 0; >> rtx ra = gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM); >> rtx insn; >> >> @@ -5212,13 +5212,18 @@ riscv_expand_epilogue (int style) >> REG_NOTES (insn) = dwarf; >> } >> >> + if (use_restore_libcall) >> + frame->mask = 0; /* Temporarily fib for GPRs. */ >> + >> /* If we need to restore registers, deallocate as much stack as >> possible in the second step without going out of range. */ >> if ((frame->mask | frame->fmask) != 0) >> - { >> - step2 = riscv_first_stack_step (frame); >> - step1 -= step2; >> - } >> + step2 = riscv_first_stack_step (frame, frame->total_size - >> libcall_size); >> + >> + if (use_restore_libcall) >> + frame->mask = mask; /* Undo the above fib. */ >> + >> + step1 -= step2 + libcall_size; >> >> /* Set TARGET to BASE + STEP1. */ >> if (known_gt (step1, 0)) >> @@ -5272,15 +5277,12 @@ riscv_expand_epilogue (int style) >> frame->mask = 0; /* Temporarily fib that we need not save GPRs. */ >> >> /* Restore the registers. */ >> - riscv_for_each_saved_reg (frame->total_size - step2, riscv_restore_reg, >> + riscv_for_each_saved_reg (frame->total_size - step2 - libcall_size, >> + riscv_restore_reg, >> true, style == EXCEPTION_RETURN); >> >> if (use_restore_libcall) >> - { >> frame->mask = mask; /* Undo the above fib. */ >> - gcc_assert (step2 >= frame->save_libcall_adjustment); >> - step2 -= frame->save_libcall_adjustment; >> - } >> >> if (need_barrier_p) >> riscv_emit_stack_tie (); >> diff --git a/gcc/testsuite/gcc.target/riscv/stack_save_restore.c >> b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c >> new file mode 100644 >> index 00000000000..4695ef9469a >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/riscv/stack_save_restore.c >> @@ -0,0 +1,40 @@ >> +/* { dg-do compile } */ >> +/* { dg-options "-march=rv32imafc -mabi=ilp32f -msave-restore -O2 >> -fno-schedule-insns -fno-schedule-insns2 -fno-unroll-loops -fno-peel-loops" >> } */ >> +/* { dg-final { check-function-bodies "**" "" } } */ >> + >> +char my_getchar(); >> +float getf(); >> + >> +/* >> +**bar: >> +** call t0,__riscv_save_4 >> +** addi sp,sp,-2032 >> +** ... >> +** li t0,-12288 >> +** add sp,sp,t0 >> +** ... >> +** li t0,12288 >> +** add sp,sp,t0 >> +** ... >> +** addi sp,sp,2032 >> +** tail __riscv_restore_4 >> +*/ > >The test needs to actually check this, it can't just be manual. I didn't get your point. The { dg-final { check-function-bodies "**" "" } } matches the output with the expected result above automatically. Please let me know your idea. Thanks & BR, Fei > >> +int bar() >> +{ >> + float volatile farray[3568]; >> + >> + float sum = 0; >> + float f1 = getf(); >> + float f2 = getf(); >> + float f3 = getf(); >> + float f4 = getf(); >> + >> + for (int i = 0; i < 3568; i++) >> + { >> + farray[i] = my_getchar() * 1.2; >> + sum += farray[i]; >> + } >> + >> + return sum + f1 + f2 + f3 + f4; >> +} >> +