On 10/08/2015 01:29 AM, Abe wrote:
Attached please find my revised patch to the RTL if converter. This
patch enables the
if-conversion of half-hammocks with a store in them that the internal
GCC machinery
otherwise considers too hazardous to if-convert. This is made safe by
using the
"scratchpad" technique, i.e. throwing away the store into a safe
location where nothing
of any importance is currently stored. The scratchpads are allocated in
the stack frame.
So, one conceptual issue first. Obviously this increases the size of the
stack frame, which makes the transformation more expensive. The patch
does not appear to attempt to estimate costs. However, why do we need to
allocate anything in the first place? If you want to store something
that will be thrown away, just pick an address below the stack pointer.
I think some ports may need different strategies due to stack bias or
red zones, so a target hook is in order, with one safe default to fail,
and one default implementation that can be used by most targets, and
then specialized versions in target-dependent code where necessary:
rtx
default_get_scratchpad_fail (HOST_WIDE_INT size)
{
return NULL_RTX;
}
rtx
default_get_scratchpad (HOST_WIDE_INT size)
{
/* Possibly also take STACK_BOUNDARY into account so as to not
make unaligned locations. */
if (size >= param (SCRATCHPAD_MAX_SIZE))
return NULL_RTX;
return plus_constant (stack_pointer_rtx, gen_int_mode (-size, Pmode));
}
With that, I think all the code to keep track of scratchpads can just be
deleted.
There's this preexisting comment:
/* ??? This is overconservative. Storing to two different mems is
as easy as conditionally computing the address. Storing to a
single mem merely requires a scratch memory to use as one of the
destination addresses; often the memory immediately below the
stack pointer is available for this. */
suggesting that it ought to be possible to generalize the technique to
stores to different addresses.
To the patch itself. The code still has many stylistic problems and does
not follow the required guidelines.
+#include "diagnostic-color.h"
Why?
-/* Return true if a write into MEM may trap or fault. */
+/* Return true if a write into MEM may trap or fault
+ even in the presence of scratchpad support. */
+/* Return true if a write into MEM may trap or fault
+ without scratchpad support. */
Please explain the rationale for these changes. What exactly is
different with scratchpads?
+ /* The next "if": quoting "noce_emit_cmove":
+ If we can't create new pseudos, though, don't bother. */
+ if (reload_completed)
+ return FALSE;
+
+ if (optimize<2)
+ return FALSE;
+
+ if (optimize_function_for_size_p (cfun))
+ return FALSE;
+
+ if (targetm.have_conditional_execution () || ! HAVE_conditional_move)
+ return FALSE;
Merge the conditions into one if. Watch spacing around operators.
+
+ const bool not_a_scratchpad_candidate =
+ noce_mem_write_may_trap_or_fault_p_1 (orig_x);
+ if (! not_a_scratchpad_candidate)
The = should start a line, but what you really should do is just put the
condition into the if and eliminate the variable.
+ const size_t size_of_MEM = MEM_SIZE (orig_x);
Identifiers are still too verbose. This is typically just called size,
or memsize if there are other sizes to keep track of.
+
+ for (rtx_insn* insn = BB_HEAD (then_bb); insn && insn != insn_a
+ && insn != BB_END (then_bb); insn=NEXT_INSN (insn))
+ {
+ if (! (NOTE_INSN_BASIC_BLOCK_P (insn) || DEBUG_INSN_P
(insn)))
There are six different coding style violations in this block. Please
identify and fix them (elsewhere as well). In addition, I think it would
be better to start each part of the for statement on its own line for
clarity.
I still need to figure out what is going on in this insn-copying loop.
> + /* Done copying the needed insns between the start of the
> + THEN block and the set of 'a', if any. */
> +
> + if (CONSTANT_P (XEXP (cond, 0)) && CONSTANT_P (XEXP (cond, 1)))
> + {
> + end_sequence ();
> + return FALSE;
> + }
This should be done earlier before you go to the effort of copying insns.
+ MEM_NOTRAP_P (mem) = true;
So I'm still not entirely sure which cases you are trying to optimize
and which ones not, but couldn't this technique allow a trapping store here?
Bernd