On 03/02/16 15:02, Claudiu Zissulescu wrote:
First, I will split this patch in two. The first part will deal with the FPU
instructions. The second patch, will try to address a new abi optimized for
odd-even registers as the comments for the mabi=optimized are numerous and I
need to carefully prepare for an answer.
The remaining of this email will focus on FPU patch.
+ case EQ:
+ case NE:
+ case UNORDERED:
+ case UNLT:
+ case UNLE:
+ case UNGT:
+ case UNGE:
+ return CC_FPUmode;
+
+ case LT:
+ case LE:
+ case GT:
+ case GE:
+ case ORDERED:
+ return CC_FPUEmode;
cse and other code transformations are likely to do better if you use
just one mode for these. It is also very odd to have comparisons and their
inverse use different modes. Have you done any benchmarking for this?
Right, the ORDERED should be in CC_FPUmode. An inspiration point for
CC_FPU/CC_FPUE mode is the arm port.
I can't see how this code in the arm can actually work correctly. When,
for instance, combine simplifies
a comparison, the comparison code can change, and it will use
SELECT_CC_MODE to find a new
mode for the comparison. Thus, if a comparison traps or not on qNaNs
will depend on the whims
of combine.
Also, the the trapping comparisons don't show the side effect of
trapping on qNaNs, which means
they can be speculated.
To make the trapping comparisons safe, they should display the side
effect in the rtl, and only
be used when requested by options, type attributes, pragmas etc.
They could almost be safe to use be default for -ffinite-math-only,
except that when the frontend knows
how to tell qNaNs and sNaNs apart, and speculates a comparison after
some integer/fp mixed computation when it can infer that no sNaN will
occur, you could still get an unexpected signal.
The reason why having the two CC_FPU and CC_FPUE modes is to emit signaling
FPU compare instructions.
I don't know if your compare instructions are signalling for quiet NaNs
(I hope they're not),
but the mode of the comparison result shouldn't be used to distinguish
that - it's not safe,
see above.
The purpose of the mode of the result is to distinguish different
interpretations for the bit
patterns inside the comparison result flags.
We can use a single CC_FPU mode here instead of two, but we may lose
functionality.
Can you define what that functionality actually is, and show some simple
test code
to demonstrate how it works with your port extension?
Regarding benchmarks, I do not have an establish benchmark for this, however,
as far as I could see the code generated for FPU looks clean.
Please let me know if it is acceptable to go with CC_FPU/CC_FPUE, and ORDERED
fix further on.
No, there should be a discernible and achievable purpose for comparison
modes.
Which you have not demonstrated yet so far for the CC_FPU/CC_FPUE
dichotomy.
Or, to have a single mode.
Yes.
+ /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+ registers. */
+ if (TARGET_HS)
+ {
+ for (regno = 1; regno < 32; regno +=2)
+ {
+ arc_hard_regno_mode_ok[regno] = S_MODES;
+ }
+ }
+
Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
arguments
in odd registers? I fear you might run into reload trouble when trying to
access the values.
Although, I haven't bump into this issue until now, I do not say it may not
happen. Hence, I would create a new register class to hold the odd-even
registers. Hence the above code will not be needed. What do u say?
That would have been possible a couple of years ago, but these days, all
the constituent
registers of a multi-reg hard register have to be in a constraint's
register class for the
constraint to match.
You could fudge this by using two classes for two-reg allocations, one
with 0,1, 4,5, 8,9 ... the other
with 2,3, 6,7, 10,11 ... , but then you need another pair for
four-register allocations, and maybe you
want to add various union and intersection classes, and the register
allocators are rather rubbis
when it comes to balance allocations between classes of similar size and
utility, so you should
really try to avoid this.
A way to avoid this is not to give the option of using the old ABI while
enforcing alignment in registers.
Or you could use a different mode for the argument passing when it ends
up unaligned; I suppose
BLKmode should work, using a vector to designate the constituent
registers of the function argument.
+(define_insn "*cmpsf_trap_fpu"
That name makes as little sense to me as having two separate modes
CC_FPU and CC_FPUE
for positive / negated usage and having two comparison patterns pre
precision that
do the same but pretend to be dissimilar.
The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction
Oops. I missed the 'f' suffix. So the "*trap_fpu" patterns really are
different...
in cases when either of the instruction operands is a signaling NaN. The
FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the
operands is a quiet or signaling NaN, whereas, the FxCMP instruction updates
the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet
NaN. We need to use the FxCMPF only if we keep the CC_FPU an CC_FPUE otherwise,
we shall use only FxCMP instruction.
... and have different side effects not mentioned in the rtl.
When the invalid flag is set, does that mean the CPU gets an FPU exception?
So quiet NaNs always signal, and signalling NaNs only when used as
operands of FxCMPF?
You might need to resort to software floating point comparisons then unless
-ffinite-math-only is in effect.
Is this better?
#define TARGET_FP_SP_BASE ((arc_fpu_build & FPU_SP) != 0)
#define TARGET_FP_DP_BASE ((arc_fpu_build & FPU_DP) != 0)
#define TARGET_FP_SP_FUSED ((arc_fpu_build & FPU_SF) != 0)
#define TARGET_FP_DP_FUSED ((arc_fpu_build & FPU_DF) != 0)
#define TARGET_FP_SP_CONV ((arc_fpu_build & FPU_SC) != 0)
#define TARGET_FP_DP_CONV ((arc_fpu_build & FPU_DC) != 0)
#define TARGET_FP_SP_SQRT ((arc_fpu_build & FPU_SD) != 0)
#define TARGET_FP_DP_SQRT ((arc_fpu_build & FPU_DD) != 0)
#define TARGET_FP_DP_AX ((arc_fpu_build & FPX_DP) != 0)
Yes.