Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.

Joern Wolfgang Rennecke Wed, 03 Feb 2016 10:42:46 -0800


On 03/02/16 15:02, Claudiu Zissulescu wrote:

First, I will split this patch in two. The first part will deal with the FPU 
instructions. The second patch, will try to address a new abi optimized for 
odd-even registers as the comments for the mabi=optimized are numerous and I 
need to carefully prepare for an answer.
The remaining of this email will focus on FPU patch.

+      case EQ:
+      case NE:
+      case UNORDERED:
+      case UNLT:
+      case UNLE:
+      case UNGT:
+      case UNGE:
+       return CC_FPUmode;
+
+      case LT:
+      case LE:
+      case GT:
+      case GE:
+      case ORDERED:
+       return CC_FPUEmode;

cse and other code transformations are likely to do better if you use
just one mode for these.  It is also very odd to have comparisons and their
inverse use different modes.  Have you done any benchmarking for this?

Right, the ORDERED should be in CC_FPUmode. An inspiration point for 
CC_FPU/CC_FPUE mode is the arm port.

I can't see how this code in the arm can actually work correctly. When,for instance, combine simplifiesa comparison, the comparison code can change, and it will useSELECT_CC_MODE to find a newmode for the comparison. Thus, if a comparison traps or not on qNaNswill depend on the whims

of combine.

Also, the the trapping comparisons don't show the side effect oftrapping on qNaNs, which means

they can be speculated.

To make the trapping comparisons safe, they should display the sideeffect in the rtl, and only

be used when requested by options, type attributes, pragmas etc.

They could almost be safe to use be default for -ffinite-math-only,except that when the frontend knowshow to tell qNaNs and sNaNs apart, and speculates a comparison aftersome integer/fp mixed computation when it can infer that no sNaN willoccur, you could still get an unexpected signal.

  The reason why having the two CC_FPU and CC_FPUE modes is to emit signaling 
FPU compare instructions.

I don't know if your compare instructions are signalling for quiet NaNs(I hope they're not),but the mode of the comparison result shouldn't be used to distinguishthat - it's not safe,

see above.

The purpose of the mode of the result is to distinguish differentinterpretations for the bit

patterns inside the comparison result flags.

   We can use a single CC_FPU mode here instead of two, but we may lose 
functionality.

Can you define what that functionality actually is, and show some simpletest code

to demonstrate how it works with your port  extension?

Regarding benchmarks, I do not have an establish benchmark for this, however, 
as far as I could see the code generated for FPU looks clean.
Please let me know if it is acceptable to go with CC_FPU/CC_FPUE, and ORDERED 
fix further on.

No, there should be a discernible and achievable purpose for comparisonmodes.Which you have not demonstrated yet so far for the CC_FPU/CC_FPUEdichotomy.

  Or, to have a single mode.

Yes.

+  /* ARCHS has 64-bit data-path which makes use of the even-odd paired
+     registers.  */
+  if (TARGET_HS)
+    {
+      for (regno = 1; regno < 32; regno +=2)
+       {
+         arc_hard_regno_mode_ok[regno] = S_MODES;
+       }
+    }
+

Does TARGET_HS with -mabi=default allow for passing DFmode / DImode
arguments
in odd registers?  I fear you might run into reload trouble when trying to
access the values.

Although, I haven't bump into this issue until now, I do not say it may not 
happen. Hence, I would create a new register class to hold the odd-even 
registers. Hence the above code will not be needed. What do u say?

That would have been possible a couple of years ago, but these days, allthe constituentregisters of a multi-reg hard register have to be in a constraint'sregister class for the

constraint to match.

You could fudge this by using two classes for two-reg allocations, onewith 0,1, 4,5, 8,9 ... the otherwith 2,3, 6,7, 10,11 ... , but then you need another pair forfour-register allocations, and maybe youwant to add various union and intersection classes, and the registerallocators are rather rubbiswhen it comes to balance allocations between classes of similar size andutility, so you should

really try to avoid this.

A way to avoid this is not to give the option of using the old ABI whileenforcing alignment in registers.Or you could use a different mode for the argument passing when it endsup unaligned; I supposeBLKmode should work, using a vector to designate the constituentregisters of the function argument.

+(define_insn "*cmpsf_trap_fpu"

That name makes as little sense to me as having two separate modes
CC_FPU and CC_FPUE
for positive / negated usage and having two comparison patterns pre
precision that
do the same but pretend to be dissimilar.

The F{S/D}CMPF instruction is similar to the F{S/D}CMP instruction

Oops. I missed the 'f' suffix. So the "*trap_fpu" patterns really aredifferent...

  in cases when either of the instruction operands is a signaling NaN. The 
FxCMPF instruction updates the invalid flag (FPU_STATUS.IV) when either of the 
operands is a quiet or signaling NaN, whereas, the FxCMP instruction updates 
the invalid flag (FPU_STATUS.IV) only when either of the operands is a quiet 
NaN. We need to use the FxCMPF only if we keep the CC_FPU an CC_FPUE otherwise, 
we shall use only FxCMP instruction.

... and have different side effects not mentioned in the rtl.
When the invalid flag is set, does that mean the CPU gets an FPU exception?

So quiet NaNs always signal, and signalling NaNs only when used asoperands of FxCMPF?


You might need to resort to software floating point comparisons then unless
-ffinite-math-only is in effect.

Is this better?

#define TARGET_FP_SP_BASE   ((arc_fpu_build & FPU_SP) != 0)
#define TARGET_FP_DP_BASE   ((arc_fpu_build & FPU_DP) != 0)
#define TARGET_FP_SP_FUSED  ((arc_fpu_build & FPU_SF) != 0)
#define TARGET_FP_DP_FUSED  ((arc_fpu_build & FPU_DF) != 0)
#define TARGET_FP_SP_CONV   ((arc_fpu_build & FPU_SC) != 0)
#define TARGET_FP_DP_CONV   ((arc_fpu_build & FPU_DC) != 0)
#define TARGET_FP_SP_SQRT   ((arc_fpu_build & FPU_SD) != 0)
#define TARGET_FP_DP_SQRT   ((arc_fpu_build & FPU_DD) != 0)
#define TARGET_FP_DP_AX     ((arc_fpu_build & FPX_DP) != 0)

Yes.

Re: [PATCH] [ARC] Add single/double IEEE precission FPU support.

Reply via email to