On 8/18/15 01:31, Richard Henderson wrote:
> On 08/15/2015 11:16 AM, Chen Gang wrote:
>
>> But what you said is really quite valuable to me!! We can treat the flag
>> as a caller-saved context, so that the caller can use the callee freely
>> (in fact, I guess, the real hardware treats it as caller context, too).
>>
>>  - We have to define the flag format based on the existing format in the
>>    related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>>
>>  - We can only use 21-24 to mark an addsub, mul, or typecast result. If
>>    21-24 bits are all zero, it means a typecast result. For fsingle: 32-63
>>    bits is the input integer; for fdouble: srca is the input integer.
>
> Plausible.
>
>>
>>  - For addsub and mul results, we use 32-63 bits for an index of a resource
>>    handler (like the 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>>    fdouble_mul_flags, fdouble_addsub allocate the resource, and pack1 frees it.
>
> No, that's a bad idea.  No state external to the inputs to the insns.
>

We can use 21-24 bits for the state external to the inputs to the insns.
My idea is below:

/*
 * Single floating point instructions description.
 *
 *  - fsingle_add1, fsingle_sub1, and fsingle_pack1/2 can be used individually.
 *
 *  - when fsingle_pack1/2 is used individually, it is for type cast.
 *
 *  - the 4K-th oldest result is already useless to the caller.
 *
 * fsingle_add1         ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fsingle_sub1         ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fsingle_addsub2      ; skipped.
 *
 * fsingle_mul1         ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fsingle_mul2         ; move rsrca to rdst.
 *
 * fsingle_pack1        ; skipped.
 *
 * fsingle_pack2        ; get context from rsrca (rsrca is context).
 *                      ; if context for add/sub/mul
 *                      ;     get result from roundup array based on index.
 *                      ;     move result to rdst.
 *                      ; else
 *                      ;     get (u)int32_t integer from context,
 *                      ;     (u)int32_to_float32.
 */

/*
 * Double floating point instructions' description.
 *
 *  - fdouble_add_flags, fdouble_sub_flags, and fdouble_pack1/2 can be used
 *    individually.
 *
 *  - when fdouble_pack1/2 is used individually, it is for type cast.
 *
 *  - the 4K-th oldest result is already useless to the caller.
 *
 * fdouble_unpack_max:  ; skipped.
 *
 * fdouble_unpack_min:  ; skipped.
 *
 * fdouble_add_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_sub_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_addsub:      ; skipped.
 *
 * fdouble_mul_flags:   ; make context and calc result from rsrca and rsrcb.
 *                      ; save result in roundup array, and add index to context.
 *                      ; move context to rdst.
 *
 * fdouble_pack1:       ; get context from rsrcb.
 *                      ; if context for add/sub/mul
 *                      ;     get result from roundup array based on index.
 *                      ;     move result to rdst.
 *                      ; else
 *                      ;     get (u)int32_t integer from rsrca
 *                      ;     (u)int32_to_float64.
 *
 * fdouble_pack2:       ; skipped.
 */

#define TILEGX_F_COUNT  0x1000  /* Maximum results count for fdouble */

#define TILEGX_F_DUINT  0x21b00 /* exp is for uint32_t to double */
#define TILEGX_F_DINT   0xa1b00 /* exp is for int32_t to double */
#define TILEGX_F_SUINT  0x9e    /* exp is for uint32_t to single */
#define TILEGX_F_SINT   0x29e   /* exp is for int32_t to single */

#define TILEGX_F_TCAST  0       /* Result type is for typecast, MUST BE 0 */
#define TILEGX_F_TCALC  1       /* Result type is for add/sub/mul */

#pragma pack(push, 1)
typedef struct TileGXFPCtx {

    /* According to float(uns)sisf2 and float(uns)sidf2 in gcc tilegx.md */
    uint64_t exp : 20;          /* Exponent, for TILEGX_F_(D/S)(U)INT */

    /* Context type, defined and used by callee */
    uint64_t type : 5;          /* For TILEGX_F_T(CAST/CALC) */

    /* Come from TILE-Gx ISA document, Table 7-2 for floating point */
    uint64_t unordered : 1;     /* The two are unordered */
    uint64_t lt : 1;            /* 1st is less than 2nd */
    uint64_t le : 1;            /* 1st is less than or equal to 2nd */
    uint64_t gt : 1;            /* 1st is greater than 2nd */
    uint64_t ge : 1;            /* 1st is greater than or equal to 2nd */
    uint64_t eq : 1;            /* The two operands are equal */
    uint64_t neq : 1;           /* The two operands are not equal */

    /* Result data according to the context type */
    uint64_t data : 32;         /* The explanation is below */

#if 0 /* This is the explanation for 'data' above */
    union {
        uint32_t idx;           /* Index for the add/sub/mul result */
        uint32_t aint;          /* Absolute input integer for fsingle typecast */
        /*
         * There is no input integer for fdouble typecast in context, it is in
         * rsrca parameter of fdouble_pack1 instruction.
         */
    };
#endif
} TileGXFPCtx;
#pragma pack(pop)

typedef struct FPUTLGState {
    float_status fp_status;             /* floating point status */
    int pos32;                          /* Current position for fsingle result */
    int pos64;                          /* Current position for fdouble result */
    float32 val32s[TILEGX_F_COUNT];     /* results roundup array for fsingle */
    float64 val64s[TILEGX_F_COUNT];     /* results roundup array for fdouble */
} FPUTLGState;

>
> It really would be nice if we had the same documentation that was used
> to implement the gcc backend.  Otherwise we have to rely on guesswork.
>
> For single-precision it appears that the format is
>
>   63                                     31          24   10  9     0
>   [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]
>
> We are able to deduce the bias for the exponent based on the input gcc gives us
> for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.
>
> So:
>
>   fsingle_add1, fsingle_sub1: Perform the operation.  Split the result
>   such that all of the fields above are filled in.
>
>   fsingle_mul1: Perform the operation.  Split the result such that all
>   of the fields above except for cmp-flags are filled in.
>
>   fsingle_addsub2: Nop.
>   fsingle_mul2: Move srca to dest.
>
>   fsingle_pack1: Normalize and repack the above.  In the add/sub/mul case,
>   no normalization will be required, so no change to the result occurs.
>
>   In the floatunssisf2 case, the input implicit bit may not be set, and
>   guard bits may be set, so real rounding and normalization must occur,
>   adjusting the exponent constructed by gcc in building the flags.
>
> For double-precision things are more complicated, precisely because there are
> no dedicated fdouble_mul[1-4] instructions; instead gcc is to use a normal
> 128-bit integer multiplication on the mantissa.
>
> For double-precision it appears that the format is
>
>            63               57                          4            0
>   unpack  [ overflow bits? | mantissa with implicit bit | guard bits ]
>
>            63  31          24   20  19    8    0
>   flags   [ ?? | cmp flags | ?? | s | exp | ?? ]
>
> Similarly we can compute the bias for exp as 0x21b == 2**53.
> Or is it 20 bits of exponent and 0x21b00 == 2**53?
>
> So:
>
>   fdouble_unpack_max, fdouble_unpack_min: Perform the operation as described,
>   extracting the mantissa of the min/max absolute value.
>
>   fdouble_add_flags, fdouble_sub_flags: Extract the signs and exponent of the
>   sources, and compute the sign and exponent of the result.  Set a bit,
>   presumably one of [24:21], that tells fdouble_addsub whether to perform
>   addition or subtraction.  Set the comparison flags.
>
>   fdouble_mul_flags: Extract the signs and exponent of the sources, and compute
>   the sign and exponent of the result.  Note that the result of the 128-bit
>   multiplication is guaranteed to be non-normalized: the 2 57-bit inputs will
>   produce a 114-bit intermediate result.  Which means that bits [63:51] are
>   guaranteed to be zero on entry to the pack stages.  Which means that some
>   bias will need to be applied to the intermediate exponent.
>
>   fdouble_addsub: Add or subtract the mantissas based on a bit in flags.
>
>   fdouble_pack1: Move flags (srcb) to result (dest).
>   fdouble_pack2: Take the 128-bit mantissa of srca+srcb, the flags of dest,
>   and normalize and pack the result.
>

OK, thanks, what you said above sounds reasonable. It is more precise than
my current implementation (but it is also a little more complex).

For me, if my current implementation cannot pass the gcc testsuite (I guess
not), I shall try to implement what you said above next.

Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
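
For reference, a minimal C sketch of two points discussed above. The first
helper checks the single-precision observation: 0x9e is exactly the IEEE-754
biased exponent of 2**31 (127 + 31). The remaining helpers are hypothetical
shift-and-mask accessors for the 64-bit context word, assuming the field order
of TileGXFPCtx (exp in bits 19:0, type in 24:20, comparison flags in 31:25,
data in 63:32). That placement is only what a little-endian gcc would produce
for the bitfields; it is an assumption here, not something taken from the
TILE-Gx documentation, and all ctx_* and f32_* names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Biased exponent field (bits 30:23) of an IEEE-754 single. */
static inline uint32_t f32_biased_exp(float f)
{
    uint32_t bits;

    memcpy(&bits, &f, sizeof(bits));    /* bit copy, avoids aliasing issues */
    return (bits >> 23) & 0xff;
}

/*
 * Hypothetical accessors for the context word. The bit positions are an
 * assumption based on the TileGXFPCtx field order, see the note above.
 */
static inline uint32_t ctx_exp(uint64_t w)  { return (uint32_t)(w & 0xfffff); }
static inline uint32_t ctx_type(uint64_t w) { return (uint32_t)((w >> 20) & 0x1f); }
static inline uint32_t ctx_cmp(uint64_t w)  { return (uint32_t)((w >> 25) & 0x7f); }
static inline uint32_t ctx_data(uint64_t w) { return (uint32_t)(w >> 32); }

/* Build a context word from its four fields; out-of-range bits are masked. */
static inline uint64_t ctx_make(uint32_t exp, uint32_t type,
                                uint32_t cmp, uint32_t data)
{
    return ((uint64_t)data << 32)
         | ((uint64_t)(cmp & 0x7f) << 25)
         | ((uint64_t)(type & 0x1f) << 20)
         | (uint64_t)(exp & 0xfffff);
}

/* Small self-check: the bias observation and a field round trip. */
static inline void ctx_selfcheck(void)
{
    uint64_t w = ctx_make(0x21b00, 1, 0x41, 0xdeadbeef);

    assert(f32_biased_exp(2147483648.0f) == 0x9e);  /* 2**31 */
    assert(ctx_exp(w) == 0x21b00);
    assert(ctx_type(w) == 1);
    assert(ctx_cmp(w) == 0x41);
    assert(ctx_data(w) == 0xdeadbeef);
}
```

Explicit shifts and masks keep the layout independent of how a given compiler
orders bitfields, which matters if the word has to match what gcc-generated
guest code puts in the register.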