On 08/15/2015 11:16 AM, Chen Gang wrote:
> OK, thanks, but for float(uns)sisf2 and float(uns)sidf2, we can not only
> simply move. :-(

Oh yes, I see that now.  Unfortunate.

> But what you said is really quite valuable to me!! we can treat the flag
> as a caller saved context, then can let the caller can use callee freely
> (in fact, I guess, the real hardware treats it as caller context, too).
>
> - we have to define the flag format based on the existing format in the
>   related docs and tilegx.md (reserve 0-20 and 25-31 bits).
>
> - We can only use 21-24 for mark addsub, mul, or typecast result. If
>   21-24 bits are all zero, it means typecast result. For fsingle: 32-63
>   bits is the input integer; for fdouble: srca is the input integer.

Plausible.

>
> - For addsub and mul result, we use 32-63 bits for an index of resource
>   handler (like 'fd' returned by open). fsingle_addsub2, fsingle_mul1,
>   fdouble_mul_flags, fdouble_addsub allocate resource, and pack1 free.

No, that's a bad idea.  No state external to the inputs to the insns.

It really would be nice if we had the same documentation that was used to
implement the gcc backend.  Otherwise we have to rely on guesswork.

For single-precision it appears that the format is

  63                                        31          24  10   9   0
  [ mantissa with implicit and guard bits | cmp flags | ?? | s | exp ]

We are able to deduce the bias for the exponent based on the input gcc
gives us for floatunssisf: 0x9e == 2**31 when the mantissa is normalized.

So:

fsingle_add1, fsingle_sub1:
    Perform the operation.  Split the result such that all of the fields
    above are filled in.

fsingle_mul1:
    Perform the operation.  Split the result such that all of the fields
    above except for cmp-flags are filled in.

fsingle_addsub2:
    Nop.

fsingle_mul2:
    Move srca to dest.

fsingle_pack1:
    Normalize and repack the above.  In the add/sub/mul case, no
    normalization will be required, so no change to the result occurs.
    In the floatunssisf2 case, the input implicit bit may not be set, and
    guard bits may be set, so real rounding and normalization must occur,
    adjusting the exponent constructed by gcc in building the flags.

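The single-precision guess above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the hardware
definition: the field positions (mantissa in [63:32], s at bit 10, exp in
[9:0]) are one reading of the diagram, round-to-nearest-even is assumed,
and the helper names (make_unpacked, split_unpacked, pack1, float_bits) are
hypothetical. It models only the floatunssisf2 path through fsingle_pack1.

```python
import struct

# Assumed field positions: one reading of the diagram, not documented fact.
MANT_SHIFT = 32      # mantissa with implicit and guard bits in [63:32]
SIGN_BIT = 10        # s
EXP_MASK = 0x3FF     # exp in [9:0]
EXP_2_31 = 0x9E      # exponent gcc supplies for floatunssisf (from the mail)


def make_unpacked(mant32, sign, exp):
    """Build the 64-bit intermediate word from its fields (cmp flags left 0)."""
    return (((mant32 & 0xFFFFFFFF) << MANT_SHIFT)
            | ((sign & 1) << SIGN_BIT)
            | (exp & EXP_MASK))


def split_unpacked(word):
    """Return (mant32, sign, exp) from the 64-bit intermediate word."""
    return (word >> MANT_SHIFT) & 0xFFFFFFFF, (word >> SIGN_BIT) & 1, word & EXP_MASK


def pack1(word):
    """Normalize, round (nearest-even, an assumption), repack as binary32 bits."""
    mant, sign, exp = split_unpacked(word)
    if mant == 0:
        return sign << 31
    # Normalize: move the leading one up to bit 31 and adjust the exponent.
    shift = 32 - mant.bit_length()
    mant <<= shift
    exp -= shift
    # Keep 24 bits (implicit bit + 23 fraction bits); the low 8 are guard bits.
    guard = mant & 0xFF
    mant >>= 8
    if guard > 0x80 or (guard == 0x80 and (mant & 1)):
        mant += 1
        if mant == 1 << 24:          # rounding carried out of the mantissa
            mant >>= 1
            exp += 1
    return (sign << 31) | ((exp & 0xFF) << 23) | (mant & 0x7FFFFF)


def float_bits(x):
    """IEEE binary32 bit pattern of a Python float, used for comparison."""
    return struct.unpack(">I", struct.pack(">f", x))[0]
```

With these assumptions, 0x9e - 31 gives 127, the usual binary32 bias, and
pack1(make_unpacked(n, 0, EXP_2_31)) can be compared against
float_bits(float(n)) for any 32-bit unsigned n.
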
For double-precision things are more complicated, precisely because there
are no dedicated fdouble_mul[1-4] instructions; instead gcc uses a normal
128-bit integer multiplication on the mantissa.

For double-precision it appears that the format is

          63             57                            4             0
  unpack  [ overflow bits? | mantissa with implicit bit | guard bits ]

          63     31          24  20  19  8     0
  flags   [ ?? | cmp flags | ?? | s | exp | ?? ]

Similarly we can compute the bias for exp as 0x21b == 2**53.  Or is it
20 bits of exponent and 0x21b00 == 2**53?

So:

fdouble_unpack_max, fdouble_unpack_min:
    Perform the operation as described, extracting the mantissa of the
    min/max absolute value.

fdouble_add_flags, fdouble_sub_flags:
    Extract the signs and exponent of the sources, and compute the sign
    and exponent of the result.  Set a bit, presumably one of [24:21],
    that tells fdouble_addsub whether to perform addition or subtraction.
    Set the comparison flags.

fdouble_mul_flags:
    Extract the signs and exponent of the sources, and compute the sign
    and exponent of the result.

    Note that the result of the 128-bit multiplication is guaranteed to
    be non-normalized: the two 57-bit inputs will produce a 114-bit
    intermediate result.  Which means that bits [63:51] are guaranteed
    to be zero on entry to the pack stages.  Which means that some bias
    will need to be applied to the intermediate exponent.

fdouble_addsub:
    Add or subtract the mantissas based on a bit in flags.

fdouble_pack1:
    Move flags (srcb) to result (dest).

fdouble_pack2:
    Take the 128-bit mantissa of srca+srcb, the flags of dest, and
    normalize and pack the result.


r~
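
The 114-bit bound on the double-precision product can be checked with plain
integer arithmetic. The sketch below is only an illustration: the helper
names (mul_mantissas, leading_zero_bits) are hypothetical, and the one
assumption taken from the mail is that each unpacked mantissa fits in 57
bits.

```python
MANT_BITS = 57                 # width of each unpacked mantissa (from the mail)
MASK64 = (1 << 64) - 1


def mul_mantissas(a, b):
    """Return the (hi, lo) 64-bit halves of the 128-bit product a * b."""
    assert a < (1 << MANT_BITS) and b < (1 << MANT_BITS)
    prod = a * b
    return prod >> 64, prod & MASK64


def leading_zero_bits(hi):
    """Count the zero bits above the most significant set bit of hi."""
    return 64 - hi.bit_length()


# Largest possible inputs: the product still fits in 114 bits, so the top
# of the high word is zero and the pack stage has to normalize.
hi, lo = mul_mantissas((1 << MANT_BITS) - 1, (1 << MANT_BITS) - 1)
print(leading_zero_bits(hi))
```

For the largest inputs the high word has 14 leading zero bits, so bits
[63:50] are clear, which covers the [63:51] range stated above.
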