Oh, it seems a little more complex: one testsuite case interleaves a double add with a double mul! We need to save more information so that pack1 can calculate correctly.

It is 20020314-1.exe, the related code (I guess it is correct):

    ...
    fdouble_unpack_max r10, r3, zero
  .LVL2:
    fdouble_unpack_max r15, r2, zero
    fdouble_add_flags r12, r0, r1
    mul_hu_lu r13, r15, r10
    mul_lu_lu r16, r15, r10
    mula_hu_lu r13, r10, r15
    fdouble_unpack_min r11, r0, r1
    {
    shli r14, r13, 32
    fdouble_unpack_max r17, r0, r1
    }
    {
    mul_hu_hu r15, r15, r10
    add r16, r16, r14
    }
    {
    shrui r13, r13, 32
    fdouble_addsub r17, r11, r12
    }
    {
    cmpltu r14, r16, r14
    fdouble_mul_flags r3, r2, r3
    }
  .LVL3:
    {
    add r13, r15, r13
    fdouble_pack1 r12, r17, r12
    }
    {
    add r13, r13, r14
    fdouble_unpack_max r10, r0, zero
    }
    fdouble_pack1 r3, r13, r3
    fdouble_pack2 r12, r17, zero
    fdouble_pack2 r3, r13, r16
    ...

Welcome any additional ideas, suggestions and completions.

Thanks.

On 8/9/15 09:14, Chen Gang wrote:
> On 8/9/15 09:10, Chen Gang wrote:
>>
>> On 8/9/15 01:23, Chen Gang wrote:
>>> Hello all:
>>>
>>> Below is my current idea for all floating point insns. For me, it is not
>>> the precise implementation, even not completely implement -- assume pack
>>> insns can only for packing (u)int32_t when they are used individually:
>>>
>>>   fsingle_add1        ; return calc flags, save calc result to env.
>>>
>>>   fsingle_sub1        ; return calc flags, save calc result to env.
>>>
>>>   fsingle_addsub2     ; set "has result" flag.
>>>
>>>   fsingle_mul1        ; skip return value, save calc result to env.
>>>                         set "has result" flag.
>>>
>>>   fsingle_mul2        ; skipped.
>>>
>>>
>>>   fsingle_pack1       ; skipped.
>>>
>>>   fsingle_pack1       ; if "has result"
>>>                           reset "has result" flag.
>>>                           return calc result from env.
>>>                         else
>>>                           pack srca
>>>                           reference from tilegx.md: float(uns)sisf2.
>>>                           get (u)int32_t a, then (u)int32_to_float32.
>>
>> For "pack srca and srcb", the related demo like below (srca and srcb
>> are uint64_t):
>>
>
> Oh, sorry, for "pack srca" (not for "pack srca and srcb")
>
>>   switch (srca & 0x3ff) {
>>
>>   /* treat it as uint32_t */
>>   case 0x9e:
>>       return uint32_to_float32(srca >> 32, &FP_STATUS);
>>
>>   /* treat it as int32_t, must be negative number */
>>   case 0x29e:
>>       return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
>>
>>   default:
>>       unimplemented (gen_exception).
>>   }
>>
>>>
>>>   fdouble_unpack_max: ; skipped.
>>>
>>>   fdouble_unpack_min: ; skipped.
>>>
>>>   fdouble_add_flags:  ; return calc flags, save calc result to env.
>>>
>>>   fdouble_sub_flags:  ; return calc flags, save calc result to env.
>>>
>>>   fdouble_addsub:     ; set "has result" flag.
>>>
>>>   fdouble_mul_flags:  ; skip return flags, save calc result to env.
>>>                         set "has result" flag.
>>>
>>>   fdouble_pack1:      ; if "has result"
>>>                           reset "has result" flag.
>>>                           return calc result from env.
>>>                         else
>>>                           pack srca and srcb.
>>>                           reference from tilegx.md: float(uns)sidf2.
>>>                           get (u)int32_t a, then (u)int32_to_float64.
>>>
>>
>> For "pack srca and srcb", the related demo like below (srca and srcb
>> are uint64_t):
>>
>>   switch (srcb & 0xffff) {
>>
>
> Oh, sorry, should use 0xfffff instead of 0xffff.
>
>>   /* treat it as uint32_t */
>>   case 0x21b00:
>>       return uint32_to_float64(srca >> 4, &FP_STATUS);
>>
>>   /* treat it as int32_t, must be negative number */
>>   case 0xa1b00:
>>       return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
>>
>>   default:
>>       unimplemented (gen_exception).
>>   }
>>
>>>   fdouble_pack2:      ; skipped.
>>>
>>>
>>> (fsingle_add1/sub1, fdouble_add/sub_flags can be used individually,
>>> e.g gcc testsuit for complex number).
>>>
>>>
>>> Next, I shall implement the floating point insns, welcome any related
>>> ideas, suggestions, and completions.
>>>
>>> Thanks.
>>>
>>>
>>> On 8/5/15 22:16, Chen Gang wrote:
>>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>>
>>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>>> but for me, I can not find any details about them (the ISA
>>>>>>>>> documents only give a summary description, but not details), e.g.
>>>>>>>>
>>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>>> black-box instructions. You need only really implement one of the
>>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>>
>>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>>
>>>>>>>>   fdouble_unpack_min    min, srca, srcb
>>>>>>>>   fdouble_unpack_max    max, srca, srcb
>>>>>>>>   fdouble_add_flags     flg, srca, srcb
>>>>>>>>   fdouble_addsub        max, min, flg
>>>>>>>>   fdouble_pack1         dst, max, flg
>>>>>>>>   fdouble_pack2         dst, max, zero
>>>>>>>>
>>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>>> from "flg" to "dst".
>>>>>>>>
>>>>>>>> Similarly for the single-precision:
>>>>>>>>
>>>>>>>>   fsingle_add1          tmp, srca, srcb
>>>>>>>>   fsingle_addsub2       tmp, srca, srcb
>>>>>>>>   fsingle_pack1         flg, tmp
>>>>>>>>   fsingle_pack2         dst, tmp, flg
>>>>>>>>
>>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>>
>>>>>>
>>>>>> After check the tilegx.md completely, for me, we still need implement
>>>>>> each of them precisely, or we can not emulate all cases (e.g. muldf3).
>>>>>
>>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>>
>>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>>> should be able to delete all of that as unused. Especially if you have
>>>>> the fdouble_unpack* insns store zero into their destinations.
>>>>>
>>>>
>>>> For me, I am not quite sure. But I guess, what you said should be OK (at
>>>> least, what you said is very useful for the implementation).
>>>>
>>>>
>>>>> Don't get me wrong -- more accurate implementation of the actual
>>>>> insns would be nice, especially for debugging. But if the insns
>>>>> aren't accurately documented I don't see what choice we have.
>>>>>
>>>>
>>>> For me, I guess, we can still try to implement the details.
>>>>
>>>>  - The document has all floating point instructions' summary, so we can
>>>>    think of, or guess its implementation entirely.
>>>>
>>>>  - gcc uses them all and completely, so it is our good sample and good
>>>>    reference (but we should not assume gcc must be correct, since we
>>>>    just use qemu for gcc testsuite).
>>>>
>>>>  - Tilegx floating point format should be standard (at least, reference
>>>>    to the standard format), so we can reference the related information
>>>>    from google/baidu.
>>>>
>>>>
>>>>> On the good side, implementing the entire operation as part of the
>>>>> "flags" step probably results in faster emulation.
>>>>>
>>>>
>>>> I guess so, too.
>>>>
>>>>
>>>> I shall try to finish the simple implementation, firstly. Then try to
>>>> implement the floating point instructions in details in the future (it
>>>> should be lower priority).
>>>>
>>>>
>>>> Thanks.
>>>>
>>>
>>
>
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
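
A minimal sketch in C of why the interleaving in 20020314-1.exe matters, under stated assumptions. Scheme A is one possible reading of the quoted plan (a single result slot plus a "has result" flag in env). Scheme B follows the suggestion quoted above that the flags insn performs the whole operation and pack1 moves "flg" to "dst". All names here (d2u, u2d, Env, a_*, b_*, run_*) are hypothetical, host double arithmetic stands in for the softfloat calls, and the "pack srca and srcb" fallback is left out.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: reinterpret a double as its 64-bit pattern and back. */
static uint64_t d2u(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double u2d(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Scheme A (assumed reading of the quoted plan): one result slot in env. */
typedef struct { uint64_t result; bool has_result; } Env;

static void a_add_flags(Env *e, uint64_t a, uint64_t b)
{
    e->result = d2u(u2d(a) + u2d(b));   /* save calc result to env */
}
static void a_addsub(Env *e) { e->has_result = true; }
static void a_mul_flags(Env *e, uint64_t a, uint64_t b)
{
    e->result = d2u(u2d(a) * u2d(b));   /* overwrites any pending add result */
    e->has_result = true;
}
static uint64_t a_pack1(Env *e)
{
    if (e->has_result) {
        e->has_result = false;
        return e->result;
    }
    return 0;   /* integer-pack fallback omitted in this sketch */
}

/* Scheme B: the result travels in the flags register itself. */
static uint64_t b_add_flags(uint64_t a, uint64_t b) { return d2u(u2d(a) + u2d(b)); }
static uint64_t b_mul_flags(uint64_t a, uint64_t b) { return d2u(u2d(a) * u2d(b)); }
static uint64_t b_pack1(uint64_t flg) { return flg; }   /* move flg -> dst */

/* Same issue order as the listing: add_flags, addsub, mul_flags, pack1, pack1. */
static void run_single_slot(double a, double b, double c, double d,
                            double *sum, double *prod)
{
    Env e = { 0, false };
    a_add_flags(&e, d2u(a), d2u(b));
    a_addsub(&e);
    a_mul_flags(&e, d2u(c), d2u(d));
    *sum = u2d(a_pack1(&e));    /* pack1 for the add */
    *prod = u2d(a_pack1(&e));   /* pack1 for the mul */
}

static void run_in_register(double a, double b, double c, double d,
                            double *sum, double *prod)
{
    uint64_t r12 = b_add_flags(d2u(a), d2u(b));
    uint64_t r3 = b_mul_flags(d2u(c), d2u(d));
    *sum = u2d(b_pack1(r12));
    *prod = u2d(b_pack1(r3));
}
```

With 1.5 + 2.25 and 3.0 * 4.0 as inputs, run_single_slot reports 12.0 as the sum, because the mul result replaced the add result before the first pack1 ran, and the second pack1 finds no pending result. run_in_register yields 3.75 and 12.0, since each pack1 reads its own flags register.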