Hello all:

For single insns, I guess they are simple: the insn groups of different
calculations cannot be mixed with each other, so the current
implementation should be OK.

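As a rough sketch only, the single-precision scheme quoted further down in this thread (my 8/9 mail, together with Richard's description of the final pack step) could look like the C below. The FSingleEnv struct, its field names, the zero "flags" placeholder and the two demo helpers are all assumptions for illustration, and host float arithmetic stands in for softfloat. The (u)int32_t packing path is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical env state; the field names are illustrative only. */
typedef struct {
    uint32_t result;    /* saved calc result, as binary32 bits */
    bool has_result;    /* the "has result" flag */
} FSingleEnv;

static float bits_to_f32(uint32_t b) { float f; memcpy(&f, &b, sizeof f); return f; }
static uint32_t f32_to_bits(float f) { uint32_t b; memcpy(&b, &f, sizeof b); return b; }

/* Placeholder: the real calc-flags encoding is not documented. */
#define FSINGLE_FAKE_FLAGS 0u

/* fsingle_add1: return calc flags, save calc result to env. */
uint32_t fsingle_add1(FSingleEnv *env, uint32_t a, uint32_t b)
{
    env->result = f32_to_bits(bits_to_f32(a) + bits_to_f32(b));
    return FSINGLE_FAKE_FLAGS;
}

/* fsingle_mul1: save calc result to env and set "has result". */
void fsingle_mul1(FSingleEnv *env, uint32_t a, uint32_t b)
{
    env->result = f32_to_bits(bits_to_f32(a) * bits_to_f32(b));
    env->has_result = true;
}

/* fsingle_addsub2: only sets "has result". */
void fsingle_addsub2(FSingleEnv *env) { env->has_result = true; }

/*
 * Final pack step: hand back the saved result and reset the flag.
 * Returns false where the scheme would fall back to packing an integer
 * (that path is not modelled in this sketch).
 */
bool fsingle_pack2(FSingleEnv *env, uint32_t *dest)
{
    if (!env->has_result) {
        return false;
    }
    env->has_result = false;
    *dest = env->result;
    return true;
}

/* Replays add1 -> addsub2 -> (pack1 skipped) -> pack2. */
float fsingle_demo_add(float a, float b)
{
    FSingleEnv env = {0};
    uint32_t dest = 0;
    (void)fsingle_add1(&env, f32_to_bits(a), f32_to_bits(b));
    fsingle_addsub2(&env);
    bool ok = fsingle_pack2(&env, &dest);
    assert(ok && !env.has_result);
    (void)ok;
    return bits_to_f32(dest);
}

/* Replays mul1 -> (mul2, pack1 skipped) -> pack2. */
float fsingle_demo_mul(float a, float b)
{
    FSingleEnv env = {0};
    uint32_t dest = 0;
    fsingle_mul1(&env, f32_to_bits(a), f32_to_bits(b));
    bool ok = fsingle_pack2(&env, &dest);
    assert(ok && !env.has_result);
    (void)ok;
    return bits_to_f32(dest);
}
```

The point of the sketch is only that one insn per group does the real work and the last pack hands the saved value back, so a single saved result is enough as long as groups are not interleaved.
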
For double insns, I guess only the mul calculation group can be mixed with
other calculation groups (add/sub groups or int2float/double groups),
because of optimization -- the mul calculation group has many insns. So
the implementation is below:

/*
 * Assume floating point mul operation group can mix with other groups.
 *
 * fdouble_unpack_max: ; skipped.
 *
 * fdouble_unpack_min: ; skipped.
 *
 * fdouble_add_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_sub_flags:  ; move calc flags to dest.
 *                       save calc flags.
 *                       save calc addsub result.
 *
 * fdouble_addsub:     ; move calc addsub result to dest.
 *                       set "addsub result" flag.
 *
 * fdouble_mul_flags:  ; move calc mul result to dest.
 *
 * fdouble_pack1:      ; if addsub result set
 *                          && srca == saved addsub result
 *                          && srcb == saved calc flags
 *                         move srca to dest.
 *                       else
 *                         move srcb to dest.
 *
 * fdouble_pack2:      ; if srcb == r63 && "addsub result" flag
 *                         reset "addsub result" flag.
 *                       else if srcb == r63
 *                         pack srca dest (dest is orig srcb of pack1)
 *                           reference from tilegx.md: float(uns)sidf2.
 *                           get (u)int32_t a, then (u)int32_to_float64.
 *                       else
 *                         skipped.
 */

On 8/11/15 21:18, Chen Gang wrote:
>
> Oh, it seems a little complex, for a testsuite case, it lets double add
> and double mul together! We need save more information for the correct
> calculation in pack1.
>
> It is 20020314-1.exe, the related code (I guess it is correct):
>
> ...
>
>         fdouble_unpack_max r10, r3, zero
> .LVL2:
>         fdouble_unpack_max r15, r2, zero
>         fdouble_add_flags r12, r0, r1
>         mul_hu_lu r13, r15, r10
>         mul_lu_lu r16, r15, r10
>         mula_hu_lu r13, r10, r15
>         fdouble_unpack_min r11, r0, r1
>         {
>         shli r14, r13, 32
>         fdouble_unpack_max r17, r0, r1
>         }
>         {
>         mul_hu_hu r15, r15, r10
>         add r16, r16, r14
>         }
>         {
>         shrui r13, r13, 32
>         fdouble_addsub r17, r11, r12
>         }
>         {
>         cmpltu r14, r16, r14
>         fdouble_mul_flags r3, r2, r3
>         }
> .LVL3:
>         {
>         add r13, r15, r13
>         fdouble_pack1 r12, r17, r12
>         }
>         {
>         add r13, r13, r14
>         fdouble_unpack_max r10, r0, zero
>         }
>         fdouble_pack1 r3, r13, r3
>         fdouble_pack2 r12, r17, zero
>         fdouble_pack2 r3, r13, r16
>
> ...
>
> Welcome any additional ideas, suggestions and completions.
>
> Thanks.
>
> On 8/9/15 09:14, Chen Gang wrote:
>> On 8/9/15 09:10, Chen Gang wrote:
>>>
>>> On 8/9/15 01:23, Chen Gang wrote:
>>>> Hello all:
>>>>
>>>> Below is my current idea for all floating point insns. For me, it is not
>>>> the precise implementation, even not completely implement -- assume pack
>>>> insns can only for packing (u)int32_t when they are used individually:
>>>>
>>>> fsingle_add1     ; return calc flags, save calc result to env.
>>>>
>>>> fsingle_sub1     ; return calc flags, save calc result to env.
>>>>
>>>> fsingle_addsub2  ; set "has result" flag.
>>>>
>>>> fsingle_mul1     ; skip return value, save calc result to env.
>>>>                    set "has result" flag.
>>>>
>>>> fsingle_mul2     ; skipped.
>>>>
>>>>
>>>> fsingle_pack1    ; skipped.
>>>>
>>>> fsingle_pack2    ; if "has result"
>>>>                      reset "has result" flag.
>>>>                      return calc result from env.
>>>>                    else
>>>>                      pack srca
>>>>                        reference from tilegx.md: float(uns)sisf2.
>>>>                        get (u)int32_t a, then (u)int32_to_float32.
>>>
>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>> are uint64_t):
>>>
>>
>> Oh, sorry, for "pack srca" (not for "pack srca and srcb")
>>
>>> switch (srca & 0x3ff) {
>>>
>>> /* treat it as uint32_t */
>>> case 0x9e:
>>>     return uint32_to_float32(srca >> 32, &FP_STATUS);
>>>
>>> /* treat it as int32_t, must be negative number */
>>> case 0x29e:
>>>     return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
>>>
>>> default:
>>>     unimplemented (gen_exception).
>>> }
>>>
>>>>
>>>> fdouble_unpack_max: ; skipped.
>>>>
>>>> fdouble_unpack_min: ; skipped.
>>>>
>>>> fdouble_add_flags:  ; return calc flags, save calc result to env.
>>>>
>>>> fdouble_sub_flags:  ; return calc flags, save calc result to env.
>>>>
>>>> fdouble_addsub:     ; set "has result" flag.
>>>>
>>>> fdouble_mul_flags:  ; skip return flags, save calc result to env.
>>>>                       set "has result" flag.
>>>>
>>>> fdouble_pack1:      ; if "has result"
>>>>                         reset "has result" flag.
>>>>                         return calc result from env.
>>>>                       else
>>>>                         pack srca and srcb.
>>>>                           reference from tilegx.md: float(uns)sidf2.
>>>>                           get (u)int32_t a, then (u)int32_to_float64.
>>>>
>>>
>>> For "pack srca and srcb", the related demo like below (srca and srcb
>>> are uint64_t):
>>>
>>> switch (srcb & 0xffff) {
>>>
>>
>> Oh, sorry, should use 0xfffff instead of 0xffff.
>>
>>> /* treat it as uint32_t */
>>> case 0x21b00:
>>>     return uint32_to_float64(srca >> 4, &FP_STATUS);
>>>
>>> /* treat it as int32_t, must be negative number */
>>> case 0xa1b00:
>>>     return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
>>>
>>> default:
>>>     unimplemented (gen_exception).
>>> }
>>>
>>>> fdouble_pack2:      ; skipped.
>>>>
>>>>
>>>> (fsingle_add1/sub1, fdouble_add/sub_flags can be used individually,
>>>> e.g gcc testsuit for complex number).
>>>>
>>>>
>>>> Next, I shall implement the floating point insns, welcome any related
>>>> ideas, suggestions, and completions.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> On 8/5/15 22:16, Chen Gang wrote:
>>>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>>>
>>>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>>>> but for me, I can not find any details about them (the ISA
>>>>>>>>>> documents only give a summary description, but not details), e.g.
>>>>>>>>>
>>>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>>>> black-box instructions. You need only really implement one of the
>>>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>>>
>>>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>>>
>>>>>>>>>   fdouble_unpack_min    min, srca, srcb
>>>>>>>>>   fdouble_unpack_max    max, srca, srcb
>>>>>>>>>   fdouble_add_flags     flg, srca, srcb
>>>>>>>>>   fdouble_addsub        max, min, flg
>>>>>>>>>   fdouble_pack1         dst, max, flg
>>>>>>>>>   fdouble_pack2         dst, max, zero
>>>>>>>>>
>>>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>>>> from "flg" to "dst".
>>>>>>>>>
>>>>>>>>> Similarly for the single-precision:
>>>>>>>>>
>>>>>>>>>   fsingle_add1          tmp, srca, srcb
>>>>>>>>>   fsingle_addsub2       tmp, srca, srcb
>>>>>>>>>   fsingle_pack1         flg, tmp
>>>>>>>>>   fsingle_pack2         dst, tmp, flg
>>>>>>>>>
>>>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>>>
>>>>>>>
>>>>>>> After check the tilegx.md completely, for me, we still need implement
>>>>>>> each of them precisely, or we can not emulate all cases (e.g. muldf3).
>>>>>>
>>>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>>>
>>>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>>>> should be able to delete all of that as unused. Especially if you have
>>>>>> the fdouble_unpack* insns store zero into their destinations.
>>>>>>
>>>>>
>>>>> For me, I am not quite sure. But I guess, what you said should be OK (at
>>>>> least, what you said is very useful for the implementation).
>>>>>
>>>>>
>>>>>> Don't get me wrong -- more accurate implementation of the actual
>>>>>> insns would be nice, especially for debugging. But if the insns
>>>>>> aren't accurately documented I don't see what choice we have.
>>>>>>
>>>>>
>>>>> For me, I guess, we can still try to implement the details.
>>>>>
>>>>> - The document has all floating point instructions' summary, so we can
>>>>>   think of, or guess its implementation entirely.
>>>>>
>>>>> - gcc uses them all and completely, so it is our good sample and good
>>>>>   reference (but we should not assume gcc must be correct, since we
>>>>>   just use qemu for gcc testsuite).
>>>>>
>>>>> - Tilegx floating point format should be standard (at least, reference
>>>>>   to the standard format), so we can reference the related information
>>>>>   from google/baidu.
>>>>>
>>>>>
>>>>>> On the good side, implementing the entire operation as part of the
>>>>>> "flags" step probably results in faster emulation.
>>>>>>
>>>>>
>>>>> I guess so, too.
>>>>>
>>>>>
>>>>> I shall try to finish the simple implementation, firstly. Then try to
>>>>> implement the floating point instructions in details in the future (it
>>>>> should be lower priority).
>>>>>
>>>>>
>>>>> Thanks.
>>>>>
>>>>
>>>
>>
>
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
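
P.S. For reference, the fdouble scheme in the comment at the top of this mail could be sketched roughly as below. This is only an illustration under assumptions: FDoubleEnv and its fields, the zero "flags" placeholder, the srcb_is_zero_reg parameter (standing for "srcb == r63") and fdouble_demo are hypothetical names, host double arithmetic stands in for softfloat, and the (u)int32_t packing branch of pack2 is left out. The demo replays the interleaved add/mul order from the 20020314-1 listing quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical env state; the field names are illustrative only. */
typedef struct {
    uint64_t saved_flags;    /* value returned by the last add/sub_flags */
    uint64_t saved_addsub;   /* result computed by the last add/sub_flags */
    bool addsub_result;      /* the "addsub result" flag */
} FDoubleEnv;

static double bits_to_f64(uint64_t b) { double d; memcpy(&d, &b, sizeof d); return d; }
static uint64_t f64_to_bits(double d) { uint64_t b; memcpy(&b, &d, sizeof b); return b; }

/* Placeholder: the real calc-flags encoding is not documented. */
#define FDOUBLE_FAKE_FLAGS 0u

/* fdouble_add_flags: save flags and the add result, return the flags. */
uint64_t fdouble_add_flags(FDoubleEnv *env, uint64_t a, uint64_t b)
{
    env->saved_addsub = f64_to_bits(bits_to_f64(a) + bits_to_f64(b));
    env->saved_flags = FDOUBLE_FAKE_FLAGS;
    return env->saved_flags;
}

/* fdouble_addsub: move the saved result to dest, set "addsub result". */
uint64_t fdouble_addsub(FDoubleEnv *env)
{
    env->addsub_result = true;
    return env->saved_addsub;
}

/* fdouble_mul_flags: the whole multiply, result goes straight to dest. */
uint64_t fdouble_mul_flags(uint64_t a, uint64_t b)
{
    return f64_to_bits(bits_to_f64(a) * bits_to_f64(b));
}

/* fdouble_pack1: pick srca only when it matches the saved add/sub state. */
uint64_t fdouble_pack1(const FDoubleEnv *env, uint64_t srca, uint64_t srcb)
{
    if (env->addsub_result && srca == env->saved_addsub
        && srcb == env->saved_flags) {
        return srca;
    }
    return srcb;
}

/* fdouble_pack2: only the flag reset is modelled; dest passes through. */
uint64_t fdouble_pack2(FDoubleEnv *env, uint64_t dest, bool srcb_is_zero_reg)
{
    if (srcb_is_zero_reg && env->addsub_result) {
        env->addsub_result = false;
    }
    return dest;
}

/* Replays the interleaved order: add group and mul group mixed together. */
void fdouble_demo(double a, double b, double c, double d,
                  double *sum, double *prod)
{
    FDoubleEnv env = {0};
    uint64_t r12 = fdouble_add_flags(&env, f64_to_bits(a), f64_to_bits(b));
    uint64_t r17 = fdouble_addsub(&env);
    uint64_t r3 = fdouble_mul_flags(f64_to_bits(c), f64_to_bits(d));
    uint64_t r13 = 0x1234;                 /* stand-in for integer mul temps */
    r12 = fdouble_pack1(&env, r17, r12);   /* fdouble_pack1 r12, r17, r12 */
    r3 = fdouble_pack1(&env, r13, r3);     /* fdouble_pack1 r3, r13, r3 */
    r12 = fdouble_pack2(&env, r12, true);  /* fdouble_pack2 r12, r17, zero */
    r3 = fdouble_pack2(&env, r3, false);   /* fdouble_pack2 r3, r13, r16 */
    assert(!env.addsub_result);
    *sum = bits_to_f64(r12);
    *prod = bits_to_f64(r3);
}
```

One weak point this makes visible: pack1 tells the two groups apart only by comparing register values, so a mul temporary that happens to equal the saved add/sub result (with matching flags) would be misread.
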