Hello all:

Below is my current idea for all floating point insns. It is not a
precise implementation, and it is not even complete: it assumes that the
pack insns, when used individually, are only for packing (u)int32_t:

  fsingle_add1          ; return calc flags, save calc result to env.

  fsingle_sub1          ; return calc flags, save calc result to env.

  fsingle_addsub2       ; set "has result" flag.

  fsingle_mul1          ; skip return value, save calc result to env.
                        ; set "has result" flag.

  fsingle_mul2          ; skipped.

  fsingle_pack1         ; skipped.

  fsingle_pack2         ; if "has result"
                        ;   reset "has result" flag.
                        ;   return calc result from env.
                        ; else
                        ;   pack srca
                        ;   reference from tilegx.md: float(uns)sisf2.
                        ;   get (u)int32_t a, then (u)int32_to_float32.

  fdouble_unpack_max:   ; skipped.

  fdouble_unpack_min:   ; skipped.

  fdouble_add_flags:    ; return calc flags, save calc result to env.

  fdouble_sub_flags:    ; return calc flags, save calc result to env.

  fdouble_addsub:       ; set "has result" flag.

  fdouble_mul_flags:    ; skip return flags, save calc result to env.
                        ; set "has result" flag.

  fdouble_pack1:        ; if "has result"
                        ;   reset "has result" flag.
                        ;   return calc result from env.
                        ; else
                        ;   pack srca and srcb.
                        ;   reference from tilegx.md: float(uns)sidf2.
                        ;   get (u)int32_t a, then (u)int32_to_float64.

  fdouble_pack2:        ; skipped.

(fsingle_add1/sub1 and fdouble_add/sub_flags can be used individually,
e.g. in the gcc testsuite for complex numbers.)

Next, I shall implement the floating point insns. Any related ideas,
suggestions, and completions are welcome.

Thanks.

On 8/5/15 22:16, Chen Gang wrote:
> On 8/4/15 23:04, Richard Henderson wrote:
>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>
>>> On 8/4/15 04:47, Chen Gang wrote:
>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>> but for me, I can not find any details about them (the ISA
>>>>>> documents only give a summary description, but not details), e.g.
>>>>>
>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>> black-box instructions.  You need only really implement one of the
>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>
>>>>> Looking at what gcc produces gives the hints:
>>>>>
>>>>>   fdouble_unpack_min    min, srca, srcb
>>>>>   fdouble_unpack_max    max, srca, srcb
>>>>>   fdouble_add_flags     flg, srca, srcb
>>>>>   fdouble_addsub        max, min, flg
>>>>>   fdouble_pack1         dst, max, flg
>>>>>   fdouble_pack2         dst, max, zero
>>>>>
>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>> from "flg" to "dst".
>>>>>
>>>>> Similarly for the single-precision:
>>>>>
>>>>>   fsingle_add1          tmp, srca, srcb
>>>>>   fsingle_addsub2       tmp, srca, srcb
>>>>>   fsingle_pack1         flg, tmp
>>>>>   fsingle_pack2         dst, tmp, flg
>>>>>
>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>
>>>
>>> After check the tilegx.md completely, for me, we still need implement
>>> each of them precisely, or we can not emulate all cases (e.g. muldf3).
>>
>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>> Again, the fdouble_pack1 copies from the flag input to the output.
>>
>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>> should be able to delete all of that as unused.  Especially if you
>> have the fdouble_unpack* insns store zero into their destinations.
>>
>
> For me, I am not quite sure. But I guess, what you said should be OK (at
> least, what you said is very useful for the implementation).
>
>
>> Don't get me wrong -- more accurate implementation of the actual
>> insns would be nice, especially for debugging.  But if the insns
>> aren't accurately documented I don't see what choice we have.
>>
>
> For me, I guess, we can still try to implement the details.
>
>  - The document has all floating point instructions' summary, so we can
>    think of, or guess its implementation entirely.
>
>  - gcc uses them all and completely, so it is our good sample and good
>    reference (but we should not assume gcc must be correct, since we
>    just use qemu for gcc testsuite).
>
>  - Tilegx floating point format should be standard (at least, reference
>    to the standard format), so we can reference the related information
>    from google/baidu.
>
>
>> On the good side, implementing the entire operation as part of the
>> "flags" step probably results in faster emulation.
>>
>
> I guess so, too.
>
>
> I shall try to finish the simple implementation, firstly. Then try to
> implement the floating point instructions in details in the future (it
> should be lower priority).
>
>
> Thanks.
>

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
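
P.S. To make the "has result" idea at the top of this mail a bit more
concrete, here is a minimal sketch in plain C for the single-precision
add path only. It is not the QEMU code: SketchEnv and all sketch_*
names are made up for illustration, the host "float" type stands in for
softfloat, the returned calc flags are a placeholder 0 (the real flag
layout is not documented), and only the signed int32 case of the
pack-only path is shown.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the CPU env; not QEMU's real structure. */
typedef struct {
    uint32_t saved_result;  /* calc result saved by add1 (float32 bits) */
    int has_result;         /* the "has result" flag */
} SketchEnv;

/* Reinterpret between a host float and its raw 32-bit pattern. */
static uint32_t f32_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));
    return u;
}

static float bits_to_f32(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof(f));
    return f;
}

/* add1: perform the whole addition, save the result to env, and
 * return the calc flags (assumption: 0 as a placeholder). */
static uint32_t sketch_fsingle_add1(SketchEnv *env,
                                    uint32_t srca, uint32_t srcb)
{
    env->saved_result = f32_to_bits(bits_to_f32(srca) + bits_to_f32(srcb));
    return 0;
}

/* addsub2: no arithmetic here, only mark that a result is pending. */
static void sketch_fsingle_addsub2(SketchEnv *env)
{
    env->has_result = 1;
}

/* Final pack step: if a result is pending, reset the flag and return
 * the saved result; else treat srca as an int32 and convert it. */
static uint32_t sketch_fsingle_pack(SketchEnv *env, int32_t srca)
{
    if (env->has_result) {
        env->has_result = 0;
        return env->saved_result;
    }
    return f32_to_bits((float)srca);
}

/* Walk through both paths: add sequence first, then pack used alone. */
static void sketch_selftest(void)
{
    SketchEnv env = {0, 0};

    sketch_fsingle_add1(&env, f32_to_bits(1.5f), f32_to_bits(2.25f));
    sketch_fsingle_addsub2(&env);
    assert(bits_to_f32(sketch_fsingle_pack(&env, 0)) == 3.75f);
    assert(env.has_result == 0);

    /* No pending result now, so this is the int32-to-float32 path. */
    assert(bits_to_f32(sketch_fsingle_pack(&env, 7)) == 7.0f);
}
```

Calling sketch_selftest() from main() exercises both cases. The point of
the flag is only to let the final pack step tell "end of an add/sub/mul
sequence" apart from "pack used individually"; a real implementation
would use the softfloat helpers and the guest rounding state instead of
host float arithmetic.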