On 08/04/2015 06:56 AM, Chen Gang wrote:
>
> On 8/4/15 04:47, Chen Gang wrote:
>> On 8/4/15 00:40, Richard Henderson wrote:
>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>> but for me, I can not find any details about them (the ISA
>>>> documents only give a summary description, but not details), e.g.
>>>
>>> The tilegx splits the four/six cycle arithmetic into multiple
>>> black-box instructions. You need only really implement one of the
>>> four, with the rest of them being implemented as nops or moves.
>>>
>>> Looking at what gcc produces gives the hints:
>>>
>>>     fdouble_unpack_min    min, srca, srcb
>>>     fdouble_unpack_max    max, srca, srcb
>>>     fdouble_add_flags     flg, srca, srcb
>>>     fdouble_addsub        max, min, flg
>>>     fdouble_pack1         dst, max, flg
>>>     fdouble_pack2         dst, max, zero
>>>
>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>> insn can perform the whole operation, the pack1 insn performs a move
>>> from "flg" to "dst".
>>>
>>> Similarly for the single-precision:
>>>
>>>     fsingle_add1          tmp, srca, srcb
>>>     fsingle_addsub2       tmp, srca, srcb
>>>     fsingle_pack1         flg, tmp
>>>     fsingle_pack2         dst, tmp, flg
>>>
>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>
>
> After check the tilegx.md completely, for me, we still need implement
> each of them precisely, or we can not emulate all cases (e.g. muldf3).
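A minimal sketch of the add strategy described in the quote, written as a plain C model rather than QEMU TCG code. Each function stands for one insn, with register contents passed as raw 64-bit values; the function names, the helpers (`f64_bits`, `bits_f64`, `f32_bits`, `bits_f32`), and the assumption that a single-precision value sits zero-extended in the low 32 bits of a register are all illustrative, not taken from QEMU or the TILE-Gx documentation. The ignored insns are left out as comments, and the unpack result is modeled as zero.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret between IEEE-754 values and raw register bits. */
static uint64_t f64_bits(double d) { uint64_t v; memcpy(&v, &d, sizeof v); return v; }
static double bits_f64(uint64_t v) { double d; memcpy(&d, &v, sizeof d); return d; }
static uint32_t f32_bits(float f) { uint32_t v; memcpy(&v, &f, sizeof v); return v; }
static float bits_f32(uint32_t v) { float f; memcpy(&f, &v, sizeof f); return f; }

/* fdouble_add_flags flg, srca, srcb: perform the whole double add here. */
static uint64_t fdouble_add_flags(uint64_t srca, uint64_t srcb)
{
    return f64_bits(bits_f64(srca) + bits_f64(srcb));
}

/* fdouble_pack1 dst, max, flg: a plain move from flg to dst. */
static uint64_t fdouble_pack1(uint64_t max, uint64_t flg)
{
    (void)max;
    return flg;
}

/* fsingle_add1 tmp, srca, srcb: perform the whole single add here.
 * Assumption (hypothetical): the float is zero-extended in the low 32 bits. */
static uint64_t fsingle_add1(uint64_t srca, uint64_t srcb)
{
    float r = bits_f32((uint32_t)srca) + bits_f32((uint32_t)srcb);
    return f32_bits(r);
}

/* fsingle_pack2 dst, tmp, flg: a plain move from tmp to dst. */
static uint64_t fsingle_pack2(uint64_t tmp, uint64_t flg)
{
    (void)flg;
    return tmp;
}

/* Replays the double-precision add sequence quoted above. */
static uint64_t run_fdouble_add(uint64_t srca, uint64_t srcb)
{
    uint64_t max = 0, flg, dst;          /* unpack_min/max: ignored (zero) */
    flg = fdouble_add_flags(srca, srcb); /* the whole operation */
    /* fdouble_addsub max, min, flg: ignored */
    dst = fdouble_pack1(max, flg);       /* move flg -> dst */
    /* fdouble_pack2 dst, max, zero: ignored, dst keeps its value */
    return dst;
}

/* Replays the single-precision add sequence quoted above. */
static uint64_t run_fsingle_add(uint64_t srca, uint64_t srcb)
{
    uint64_t tmp, flg = 0, dst;
    tmp = fsingle_add1(srca, srcb);      /* the whole operation */
    /* fsingle_addsub2 tmp, srca, srcb: ignored, tmp keeps the sum */
    /* fsingle_pack1 flg, tmp: ignored */
    dst = fsingle_pack2(tmp, flg);       /* move tmp -> dst */
    return dst;
}
```

Only one insn per sequence does real work; every other step is either a nop or a register move, which is what makes this mapping cheap to emulate.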
No, you can still implement all of muldf3 in fdouble_mul_flags. Again, fdouble_pack1 copies from the flag input to the output. Yes, there is a 64-bit multiply in the muldf3 sequence, but the TCG optimizer should be able to delete all of it as unused, especially if you have the fdouble_unpack* insns store zero into their destinations.

Don't get me wrong -- a more accurate implementation of the actual insns would be nice, especially for debugging. But if the insns aren't accurately documented, I don't see what choice we have.

On the plus side, implementing the entire operation as part of the "flags" step probably results in faster emulation.


r~
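The same idea applied to multiplication, again as a hypothetical plain-C model rather than TCG code. The exact muldf3 expansion in tilegx.md is not reproduced here: the integer multiply below is only a stand-in for the 64-bit multiply in the real sequence, and the operand shapes and function names are illustrative. Because each unpack writes zero, that product is a constant 0 that never reaches `dst`, which is the property that should let an optimizer fold it and drop it as dead.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret between IEEE-754 doubles and raw register bits. */
static uint64_t f64_bits(double d) { uint64_t v; memcpy(&v, &d, sizeof v); return v; }
static double bits_f64(uint64_t v) { double d; memcpy(&d, &v, sizeof d); return d; }

/* fdouble_unpack_*: store zero, so nothing derived from it matters. */
static uint64_t fdouble_unpack_zero(uint64_t srca, uint64_t srcb)
{
    (void)srca;
    (void)srcb;
    return 0;
}

/* fdouble_mul_flags flg, srca, srcb: perform the whole IEEE multiply
 * here and leave the result bits in the "flags" register. */
static uint64_t fdouble_mul_flags(uint64_t srca, uint64_t srcb)
{
    return f64_bits(bits_f64(srca) * bits_f64(srcb));
}

/* fdouble_pack1 dst, x, flg: copy the flag input to the output. */
static uint64_t fdouble_pack1(uint64_t x, uint64_t flg)
{
    (void)x;
    return flg;
}

/* Hypothetical, simplified stand-in for the muldf3 sequence. */
static uint64_t run_muldf3_model(uint64_t a, uint64_t b)
{
    uint64_t ua = fdouble_unpack_zero(a, b);  /* always 0 */
    uint64_t ub = fdouble_unpack_zero(b, a);  /* always 0 */
    uint64_t flg = fdouble_mul_flags(a, b);   /* the whole operation */
    uint64_t prod = ua * ub;                  /* 0 * 0: dead value */
    uint64_t dst = fdouble_pack1(prod, flg);  /* dst = flg */
    /* fdouble_pack2: ignored, dst keeps its value */
    return dst;
}
```

Whether a given optimizer actually removes the dead multiply depends on it seeing the zero inputs as constants; emitting a constant-zero move for each unpack is what makes that possible.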