On 8/4/15 23:04, Richard Henderson wrote:
> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>
>> On 8/4/15 04:47, Chen Gang wrote:
>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>> but for me, I can not find any details about them (the ISA
>>>>> documents only give a summary description, but not details), e.g.
>>>>
>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>> black-box instructions.  You need only really implement one of the
>>>> four, with the rest of them being implemented as nops or moves.
>>>>
>>>> Looking at what gcc produces gives the hints:
>>>>
>>>>         fdouble_unpack_min    min, srca, srcb
>>>>         fdouble_unpack_max    max, srca, srcb
>>>>         fdouble_add_flags     flg, srca, srcb
>>>>         fdouble_addsub        max, min, flg
>>>>         fdouble_pack1         dst, max, flg
>>>>         fdouble_pack2         dst, max, zero
>>>>
>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>> from "flg" to "dst".
>>>>
>>>> Similarly for the single-precision:
>>>>
>>>>         fsingle_add1          tmp, srca, srcb
>>>>         fsingle_addsub2       tmp, srca, srcb
>>>>         fsingle_pack1         flg, tmp
>>>>         fsingle_pack2         dst, tmp, flg
>>>>
>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>
>>
>> After check the tilegx.md completely, for me, we still need implement
>> each of them precisely, or we can not emulate all cases (e.g. muldf3).
>
> No, you can still implement all of muldf3 in fdouble_mul_flags.
> Again, the fdouble_pack1 copies from the flag input to the output.
>
> Yes, there is a 64-bit multiply in there, but the tcg optimizer
> should be able to delete all of that as unused.  Especially if you have the
> fdouble_unpack* insns store zero into their destinations.
>
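
A minimal sketch in C of the single-precision scheme quoted above, only to
make the idea concrete. Everything named toy_* and the ToyCPU register file
are hypothetical and are not QEMU code; the sketch also uses host float
arithmetic, which is an assumption (a real implementation would more likely
go through a softfloat helper and handle rounding modes and exceptions).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_REGS 64

/* Hypothetical guest register file, for illustration only. */
typedef struct {
    uint64_t regs[TOY_NUM_REGS];
} ToyCPU;

/* Reinterpret between a float and its 32-bit pattern without aliasing issues. */
static uint32_t f32_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_f32(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* fsingle_add1: perform the whole addition in this one step. */
static void toy_fsingle_add1(ToyCPU *cpu, int dst, int srca, int srcb)
{
    assert(dst >= 0 && dst < TOY_NUM_REGS);
    float a = bits_f32((uint32_t)cpu->regs[srca]);
    float b = bits_f32((uint32_t)cpu->regs[srcb]);
    cpu->regs[dst] = f32_bits(a + b);
}

/* fsingle_addsub2: ignored, the result is already complete in tmp. */
static void toy_fsingle_addsub2(ToyCPU *cpu, int dst, int srca, int srcb)
{
    (void)cpu; (void)dst; (void)srca; (void)srcb;
}

/* fsingle_pack1: ignored as well. */
static void toy_fsingle_pack1(ToyCPU *cpu, int dst, int src)
{
    (void)cpu; (void)dst; (void)src;
}

/* fsingle_pack2: a plain move from tmp (first source) to dst. */
static void toy_fsingle_pack2(ToyCPU *cpu, int dst, int srca, int srcb)
{
    (void)srcb;
    assert(dst >= 0 && dst < TOY_NUM_REGS);
    cpu->regs[dst] = cpu->regs[srca];
}

/* Run the four-insn sequence that gcc emits for a single-precision add. */
static float toy_add_f32(float x, float y)
{
    enum { SRCA = 1, SRCB = 2, TMP = 3, FLG = 4, DST = 5 };
    ToyCPU cpu = {{0}};

    cpu.regs[SRCA] = f32_bits(x);
    cpu.regs[SRCB] = f32_bits(y);

    toy_fsingle_add1(&cpu, TMP, SRCA, SRCB);
    toy_fsingle_addsub2(&cpu, TMP, SRCA, SRCB);
    toy_fsingle_pack1(&cpu, FLG, TMP);
    toy_fsingle_pack2(&cpu, DST, TMP, FLG);

    return bits_f32((uint32_t)cpu.regs[DST]);
}
```

Calling toy_add_f32(1.5f, 2.25f) walks the same four steps as the gcc
sequence; only the first and last of them touch any state.
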

For me, I am not quite sure. But I guess what you said should be OK (at
least, it is very useful for the implementation).

> Don't get me wrong -- more accurate implementation of the actual
> insns would be nice, especially for debugging.  But if the insns
> aren't accurately documented I don't see what choice we have.
>

For me, I guess we can still try to implement the details:

 - The document has a summary of every floating point instruction, so we
   can work out, or guess, the whole implementation from it.

 - gcc uses all of them, so it is a good sample and a good reference (but
   we should not assume gcc must be correct, since we use qemu just for
   the gcc testsuite).

 - The Tilegx floating point format should be standard (or at least based
   on the standard format), so we can look up the related information
   through google/baidu.

> On the good side, implementing the entire operation as part of the "flags" step
> probably results in faster emulation.
>

I guess so, too.

I shall finish the simple implementation first, then try to implement the
floating point instructions in detail in the future (it should be lower
priority).

Thanks.
-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed