On 8/13/15 22:59, Chen Gang wrote:
> Hello all:
>
> I guess the single-precision insns are simple, and their calculation
> insn groups cannot be interleaved with each other, so the current
> implementation should be OK.
>
> For double-precision insns, I guess only the mul calculation group can
> be interleaved with other calculation groups (the add/sub groups or the
> int2float/int2double groups), because after optimization the mul
> calculation group has many insns.
>
Unfortunately, after running more of the gcc testsuite, I found that the
add/sub floating point insns can also be interleaved with each other! The
related C code, the -save-temps output, and the objdump files are attached
(is it a gcc issue? I guess not).

So I guess we have to 'crack' all the floating point insns precisely, or
we cannot pass the gcc testsuite. For now, I shall first try to fix the
other issues found by the gcc testsuite, and 'crack' the floating point
insns last. I guess I cannot finish it this month (I shall try to finish
it next month).

Thanks.

> So the implementation is below:
>
> /*
>  * Assume the floating point mul operation group can mix with other groups.
>  *
>  * fdouble_unpack_max:  ; skipped.
>  *
>  * fdouble_unpack_min:  ; skipped.
>  *
>  * fdouble_add_flags:   ; move calc flags to dest.
>  *                        save calc flags.
>  *                        save calc addsub result.
>  *
>  * fdouble_sub_flags:   ; move calc flags to dest.
>  *                        save calc flags.
>  *                        save calc addsub result.
>  *
>  * fdouble_addsub:      ; move calc addsub result to dest.
>  *                        set "addsub result" flag.
>  *
>  * fdouble_mul_flags:   ; move calc mul result to dest.
>  *
>  * fdouble_pack1:       ; if "addsub result" flag set
>  *                           && srca == saved addsub result
>  *                           && srcb == saved calc flags
>  *                          move srca to dest.
>  *                        else
>  *                          move srcb to dest.
>  *
>  * fdouble_pack2:       ; if srcb == r63 && "addsub result" flag set
>  *                          reset "addsub result" flag.
>  *                        else if srcb == r63
>  *                          pack srca into dest (dest is the orig srcb of pack1).
>  *                          reference from tilegx.md: float(uns)sidf2.
>  *                          get (u)int32_t a, then (u)int32_to_float64.
>  *                        else
>  *                          skipped.
>  */
>
>
> On 8/11/15 21:18, Chen Gang wrote:
>>
>> Oh, it seems a little complex: one testsuite case interleaves double
>> add and double mul! We need to save more information for the correct
>> calculation in pack1.
>>
>> It is 20020314-1.exe; the related code (I guess it is correct):
>>
>> ...
>>
>>         fdouble_unpack_max  r10, r3, zero
>> .LVL2:
>>         fdouble_unpack_max  r15, r2, zero
>>         fdouble_add_flags   r12, r0, r1
>>         mul_hu_lu           r13, r15, r10
>>         mul_lu_lu           r16, r15, r10
>>         mula_hu_lu          r13, r10, r15
>>         fdouble_unpack_min  r11, r0, r1
>>         {
>>         shli                r14, r13, 32
>>         fdouble_unpack_max  r17, r0, r1
>>         }
>>         {
>>         mul_hu_hu           r15, r15, r10
>>         add                 r16, r16, r14
>>         }
>>         {
>>         shrui               r13, r13, 32
>>         fdouble_addsub      r17, r11, r12
>>         }
>>         {
>>         cmpltu              r14, r16, r14
>>         fdouble_mul_flags   r3, r2, r3
>>         }
>> .LVL3:
>>         {
>>         add                 r13, r15, r13
>>         fdouble_pack1       r12, r17, r12
>>         }
>>         {
>>         add                 r13, r13, r14
>>         fdouble_unpack_max  r10, r0, zero
>>         }
>>         fdouble_pack1       r3, r13, r3
>>         fdouble_pack2       r12, r17, zero
>>         fdouble_pack2       r3, r13, r16
>>
>> ...
>>
>> Any additional ideas, suggestions and completions are welcome.
>>
>> Thanks.
>>
>> On 8/9/15 09:14, Chen Gang wrote:
>>> On 8/9/15 09:10, Chen Gang wrote:
>>>>
>>>> On 8/9/15 01:23, Chen Gang wrote:
>>>>> Hello all:
>>>>>
>>>>> Below is my current idea for all the floating point insns. It is not a
>>>>> precise implementation, nor even a complete one: it assumes that the
>>>>> pack insns, when used on their own, only pack (u)int32_t:
>>>>>
>>>>> fsingle_add1     ; return calc flags, save calc result to env.
>>>>>
>>>>> fsingle_sub1     ; return calc flags, save calc result to env.
>>>>>
>>>>> fsingle_addsub2  ; set "has result" flag.
>>>>>
>>>>> fsingle_mul1     ; skip return value, save calc result to env.
>>>>>                    set "has result" flag.
>>>>>
>>>>> fsingle_mul2     ; skipped.
>>>>>
>>>>>
>>>>> fsingle_pack1    ; skipped.
>>>>>
>>>>> fsingle_pack2    ; if "has result"
>>>>>                      reset "has result" flag.
>>>>>                      return calc result from env.
>>>>>                    else
>>>>>                      pack srca
>>>>>                      reference from tilegx.md: float(uns)sisf2.
>>>>>                      get (u)int32_t a, then (u)int32_to_float32.
>>>>
>>>> For "pack srca and srcb", the related demo is like below (srca and srcb
>>>> are uint64_t):
>>>>
>>>
>>> Oh, sorry, it is for "pack srca" (not for "pack srca and srcb").
>>>
>>>> switch (srca & 0x3ff) {
>>>>
>>>>     /* treat it as uint32_t */
>>>>     case 0x9e:
>>>>         return uint32_to_float32(srca >> 32, &FP_STATUS);
>>>>
>>>>     /* treat it as int32_t, must be negative number */
>>>>     case 0x29e:
>>>>         return int32_to_float32(srca >> 32 | 0x80000000, &FP_STATUS);
>>>>
>>>>     default:
>>>>         unimplemented (gen_exception).
>>>> }
>>>>
>>>>>
>>>>> fdouble_unpack_max:  ; skipped.
>>>>>
>>>>> fdouble_unpack_min:  ; skipped.
>>>>>
>>>>> fdouble_add_flags:   ; return calc flags, save calc result to env.
>>>>>
>>>>> fdouble_sub_flags:   ; return calc flags, save calc result to env.
>>>>>
>>>>> fdouble_addsub:      ; set "has result" flag.
>>>>>
>>>>> fdouble_mul_flags:   ; skip return flags, save calc result to env.
>>>>>                        set "has result" flag.
>>>>>
>>>>> fdouble_pack1:       ; if "has result"
>>>>>                          reset "has result" flag.
>>>>>                          return calc result from env.
>>>>>                        else
>>>>>                          pack srca and srcb.
>>>>>                          reference from tilegx.md: float(uns)sidf2.
>>>>>                          get (u)int32_t a, then (u)int32_to_float64.
>>>>>
>>>>
>>>> For "pack srca and srcb", the related demo is like below (srca and srcb
>>>> are uint64_t):
>>>>
>>>> switch (srcb & 0xffff) {
>>>>
>>>
>>> Oh, sorry, it should use 0xfffff instead of 0xffff.
>>>
>>>>     /* treat it as uint32_t */
>>>>     case 0x21b00:
>>>>         return uint32_to_float64(srca >> 4, &FP_STATUS);
>>>>
>>>>     /* treat it as int32_t, must be negative number */
>>>>     case 0xa1b00:
>>>>         return int32_to_float64(srca >> 4 | 0x80000000, &FP_STATUS);
>>>>
>>>>     default:
>>>>         unimplemented (gen_exception).
>>>> }
>>>>
>>>>> fdouble_pack2:       ; skipped.
>>>>>
>>>>>
>>>>> (fsingle_add1/sub1 and fdouble_add/sub_flags can be used on their own,
>>>>> e.g. in the gcc testsuite cases for complex numbers.)
>>>>>
>>>>>
>>>>> Next, I shall implement the floating point insns. Any related ideas,
>>>>> suggestions, and completions are welcome.
>>>>>
>>>>> Thanks.
>>>>>
>>>>>
>>>>> On 8/5/15 22:16, Chen Gang wrote:
>>>>>> On 8/4/15 23:04, Richard Henderson wrote:
>>>>>>> On 08/04/2015 06:56 AM, Chen Gang wrote:
>>>>>>>>
>>>>>>>> On 8/4/15 04:47, Chen Gang wrote:
>>>>>>>>> On 8/4/15 00:40, Richard Henderson wrote:
>>>>>>>>>> On 08/01/2015 02:47 AM, Chen Gang wrote:
>>>>>>>>>>> I am just adding floating point instructions (e.g. fsingle_add1),
>>>>>>>>>>> but I cannot find any details about them (the ISA documents only
>>>>>>>>>>> give a summary description, not the details), e.g.
>>>>>>>>>>
>>>>>>>>>> The tilegx splits the four/six cycle arithmetic into multiple
>>>>>>>>>> black-box instructions. You need only really implement one of the
>>>>>>>>>> four, with the rest of them being implemented as nops or moves.
>>>>>>>>>>
>>>>>>>>>> Looking at what gcc produces gives the hints:
>>>>>>>>>>
>>>>>>>>>>     fdouble_unpack_min  min, srca, srcb
>>>>>>>>>>     fdouble_unpack_max  max, srca, srcb
>>>>>>>>>>     fdouble_add_flags   flg, srca, srcb
>>>>>>>>>>     fdouble_addsub      max, min, flg
>>>>>>>>>>     fdouble_pack1       dst, max, flg
>>>>>>>>>>     fdouble_pack2       dst, max, zero
>>>>>>>>>>
>>>>>>>>>> The unpack, addsub, and pack2 insns can be ignored, the add_flags
>>>>>>>>>> insn can perform the whole operation, the pack1 insn performs a move
>>>>>>>>>> from "flg" to "dst".
>>>>>>>>>>
>>>>>>>>>> Similarly for the single-precision:
>>>>>>>>>>
>>>>>>>>>>     fsingle_add1     tmp, srca, srcb
>>>>>>>>>>     fsingle_addsub2  tmp, srca, srcb
>>>>>>>>>>     fsingle_pack1    flg, tmp
>>>>>>>>>>     fsingle_pack2    dst, tmp, flg
>>>>>>>>>>
>>>>>>>>>> The add1 insn performs the whole operation, the addsub2 and pack1
>>>>>>>>>> insns are ignored, and the pack2 insn is a move from tmp to dst.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> After checking tilegx.md completely, I think we still need to
>>>>>>>> implement each of them precisely, or we cannot emulate all cases
>>>>>>>> (e.g. muldf3).
>>>>>>>
>>>>>>> No, you can still implement all of muldf3 in fdouble_mul_flags.
>>>>>>> Again, the fdouble_pack1 copies from the flag input to the output.
>>>>>>>
>>>>>>> Yes, there is a 64-bit multiply in there, but the tcg optimizer
>>>>>>> should be able to delete all of that as unused. Especially if you
>>>>>>> have the fdouble_unpack* insns store zero into their destinations.
>>>>>>>
>>>>>>
>>>>>> I am not quite sure, but I guess what you said should be OK (at
>>>>>> least, it is very useful for the implementation).
>>>>>>
>>>>>>
>>>>>>> Don't get me wrong -- more accurate implementation of the actual
>>>>>>> insns would be nice, especially for debugging. But if the insns
>>>>>>> aren't accurately documented I don't see what choice we have.
>>>>>>>
>>>>>>
>>>>>> I guess we can still try to implement the details:
>>>>>>
>>>>>> - The document has a summary of every floating point instruction, so
>>>>>>   we can reason about, or guess, their complete implementation.
>>>>>>
>>>>>> - gcc uses all of them, so it is a good sample and a good reference
>>>>>>   (but we should not assume gcc must be correct, since we are using
>>>>>>   qemu precisely to run the gcc testsuite).
>>>>>>
>>>>>> - The Tilegx floating point format should be standard (or at least
>>>>>>   based on the standard format), so we can look up the related
>>>>>>   information via google/baidu.
>>>>>>
>>>>>>
>>>>>>> On the good side, implementing the entire operation as part of the
>>>>>>> "flags" step probably results in faster emulation.
>>>>>>>
>>>>>>
>>>>>> I guess so, too.
>>>>>>
>>>>>>
>>>>>> I shall try to finish the simple implementation first, and then try
>>>>>> to implement the floating point instructions in detail in the future
>>>>>> (at a lower priority).
>>>>>>
>>>>>>
>>>>>> Thanks.

-- 
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
[Attachment: floating-point-double-add.tar.gz (GNU Zip compressed data)]