罗勇刚(Yonggang Luo) <luoyongg...@gmail.com> writes:

> On Fri, May 1, 2020 at 7:58 PM BALATON Zoltan <bala...@eik.bme.hu> wrote:
>
>> On Fri, 1 May 2020, 罗勇刚(Yonggang Luo) wrote:
>> > That's what I suggested: we keep a cache of float computations.
>> > typedef struct FpRecord {
>> >     uint8_t op;
>> >     float32 A;
>> >     float32 B;
>> > } FpRecord;
>> > FpRecord fp_cache[1024];
>> > int fp_cache_length;
>> > uint32_t fp_exceptions;
>> >
>> > 1. For each new fp operation, we push it onto fp_cache.
>> > 2. Once we read fp_exceptions, we re-compute fp_exceptions by
>> >    re-running the FpRecord sequence, and clear fp_cache_length.
>>
>> Why do you need to store more than the last fp op? The cumulative bits can
>> be tracked like it's done for other targets by not clearing fp_status; then
>> you can read them from there. Only the non-sticky FI bit needs to be
>> computed, but that's only determined by the last op, so it's enough to
>> remember that op and run it with softfloat (or even hardfloat after
>> clearing status, but softfloat may be faster for this) to get the bits for
>> the last op when the status is read.
>>
> Yep, storing only the last fp op is also an option. Do you mean that we
> store the last fp op and calculate it when necessary? I am thinking about
> a general fp optimization method that suits all targets.

I think that's getting a little ahead of yourself. Let's prove the
technique is valuable for PPC (given it has the most to gain). We can
always generalise later if it's worthwhile.

Rather than creating a new structure, I would suggest creating 3 new TCG
globals (op, inA, inB) and refactoring the front-end code so each FP op
loads the TCG globals. The TCG optimizer should pick up aliased loads and
automatically eliminate the dead ones. We might need some new machinery
for the TCG to avoid spilling the values over potentially faulting
loads/stores, but that is likely a phase 2 problem.
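BALATON Zoltan's suggestion and the (op, inA, inB) globals amount to the same idea: remember the inputs of the last FP op, let the sticky flags accumulate, and only recompute the per-op FI bit when something reads it. Here is a minimal sketch of that idea in plain C, not QEMU code. It assumes a toy helper, `toy_fp_op`, in place of a softfloat call, models the TCG globals as struct fields, and every name in it (`LastFpOp`, `ToyFpState`, `fp_exec`, `fp_read_fi`) is hypothetical.

```c
#include <stdint.h>

/* Hypothetical opcodes; a real front-end would have many more. */
typedef enum { OP_MUL, OP_DIV } FpOp;

/* Toy flag bit standing in for softfloat's inexact flag. */
#define TOY_FLAG_INEXACT 0x1u

/* Toy stand-in for a softfloat call: returns a op b and ORs
 * TOY_FLAG_INEXACT into *flags if the result had to be rounded.
 * Assumes finite inputs and a nonzero divisor. */
static float toy_fp_op(FpOp op, float a, float b, uint32_t *flags)
{
    float r;
    int inexact;

    if (op == OP_MUL) {
        /* float * float is exact in double (24 + 24 <= 53 bits). */
        double p = (double)a * (double)b;
        r = (float)p;
        inexact = (double)r != p;
    } else {
        r = a / b;
        /* r is exact iff r * b == a; the product is exact in double. */
        inexact = (double)r * (double)b != (double)a;
    }
    if (inexact)
        *flags |= TOY_FLAG_INEXACT;
    return r;
}

/* Models the proposed (op, inA, inB) globals as plain fields. */
typedef struct {
    FpOp op;
    float in_a, in_b;
    int valid;
} LastFpOp;

typedef struct {
    LastFpOp last;
    uint32_t sticky; /* cumulative flags, never cleared per op */
} ToyFpState;

/* Hot path: record the inputs and let the sticky flags accumulate.
 * No per-op flag clearing or FI computation happens here. */
static float fp_exec(ToyFpState *st, FpOp op, float a, float b)
{
    st->last.op = op;
    st->last.in_a = a;
    st->last.in_b = b;
    st->last.valid = 1;
    return toy_fp_op(op, a, b, &st->sticky);
}

/* Slow path, run only when something reads the per-op bits:
 * replay the last op into a zeroed flag word to recover the
 * non-sticky FI bit without touching the sticky flags. */
static int fp_read_fi(const ToyFpState *st)
{
    uint32_t flags = 0;

    if (!st->last.valid)
        return 0;
    (void)toy_fp_op(st->last.op, st->last.in_a, st->last.in_b, &flags);
    return (flags & TOY_FLAG_INEXACT) != 0;
}
```

After `fp_exec(&st, OP_DIV, 1.0f, 3.0f)` followed by the exact `fp_exec(&st, OP_MUL, 2.0f, 4.0f)`, `fp_read_fi` reports 0 while `st.sticky` still carries the inexact bit, which is the sticky/non-sticky split being discussed. The hot path costs a handful of stores; the replay only runs when a guest instruction actually reads the per-op bits.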
Next you will want to find the places that care about the per-op bits of
cpu_fpscr and call a helper with the new globals to re-run the computation
and feed the values in. That would give you a reasonable working prototype
to start measuring the overhead and whether it makes a difference.

>
>>
>> > 3. If we clear fp_exceptions, we set fp_cache_length to 0 and
>> >    clear fp_exceptions.
>> > 4. If fp_cache is full, we re-compute fp_exceptions by
>> >    re-running the FpRecord sequence.
>>
>> All this cache management and keeping more than one element seem
>> unnecessary to me, although I may be missing something.
>>
>> > Now the key point is how to track reads and writes of the FPSCR
>> > register. The current code is
>> > cpu_fpscr = tcg_global_mem_new(cpu_env,
>> >                                offsetof(CPUPPCState, fpscr), "fpscr");
>>
>> Maybe you could search for where the value is read, which should be the
>> places where we need to handle it, but changes may be needed to make a
>> clear API for this between target/ppc, TCG and softfloat, which likely
>> does not exist yet.

Once the per-op calculation is fixed in the PPC front-end, I think the only
change needed is to remove the #if defined(TARGET_PPC) in softfloat.c; it's
only really there because it avoids the overhead of checking flags which
we always know to be clear in its case.

-- 
Alex Bennée
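The `#if defined(TARGET_PPC)` mentioned above gates softfloat's hardfloat fast path. As we understand it, softfloat.c only hands an operation to the host FPU when the inexact flag is already set and rounding is round-to-nearest-even, since the host cannot cheaply report inexactness itself. The toy model below uses hypothetical names (`ModelStatus`, `can_use_host_fpu`, `fast_path_count`) and is not the real softfloat.c code; it only shows why a front-end that clears the flags before every op never reaches that fast path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, modeled loosely on softfloat's inexact flag. */
#define MODEL_FLAG_INEXACT 0x1u

typedef struct {
    uint8_t exception_flags; /* sticky flags */
    bool round_nearest_even; /* rounding mode matches the host default */
} ModelStatus;

/* The host FPU result can be used only if doing so cannot change the
 * guest-visible flags: inexact is already set, so a new inexact result
 * adds nothing, and the rounding mode matches the host. */
static bool can_use_host_fpu(const ModelStatus *s)
{
    return (s->exception_flags & MODEL_FLAG_INEXACT) && s->round_nearest_even;
}

/* Counts how many of n ops would take the fast path, either keeping
 * the flags sticky or clearing them before every op. */
static int fast_path_count(int n, bool clear_before_each_op)
{
    ModelStatus s = { 0, true };
    int fast = 0;

    for (int i = 0; i < n; i++) {
        if (clear_before_each_op)
            s.exception_flags = 0;
        if (can_use_host_fpu(&s)) {
            fast++;
        } else {
            /* Soft path; assume each op is inexact (think 1.0f / 3.0f),
             * so it sets the sticky inexact flag. */
            s.exception_flags |= MODEL_FLAG_INEXACT;
        }
    }
    return fast;
}
```

With sticky flags, `fast_path_count(10, false)` returns 9 (only the first op takes the soft path), while `fast_path_count(10, true)` returns 0. Keeping the flags sticky and recomputing FI lazily is what would let PPC benefit once that special case is removed.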