> On Apr 30, 2020, at 12:34 PM, Dino Papararo <skizzat...@msn.com> wrote:
>
> Maybe the fastest way to implement hardfloat for ppc would be to run with it
> by default until some FPU instruction requests the FPSCR register.
> At that point we probably want to check for some exception, so QEMU could
> go back to the last FPU instruction executed and re-execute it in softfloat,
> this time taking care of the FPSCR flags, then continue in hardfloat until
> another instruction looks at the FPSCR register, and so on.
>
> Dino
That sounds like a good idea.

> -----Original Message-----
> From: BALATON Zoltan <bala...@eik.bme.hu>
> Sent: Thursday, 30 April 2020 17:36
> To: 罗勇刚(Yonggang Luo) <luoyongg...@gmail.com>
> Cc: Richard Henderson <richard.hender...@linaro.org>; Dino Papararo
> <skizzat...@msn.com>; qemu-devel@nongnu.org; Programmingkid
> <programmingk...@gmail.com>; qemu-...@nongnu.org; Howard Spoelstra
> <hsp.c...@gmail.com>; Alex Bennée <alex.ben...@linaro.org>
> Subject: Re: R: R: About hardfloat in ppc
>
> On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
>> I propose a new way of computing the float flags. We keep a float
>> computation cache:
>>
>> typedef struct FpRecord {
>>     uint8_t op;
>>     float32 A;
>>     float32 B;
>> } FpRecord;
>> FpRecord fp_cache[1024];
>> int fp_cache_length;
>> uint32_t fp_exceptions;
>>
>> 1. For each new fp operation we push it to the fp_cache.
>> 2. Once we read fp_exceptions, we re-compute fp_exceptions by
>> re-running the FpRecord sequence, and clear fp_cache_length.
>> 3. If we clear fp_exceptions, we set fp_cache_length to 0
>> and clear fp_exceptions.
>> 4. If the fp_cache is full, we re-compute fp_exceptions by
>> re-running the FpRecord sequence.
>>
>> Would this be a general method to use hard-float?
>> The time consumed should be 2 * hard_float.
>> Considering that reads of fp_exceptions are rare, the amortized time
>> complexity would be 1 * hard_float.
>
> It's hard to guess what the hit rate of such a cache would be, and if it's low
> then managing the cache is probably more expensive than running with
> softfloat. So to evaluate any proposed patch we also need some benchmarks
> which we can experiment with to tell if the results are good or not; otherwise
> we're just guessing. Are there some existing tests and benchmarks that we can
> use?
> Alex mentioned fp-bench I think and to evaluate the correctness of the
> FP implementation I've seen this other conversation:
>
> https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05107.html
> https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05126.html
>
> Is that something we can use for PPC as well to check the correctness?
>
> So I think before implementing any potential solution that came up in this
> brainstorming the first step would be to get and compile (or write if not
> available) some tests and benchmarks:
>
> 1. testing host behaviour for inexact and compare that for different archs
> 2. some FP tests that can be used to compare results with QEMU and real CPU to
> check correctness of emulation (if these check for inexact differences then
> could be used instead of 1.)
> 3. some benchmarks to evaluate QEMU performance (these could be same as FP
> tests or some real world FP heavy applications).
>
> Then we can see if the proposed solution is faster and still correct.
>
> Regards,
> BALATON Zoltan