On Nov 19, 2010, at 11:53, Eric Botcazou wrote:

>> Yes, if all the people who want only one set of libraries agree on what
>> that set shall be (or this can be selected with existing configure flags),
>> this is the simplest way.
>
> Yes, this can be selected at configure time with --with-cpu and --with-float.
>
> The default configuration is also straightforward: LEON is an implementation
> of the SPARC-V8 architecture so --with-cpu=v8 and --with-float=hard.
There is LEON2, which is V7, and LEON3/LEON4, which are V8. While LEON3 can support all of V8 in hardware, LEON3 is a configurable system-on-a-chip, targeting both FPGAs and ASICs, where users can configure and synthesize different aspects of the CPU:

* CONFIG_PROC_NUM: The number of processor cores.
* CONFIG_IU_V8MULDIV: Implements the V8 multiply and divide instructions UMUL, UMULCC, SMUL, SMULCC, UDIV, UDIVCC, SDIV, SDIVCC. Costs about 8k gates.
* CONFIG_IU_MUL_MAC: Implements the SPARC V8e UMAC/SMAC (multiply-accumulate) instructions with a 40-bit accumulator.
* CONFIG_FPU_ENABLE: Enables or disables the floating-point unit.

Apart from these settings, which determine whether instructions are present at all, other settings select between implementations (trading off cycle count, area and timing), such as:

* CONFIG_IU_MUL_LATENCY_2: Implementation options for the integer multiplier:

    Type       Implementation                 Issue-rate/latency
    2-clocks   32x32 pipelined multiplier     1/2
    4-clocks   16x16 standard multiplier      4/4
    5-clocks   16x16 pipelined multiplier     4/5

* CONFIG_IU_LDELAY: One-cycle load delay for best performance, or a 2-cycle delay to improve timing at the cost of about 5% reduced performance.

CONFIG_FPU_ENABLE Y/N would correspond to --with-float=hard/soft, and I believe setting CONFIG_IU_V8MULDIV to Y/N requires --with-cpu=v8/v7; is that correct?

I think it would make sense to build these as multilibs, so users can experiment to find out the performance impact of the various hardware configurations on generated code.

I wonder if it would also be worthwhile to have compiler options for fpu=fast/slow and multiply=fast/slow, so we can schedule appropriately. For the FPU, issue-rate/latency are as follows:

* GR FPU: 1/4, with FDIV at 16 and FSQRT at 24 cycles, non-pipelined on a separate unit.
* GR FPU Lite: 8/8, with FDIVS/FDIVD/FSQRTS/FSQRTD at 31/57/46/57 cycles, non-pipelined on the same unit.

While the FPU Lite is not pipelined, integer instructions can be executed in parallel with an FPU instruction as long as no new FPU instructions are pending.

-Geert