On Thu, Jul 15, 2010 at 11:19 AM, Paul Brook <p...@codesourcery.com> wrote:
>> > Enabling use of VFP does not require use of the hard-float ABI. Please
>> > don't confuse the two.
>>
>> The whole point of the port is that we get rid of the softfloat ABI in
>> order to use the VFP unit without playing around moving
>> registers around. This sort of came about from Konstantinos' porting
>> of the Eigen2 library (after he had done it for AltiVec)
>> to NEON and some of the developers noticed it wasn't so much faster
>> because gcc inserts what can only be described as
>> evil between the start of the function and the real meat of the code.
>> The pipeline stalls for register movement are noticeable
>> in real code as a 20% or higher performance hit.
>
> Yes, but the point I was responding to is that you don't necessarily need to
> use hard-float ABI to get most of the performance gain.

I believe Konstantinos when he says we sort of do. I've been working
with him on AltiVec stuff for a long while, and the performance gains we
saw and tested there (work that eventually got picked up by YellowDog
Linux) were pretty good. Exactly what you would expect. Using the softfp
ABI means that the code you expected to run exactly 4x faster actually
can't, because the compiler inserts a significant prologue and epilogue
whose register moves cause pipeline stalls. This means the 4x faster
code isn't what is running for most of the lifetime of the function.

> I completely agree that if you want to use the hard-float ABI then you need a
> new port.
>
> However changing the ABI doesn't solve many of the underlying problems.
> Specifically how to provide optimized binaries that take advantage of new
> features on modern CPUs while still supporting older hardware.

... the point is to not support older hardware. We picked a base level:
armv7-a and vfpv3-d16 is our target for it. The same way lpia picked the
Pentium III core, SSE2 as an FPU and certain other optimizations as the
basis for that port.

Not only does compiling for that CPU and the hardfloat ABI increase
performance (instruction set improvements, reduced amount of code), it
should in the same way improve battery life (lpia was reported as a 10%
reduction in power usage). 10% on a 90 minute battery for Atom notebooks
is not much, but we know we can already get 6 hours on a Cortex-A8 with
a 2 cell battery. Bump the battery size to 4 cells and it goes up, oddly
enough, to about 12 hours. A 10% improvement in power usage on 12 hours
is roughly an extra hour of battery life, or about 30 minutes for the
smaller sizes. On Atom the same 10% buys about enough time to put the
system into an emergency standby state and keep it on suspend current
until it reaches an AC adapter.

> Switching to the hard-float ABI certainly does give some benefit.
> While 20% isn't a trivial difference, it's important to keep this in
> context. This is on top of what I'd guess is a 10x (i.e. 1000%) speedup
> achieved without breaking the ABI and requiring a whole new port.

How do you figure a 10x speedup?

> about performance then a NEON optimized version of your critical code should
> get you another 4x or so on a Cortex-A8.

Yes, it's about 4x mathematically but 2x in practice because of the ABI
fudging.

>> What would not be so great is that even if it was fixed, the option to
>> use a faster floating point ABI drags in a clone of
>> every package on your system (at the very least, libc, libm, and all
>> the system library dependencies) increasing the
>> size of the installed system.
>
> What you're describing here is multiarch.

Yes, which is needed anyway to support NEON where it's available. But
we're considering taking a more comprehensive base CPU and FPU
requirement and adding a little bit of multiarch (NEON, -fp16), instead
of taking the lowest common denominator, adding an entire distro's worth
of multiarch, and not seeing anything like the performance improvement
you'd expect.

Using the hard float ABI and picking a base level of the ARM
architecture means the port is going to run its best on a certain subset
of CPUs which are going into Smartphones and Smartbooks right now -
OMAP3, iMX51, iMX53, Snapdragon, Samsung/Apple.. all benefit.
Beagleboard, EfikaMX, Nexus One, Tegra2... That it won't run on other
CPUs of a lower pedigree is unfortunate, but armel will always still
work for them.

What you have to do is weigh the advantage of building for last year's
base system against the status quo of 15 years ago. In fact, ARM
obsoleted 90% of the architecture and cores that "armel" runs on. This
base level we have chosen is the new base level...

(Genesi has a commercial interest in it running on certain ARM926EJ-S
cores and a couple of ARM11 cores too, and it won't - i.MX27, i.MX37 and
Toshiba TMPA910 will be left out.
It's not like we're focusing entirely on what we want to do. This is
actually going to be a benefit to every modern smartbook processor,
basically lpia done right)

--
Matt
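
The "4x mathematically but 2x in practice" point can be illustrated with a very rough fixed-overhead model. This is only a sketch under stated assumptions, not a measurement: the function names are hypothetical, and the per-call overhead (standing in for the softfp register moves in the prologue and epilogue) is expressed as a fraction of the original scalar kernel time.

```python
def effective_speedup(kernel_speedup, overhead_fraction):
    """Whole-call speedup when only the kernel gets faster.

    Assumed model: the original scalar kernel takes 1 unit of time; the
    optimized call takes 1/kernel_speedup plus a fixed per-call overhead
    (overhead_fraction, in the same units).
    """
    if kernel_speedup <= 0 or overhead_fraction < 0:
        raise ValueError("kernel_speedup must be > 0 and overhead_fraction >= 0")
    return 1.0 / (1.0 / kernel_speedup + overhead_fraction)


def implied_overhead(kernel_speedup, observed_speedup):
    """Per-call overhead that would explain an observed speedup (same model)."""
    if kernel_speedup <= 0 or observed_speedup <= 0:
        raise ValueError("speedups must be > 0")
    return 1.0 / observed_speedup - 1.0 / kernel_speedup


if __name__ == "__main__":
    # No overhead: the full theoretical gain is visible.
    print(effective_speedup(4, 0.0))
    # Overhead as large as the optimized kernel itself halves the gain.
    print(effective_speedup(4, 0.25))
    # Overhead implied by "4x in theory, 2x in practice" under this model.
    print(implied_overhead(4, 2))
```

Under this model, a 4x kernel that only delivers 2x per call implies that the fixed overhead costs about as much time as the optimized kernel itself, which is consistent with the complaint that the fast code isn't what is running for most of the function's lifetime. Real pipeline behaviour on a Cortex-A8 is of course more complicated than a single constant.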