Sorry to be the party crasher, but... I'd love to have optimizations for everything out there, but it takes a lot of work to fine-tune for something specific.

Right now I see a few variants of ARMv8
------------
ARM reference stuff - A57 cores and the newer bits. The scheduling seems similar enough across them that one tuning could probably work for the vast majority of these parts.

Cavium ThunderX - It's a ground-up design and quite different from the ARM reference stuff under the hood.

APM - Mustang, again ground-up and different. I don't have enough hands-on time to know how different it is from reference.

Broadcom - Coming Soon(tm) - Again no hands-on time or any data, but certainly very interesting.

... now add in every variant of ground-up implementation and you have 50 shades of gray.
-------------

So, depending on your target hardware, you may be better off with gcc if the end goal is general all-around performance. (It does a quite respectable job of being generic.) I realize a lot of people have strong feelings for or against it. I leave that to the reader to decide.

Back to my own glass house: it will take a few years, but I am trying to make it easier (internally) to expose in some clear way all the pieces that make up per-processor fine-tuning. If this were "just" scheduling models it would be really easy, but it's not. Those latencies and other magic bits decide things like "should I unroll this loop or do something else", and then you venture into the land of accelerators, where a custom regalloc may be what you really need and *nothing* off the shelf meets your goals. (Projects like that can take 9 months and in the end only give a general 1-5% median performance gain.)

--------------
On Sat, Aug 20, 2016 at 2:02 AM, james <gar...@verizon.net> wrote:
> On 08/19/2016 11:15 AM, C Bergström wrote:
>>
>> On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_z...@gentoo.org> wrote:
>>>
>>> BTW is pathscale ready to be used as system compiler as well?
>>
>>
>> I wish, but no. We have known issues when building grub2, glibc and
>> the Linux kernel at the very least.
>> Someone* did report a long time
>> ago that with their unofficial port, were able to build/boot the
>> NetBSD kernel.
>> (*A community dev we trusted with our sources and was helping us with
>> portability across platforms)
>>
>> The stuff with grub2 may potentially be fixed in the "near" future...
>> the others are more tricky. In general if clang can do it, we have a
>> strong chance as well.
>>
>> As a philosophy - "we" aren't really trying to be the best generic
>> compiler in the world. We aim more on optimizing as much for known
>> targets. So if by system you mean, a compiler that would produce an
>> "OS" which only runs on a single class of hardware, then yeah it could
>> work at some point in the future. Specifically, on x86 we default on
>> host CPU optimizations. So on newer Intel hardware it's easy to get a
>> binary that won't run on AMD or older 64bit Intel.
>>
>> More recently on ARMv8 - we turn on processor specific tuning. So
>> while it may "run", the difference between APM's mustang and Cavium
>> ThunderX is pretty big and running binaries intended for A and ran on
>> B would certainly take a hit.. (this is just the tip of the iceberg)
>>
>> For general scalar OS code it isn't likely to matter... the real
>> impact being like 1-10% difference (being very general.. it could be
>> less or more in the real world..)
>>
>> For HPC codes or anything where you get loops or computationally
>> complex - the gloves are off and I could see big differences... (again
>> being general and maybe a bit dramatic for fun)
>
>
>
> OK (actually fantastic!). Looking at the pathscale site pages and github,
> perhaps a cheap arm embedded board where llvm is the centerpiece of
> compiling a minimal system to entice gentoo-llvm testers, would be possible
> in the near future?. I have a 96boards, HiKey arm64v8 that I could dedicate
> to gentoo+armv8-llvm testing, if that'd help. [1]
>
> Perhaps a baseline bootstrap iso (or such) version targeted at
> llvm-centric testers on x86-64 or armv8 ? Skip grub2 and use grub-legacy or
> lilo or (?), since there seems to be issues with llvm-grub2.
>
>
> [1] http://dev.gentoo.org/~tgall/
>
>
> No matter how you slice it, from someone who is focused on building
> minimized and embedded (bare metal) systems that are customized and
> coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
> news. Finally a vendor in the cluster space, with some vision and
> common-sense, imho. Heterogeneous and open HPC is where is at, imho. If
> there is a forum where the community and pathscale folks discuss issues,
> point that out as I could not find one for deeper reading....
>
>
> hth,
> James
>