Sorry to be the party crasher, but...

I'd love to have optimizations for everything out there, but it takes
a lot of work to fine-tune for something specific.

Right now I see a few variants of ARMv8
------------
ARM reference designs - A57 cores and the newer ones. The scheduling
characteristics seem similar enough that one tuning could probably
work for the vast majority of these parts.

Cavium ThunderX - It's a ground-up design and quite different from the
ARM reference designs under the hood.

APM - Mustang, again a ground-up design and different. I don't have
enough hands-on time to know how different it is from the reference
designs.

Broadcom - Coming Soon(tm) - Again, no hands-on time or any data, but
certainly very interesting..

... now add in every variant of ground-up implementation and you have
50 shades of gray..
-------------
Soo.. depending on your target hardware, you may be better off with
gcc if the end goal is general all-around performance. (It does quite
a respectable job of being generic.) I realize a lot of people have
strong feelings for or against it. I leave that to the reader to
decide..

Back to my own glass house.. It will take a few years, but I am trying
to make it easier (internally) to expose in some clear way all the
pieces that make up per-processor fine-tuning. If this were "just"
scheduling models it would be really easy, but it's not.. Those
latencies and other magic bits decide things like "should I unroll
this loop or do something else?", and then you venture into the land
of accelerators, where a custom regalloc may be what you really need
and *nothing* off the shelf meets your goals.. (Projects like that
can take 9 months and in the end only give a general 1-5% median
performance gain..)
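
To make the unrolling point a bit more concrete, here is a minimal
sketch in C of how a per-core latency number might feed an unroll
decision. Everything in it is an assumption for illustration: the
core_tuning struct, its fields, pick_unroll() and sum_unrolled() are
hypothetical names, and this is not how any particular compiler does
it. The idea is only that a reduction needs roughly latency x issue
width independent accumulators to keep the add pipes busy, so two
cores with different latencies want different unroll factors.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNROLL 8

/* Hypothetical per-core tuning record. The fields are illustrative
   assumptions, not measured values for any real part. */
struct core_tuning {
    const char *name;
    int fadd_latency;    /* cycles before a dependent add can issue */
    int fadd_per_cycle;  /* independent adds that can issue per cycle */
};

/* Independent accumulators needed to cover the add latency:
   latency * issue width, clamped to [1, MAX_UNROLL]. */
static int pick_unroll(const struct core_tuning *t)
{
    int n = t->fadd_latency * t->fadd_per_cycle;
    if (n < 1)
        n = 1;
    if (n > MAX_UNROLL)
        n = MAX_UNROLL;
    return n;
}

/* Sum x[0..n-1] using `unroll` separate accumulators, so consecutive
   adds do not depend on each other. */
static double sum_unrolled(const double *x, size_t n, int unroll)
{
    double acc[MAX_UNROLL] = {0};
    double total = 0.0;
    size_t step, i = 0;

    assert(unroll >= 1 && unroll <= MAX_UNROLL);
    step = (size_t)unroll;

    /* Main loop: one add per accumulator per iteration. */
    for (; i + step <= n; i += step)
        for (int k = 0; k < unroll; k++)
            acc[k] += x[i + k];

    /* Remainder elements that did not fill a whole step. */
    for (; i < n; i++)
        acc[0] += x[i];

    for (int k = 0; k < unroll; k++)
        total += acc[k];
    return total;
}
```

A caller would do something like
sum_unrolled(x, n, pick_unroll(&tuning)). Note that splitting a
floating point sum across accumulators changes the rounding order,
which is one more reason a real compiler can't just do this blindly.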
--------------


On Sat, Aug 20, 2016 at 2:02 AM, james <gar...@verizon.net> wrote:
> On 08/19/2016 11:15 AM, C Bergström wrote:
>>
>> On Fri, Aug 19, 2016 at 11:01 PM, Luca Barbato <lu_z...@gentoo.org> wrote:
>>>
>>> BTW is pathscale ready to be used as system compiler as well?
>>
>>
>> I wish, but no. We have known issues when building grub2, glibc and
>> the Linux kernel at the very least. Someone* did report a long time
>> ago that, with their unofficial port, they were able to build/boot
>> the NetBSD kernel.
>> (*A community dev we trusted with our sources and was helping us with
>> portability across platforms)
>>
>> The stuff with grub2 may potentially be fixed in the "near" future...
>> the others are more tricky. In general if clang can do it, we have a
>> strong chance as well.
>>
>> As a philosophy - "we" aren't really trying to be the best generic
>> compiler in the world. We aim more at optimizing as much as possible
>> for known targets. So if by system you mean a compiler that would
>> produce an "OS" which only runs on a single class of hardware, then
>> yeah, it could work at some point in the future. Specifically, on x86
>> we default to host CPU optimizations. So on newer Intel hardware it's
>> easy to get a binary that won't run on AMD or older 64-bit Intel.
>>
>> More recently on ARMv8 - we turn on processor-specific tuning. So
>> while it may "run", the difference between APM's Mustang and Cavium
>> ThunderX is pretty big, and binaries intended for A but run on
>> B would certainly take a hit.. (this is just the tip of the iceberg)
>>
>> For general scalar OS code it isn't likely to matter... the real
>> impact being like 1-10% difference (being very general.. it could be
>> less or more in the real world..)
>>
>> For HPC codes or anything with loops or computationally complex
>> code - the gloves are off and I could see big differences... (again
>> being general and maybe a bit dramatic for fun)
>
>
>
> OK (actually fantastic!). Looking at the pathscale site pages and github,
> perhaps a cheap arm embedded board where llvm is the centerpiece of
> compiling a minimal system to entice gentoo-llvm testers would be possible
> in the near future? I have a 96boards HiKey arm64v8 that I could dedicate
> to gentoo+armv8-llvm testing, if that'd help. [1]
>
> Perhaps a baseline bootstrap iso (or such) version targeted at
> llvm-centric testers on x86-64 or armv8? Skip grub2 and use grub-legacy or
> lilo or (?), since there seem to be issues with llvm-grub2.
>
>
> [1] http://dev.gentoo.org/~tgall/
>
>
> No matter how you slice it, from someone who is focused on building
> minimized and embedded (bare metal) systems that are customized and
> coalesced into a heterogeneous gentoo cluster for HPC, this is wonderful
> news. Finally a vendor in the cluster space with some vision and
> common sense, imho. Heterogeneous and open HPC is where it's at, imho. If
> there is a forum where the community and pathscale folks discuss issues,
> point that out, as I could not find one for deeper reading....
>
>
> hth,
> James
>
