On Wed, Jun 25, 2014 at 11:53 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
> On Wed, Jun 25, 2014 at 5:47 PM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Wed, Jun 25, 2014 at 5:26 PM, Bingfeng Mei <b...@broadcom.com> wrote:
>>> Thanks for nice benchmarks. Vladimir.
>>>
>>> Why is GCC code size so much bigger than LLVM? Does -Ofast have more
>>> unrolling
>> On the contrary, I don't think rtl unrolling is enabled by default on
>> GCC with level O3/Ofast. There is no unroll dump file at all unless
>> -funroll-loops/-funroll-all-loops is explicitly specified.
> Need to clarify, I did see cases in which GCC's rtl unroller more
> aggressive than llvm's once it's specified.
At -O3 you get more aggressive complete peeling from the GIMPLE cunroll pass.

Richard.

>> Thanks,
>> bin
>>
>>> on GCC? It doesn't seem increasing code size help performance (164.gzip &
>>> 197.parser)
>>> Is there comparisons for O2? I guess that is more useful for typical
>>> mobile/embedded programmers.
>>>
>>> Bingfeng
>>>
>>>> -----Original Message-----
>>>> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of
>>>> Vladimir Makarov
>>>> Sent: 24 June 2014 16:07
>>>> To: Ramana Radhakrishnan; gcc.gcc.gnu.org
>>>> Subject: Re: Comparison of GCC-4.9 and LLVM-3.4 performance on
>>>> SPECInt2000 for x86-64 and ARM
>>>>
>>>> On 06/24/2014 10:57 AM, Ramana Radhakrishnan wrote:
>>>> >
>>>> > The ball-park number you have probably won't change much.
>>>> >
>>>> >>>
>>>> >> Unfortunately, that is the configuration I can use on my system because
>>>> >> of lack of libraries for other configurations.
>>>> >
>>>> > Using --with-fpu={neon / neon-vfpv4} shouldn't cause you ABI issues
>>>> > with libraries for any other configurations. neon / neon-vfpv4 enable
>>>> > use of the neon unit in a manner that is ABI compatible with the rest
>>>> > of the system.
>>>> >
>>>> > For more on command line options for AArch32 and how they map to
>>>> > various CPU's you might find this blog interesting.
>>>> >
>>>> > http://community.arm.com/groups/tools/blog/2013/04/15/arm-cortex-a-processors-and-gcc-command-lines
>>>> >
>>>> >
>>>> >>
>>>> >> I don't think Neon can improve score for SPECInt2000 significantly but
>>>> >> may be I am wrong.
>>>> >
>>>> > It won't probably improve the overall score by a large amount but some
>>>> > individual benchmarks will get some help.
>>>> >
>>>> There are some few benchmarks which benefit from autovectorization (eon
>>>> particularly).
>>>> >>> Did you add any other architecture specific options to your SPEC2k
>>>> >>> runs ?
>>>> >>>
>>>> >>>
>>>> >> No. The only options I used are -Ofast.
>>>> >>
>>>> >> Could you recommend me what best options you think I should use for this
>>>> >> processor.
>>>> >>
>>>> >
>>>> > I would personally use --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
>>>> > --with-float=hard on this processor as that maps with the processor
>>>> > available on that particular piece of Silicon.
>>>> Thanks, Ramana. Next time, I'll try these options.
>>>> >
>>>> > Also given it's a big LITTLE system with probably kernel switching -
>>>> > it may be better to also make sure that you are always running on the
>>>> > big core.
>>>> >
>>>> The results are pretty stable. Also this version of Fedora does not
>>>> implement switching from Big to Little processors.
>>>
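
To make the distinction discussed in this thread concrete (complete peeling in the GIMPLE cunroll pass at -O3 versus the RTL unroller behind -funroll-loops), here is a minimal C sketch. The file name, the function names and the constant N are hypothetical and not taken from the thread, and whether a given GCC release really peels or unrolls these loops depends on its heuristics and parameters.

```c
/* peel.c: two loops that differ only in whether the trip count is
   known at compile time. All names here are illustrative. */
#include <stddef.h>

#define N 4  /* small constant trip count, chosen only for illustration */

/* The trip count is the compile-time constant N, so this loop is a
   candidate for complete peeling (full unrolling) at -O3. */
int sum_fixed(const int *a)
{
    int s = 0;
    for (size_t i = 0; i < N; i++)
        s += a[i];
    return s;
}

/* The trip count n is only known at run time, so the loop cannot be
   peeled away completely. Partial unrolling of such loops is what
   -funroll-loops asks the RTL unroller to consider. */
int sum_var(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```

One way to check what a particular compiler does, assuming the dump option names I recall for GCC 4.9 (-fdump-tree-cunroll and -fdump-rtl-loop2_unroll), is to build the file once with `gcc -O3 -c -fdump-tree-cunroll peel.c`, build it again with `-funroll-loops -fdump-rtl-loop2_unroll` added, and compare the dump files and the object sizes.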