Re: Byte swapping support

2017-09-14 Thread David Brown
On 12/09/17 20:56, Michael Meissner wrote:
> On Tue, Sep 12, 2017 at 05:26:29PM +0200, David Brown wrote:
>> On 12/09/17 16:15, paul.kon...@dell.com wrote:
>>>
 On Sep 12, 2017, at 5:32 AM, Jürg Billeter 
  wrote:

 Hi,

 To support applications that assume big-endian memory layout on little-
 endian systems, I'm considering adding support for reversing the
 storage order to GCC. In contrast to the existing scalar storage order
 support for structs, the goal is to reverse the storage order for all
 memory operations to achieve maximum compatibility with the behavior on
 big-endian systems, as far as observable by the application.
>>>
>>> I've done this in the past by C++ type magic. As a general setting
>>> it doesn't make sense that I can see. As an attribute applied to a
>>> particular data item, it does. But I'm not sure why you'd put this in
>>> the compiler when programmers can do it easily enough by defining a "big
>>> endian int32" class, etc.
>>>
>>
>> Some people use the compiler for C rather than C++ ...
>>
>> If someone wants to improve on the endianness support in gcc, I can
>> think of a few ideas that /I/ think might be useful.  I have no idea how
>> difficult they might be to put in practice, and can't say if they would
>> be of interest to others.
>>
>> First, I would like to see endianness given as a named address space,
>> rather than as a type attribute.  A key point here is that named address
>> spaces are effectively qualifiers, like "const" and "volatile" - and you
>> can then use them in pointers:
> 
> When I gave the talk at the 2009 GCC summit on the named address support, I
> thought that it could be used to add endianness support.  In fact, at one time
> I had a trial PowerPC compiler that added endianness support.  Unfortunately,
> that was in 2009, and I lost the directory of the work.  I tried again a few
> years ago, but I didn't get far enough into it to get a working compiler
> before being pulled back into work.
> 
> Back when I worked at Cygnus Solutions, we used to get requests to add endian
> support every so often, but nobody wanted to pay the cost that we were then
> quoting to add the support.  Now that named address support is in, it could be
> done better.
> 
> I suspect, however, that you want to do this at the higher tree level, adding
> the endianness bits in a separate area from the named address support.  Or
> perhaps growing the named address support, and adding several standard named
> addresses.
> 
> The paper where I talked about the named address support was from the 2009 GCC
> summit.  You can download the proceedings of the 2009 summit from here (my
> paper is pages 67-74):
> https://en.wikipedia.org/wiki/GCC_Summit

It's nice to know I am not the only one who thinks named address spaces
are a logical way to go for endianness support.

I fully appreciate, of course, that even if named address spaces are
nicer for users, a balance must be found for the practicality of the
implementation.  If today's type attribute is easier to implement and
maintain, then that is entirely understandable.  This is not a big issue
as far as I am concerned - it does not make sense to implement it unless
someone has the time and inclination, or someone is willing to pay for
the time.

> 
>> big_endian uint32_t be_buffer[20];
>> little_endian uint32_t le_buffer[20];
>>
>> void copy_buffers(const big_endian uint32_t * src,
>>  little_endian uint32_t * dest)
>> {
>>  for (int i = 0; i < 20; i++) {
>>  dest[i] = src[i];   // Swaps endianness on copy
>>  }
>> }
>>
>> That would also let you use them for scalar types, not just structs, and
>> you could use typedefs:
>>
>>  typedef big_endian uint32_t be_uint32_t;
>>
>>
>> Secondly, I would add more endian types.  As well as big_endian and
>> little_endian, I would add native_endian and reverse_endian.  These
>> could let you write a little clearer definitions sometimes.  And ideally
>> I would like mixed endian with big-endian 16-bit ordering and
>> little-endian ordering for bigger types (i.e., 0x87654321 would be
>> stored 0x43, 0x21, 0x87, 0x65).  That order matches some protocols, such
>> as Modbus.
> 
> It depends; you can add so many different combinations that, in the end, you
> don't add the support you want because of the 53 other variants.
> 

I'd stick to the basic four, and perhaps add two more ("PDP endian", and
a sort of reverse PDP endian).  I can't see there being scope for very
many different endiannesses, but I suppose one could add bit endianness
to the pile.

> Note, if you use it in named addresses, you are currently limited to 15 new
> keywords for adding named address support.  This can be grown, but you should
> know about the limit ahead of time.  Of course if you add it in a parallel,
> machine independent version, then you don't have to worry about the existing
> limits.

I am sure that if gcc started usi

Re: Byte swapping support

2017-09-14 Thread David Brown
On 14/09/17 08:22, Eric Botcazou wrote:
>> And there are lots of other problems, I don't have time to document them
>> all, or even remember them all.  Personally, I think you are better off
>> trying to fix the application to make it more portable.  Fixing the
>> compiler is not a magic solution to the problem that is any easier than
>> fixing the application.
> 
> Note that WRS' Diab compiler has got something equivalent to what GCC has got 
> now, i.e. a way to tag a particular component in a structure as BE or LE.
> 

It is over 20 years since I played with a Diab Data compiler, for
the m68k.

I seem to remember it being able to attach a big-endian or little-endian
label to any individual variable (rather than a type), which could be a
scalar rather than a struct.  So it was a bit more flexible than gcc.

It also had a more flexible inline assembly system than gcc.

And it had a /massive/ price tag, way beyond our reach at the time - so
I only had a demo version :-(


David



Re: Byte swapping support

2017-09-14 Thread Eric Botcazou
> I seem to remember it being able to attach a big-endian or little-endian
> label to any individual variable (rather than a type), which could be a
> scalar rather than a struct.  So it was a bit more flexible than gcc.

Well, the only thing I see in the documentation for "Byte Ordering" is the 
reference to pragma Pack and the __packed__ keyword for structures, which can 
toggle byte ordering by means of the byte-swap argument:

"#pragma pack
 [ ([[max_member_alignment] , [min_structure_alignment][, byte-swap ]] ) ]

The pack directive specifies that all subsequent structures..."

with the same limitation as GCC about taking the address:

"It is not possible to take the address of a byte-swapped member."

-- 
Eric Botcazou


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen
 wrote:
> On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote:
>> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> > On its own -O3 doesn't add much (some loop opts and slightly more
>> > aggressive inlining/unrolling), so whatever it does we
>> > should consider doing at -O2 eventually.
>>
>> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>>
> Would it be possible to enable basic block vectorization on -O2? I assume that
> doesn't increase binary size since it doesn't unroll loops.

Somebody needs to provide benchmarking looking at the compile-time cost
vs. the runtime benefit and the code size effect.  There's also room to tune
aggressiveness of BB vectorization as it currently allows for cases where
the scalar computation is not fully replaced by vector code.

Richard.

> 'Allan
>


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>
>> [...] As a result users are
>> required to enable several additional optimizations by hand to get good
>> code.
>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>> was
>> mentioned repeatedly) which GCC could/should do as well.
>> [...]
>>
>> I'd welcome discussion and other proposals for similar improvements.
>
>
> What's the status of graphite? It's been around for years. Isn't it mature
> enough to enable these:
>
> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>
> by default for -O2? (And I'm not even sure those are the complete set of
> graphite optimization flags, or just the "useful" ones.)

It's not on by default at any optimization level.  The main issue is the
lack of maintainance and a set of known common internal compiler errors
we hit.  The other issue is that there's no benefit of turning those on for
SPEC CPU benchmarking as far as I remember but quite a bit of extra
compile-time cost.

Richard.


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Markus Trippelsdorf
On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
> > On 12/09/17 16:57, Wilco Dijkstra wrote:
> >>
> >> [...] As a result users are
> >> required to enable several additional optimizations by hand to get good
> >> code.
> >> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
> >> was
> >> mentioned repeatedly) which GCC could/should do as well.
> >> [...]
> >>
> >> I'd welcome discussion and other proposals for similar improvements.
> >
> >
> > What's the status of graphite? It's been around for years. Isn't it mature
> > enough to enable these:
> >
> > -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
> >
> > by default for -O2? (And I'm not even sure those are the complete set of
> > graphite optimization flags, or just the "useful" ones.)
> 
> It's not on by default at any optimization level.  The main issue is the
> lack of maintainance and a set of known common internal compiler errors
> we hit.  The other issue is that there's no benefit of turning those on for
> SPEC CPU benchmarking as far as I remember but quite a bit of extra
> compile-time cost.

Not to mention the numerous wrong-code bugs. IMHO graphite should be
deprecated as soon as possible.

-- 
Markus


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Michael Clark

> On 14 Sep 2017, at 3:06 AM, Allan Sandfeld Jensen  wrote:
> 
> On Dienstag, 12. September 2017 23:27:22 CEST Michael Clark wrote:
>>> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra  wrote:
>>> 
>>> Hi all,
>>> 
>>> At the GNU Cauldron I was inspired by several interesting talks about
>>> improving GCC in various ways. While GCC has many great optimizations, a
>>> common theme is that its default settings are rather conservative. As a
>>> result users are required to enable several additional optimizations by
>>> hand to get good code. Other compilers enable more optimizations at -O2
>>> (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
>>> do as well.
>> 
>> There are some nuances to -O2. Please consider -O2 users who wish to use it
>> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>>
>> Clang/LLVM has an -Os that is like -O2, so optimisations that increase
>> code size can be skipped in -Os without drastically affecting
>> performance.
>> 
>> This is not the case with GCC, where -Os is a size-at-all-costs optimisation
>> mode. GCC users' option for size, but not at the expense of speed, is to use
>> -O2.
>> 
>> Clang        GCC
>> -Oz    ~=    -Os
>> -Os    ~=    -O2
>> 
> No. Clang's -Os is somewhat limited compared to gcc's, just like the clang
> -Og is just -O1. AFAIK -Oz is a proprietary Apple clang parameter, and not
> in clang proper.

It appears to be in mainline clang.

mclark@anarch128:~$ clang -Oz -c a.c -o a.o
mclark@anarch128:~$ clang -Ox -c a.c -o a.o
error: invalid integral value 'x' in '-Ox'
error: invalid integral value 'x' in '-Ox'
mclark@anarch128:~$ uname -a
Linux anarch128.org 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) 
x86_64 GNU/Linux
mclark@anarch128:~$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I still think it would be unfortunate to lose the size/speed sweet spot of -O2
by adding optimisations that increase code size, unless a size optimisation
option were derived from -O2 at the point -O2 is souped up, i.e. create an
-O2s (or rename -Os to -Oz and derive the new -Os from the current -O2).

I'm going to start looking at this to see what's involved in making a patch.
Distros that want a balance of size and speed might even pick it up, even if
it is not accepted in mainline.

Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Martin Liška
On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
>>> On 12/09/17 16:57, Wilco Dijkstra wrote:

 [...] As a result users are
 required to enable several additional optimizations by hand to get good
 code.
 Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
 was
 mentioned repeatedly) which GCC could/should do as well.
 [...]

 I'd welcome discussion and other proposals for similar improvements.
>>>
>>>
>>> What's the status of graphite? It's been around for years. Isn't it mature
>>> enough to enable these:
>>>
>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>>>
>>> by default for -O2? (And I'm not even sure those are the complete set of
>>> graphite optimization flags, or just the "useful" ones.)
>>
>> It's not on by default at any optimization level.  The main issue is the
>> lack of maintainance and a set of known common internal compiler errors
>> we hit.  The other issue is that there's no benefit of turning those on for
>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>> compile-time cost.
> 
> Not to mention the numerous wrong-code bugs. IMHO graphite should be
> deprecated as soon as possible.
> 

Given the wrong-code bugs we've got, which I recently went through, I fully
agree with this approach and I would do it for GCC 8. There are PRs where the
order of two simple loops is changed, causing wrong code as there's a data
dependence.

Moreover, I know that Bin was thinking about selecting whether to use classical
loop optimizations or Graphite (depending on the options provided). This would
simplify it ;)

Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
 On 12/09/17 16:57, Wilco Dijkstra wrote:
>
> [...] As a result users are
> required to enable several additional optimizations by hand to get good
> code.
> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
> was
> mentioned repeatedly) which GCC could/should do as well.
> [...]
>
> I'd welcome discussion and other proposals for similar improvements.


 What's the status of graphite? It's been around for years. Isn't it mature
 enough to enable these:

 -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block

 by default for -O2? (And I'm not even sure those are the complete set of
 graphite optimization flags, or just the "useful" ones.)
>>>
>>> It's not on by default at any optimization level.  The main issue is the
>>> lack of maintainance and a set of known common internal compiler errors
>>> we hit.  The other issue is that there's no benefit of turning those on for
>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>>> compile-time cost.
>>
>> Not to mention the numerous wrong-code bugs. IMHO graphite should be
>> deprecated as soon as possible.
>>
>
> For wrong-code bugs we've got and I recently went through, I fully agree with 
> this
> approach and I would do it for GCC 8. There are PRs where order of simple 2 
> loops
> is changed, causing wrong-code as there's a data dependence.
>
> Moreover, I know that Bin was thinking about selection whether to use 
> classical loop
> optimizations or Graphite (depending on options provided). This would 
> simplify it ;)

I don't think removing graphite is warranted; I still think it is the
approach to use when handling non-perfect nests.

Richard.

> Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Bin.Cheng
On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
 wrote:
> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
 On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  
 wrote:
> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>
>> [...] As a result users are
>> required to enable several additional optimizations by hand to get good
>> code.
>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>> was
>> mentioned repeatedly) which GCC could/should do as well.
>> [...]
>>
>> I'd welcome discussion and other proposals for similar improvements.
>
>
> What's the status of graphite? It's been around for years. Isn't it mature
> enough to enable these:
>
> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>
> by default for -O2? (And I'm not even sure those are the complete set of
> graphite optimization flags, or just the "useful" ones.)

 It's not on by default at any optimization level.  The main issue is the
 lack of maintainance and a set of known common internal compiler errors
 we hit.  The other issue is that there's no benefit of turning those on for
 SPEC CPU benchmarking as far as I remember but quite a bit of extra
 compile-time cost.
>>>
>>> Not to mention the numerous wrong-code bugs. IMHO graphite should be
>>> deprecated as soon as possible.
>>>
>>
>> For wrong-code bugs we've got and I recently went through, I fully agree 
>> with this
>> approach and I would do it for GCC 8. There are PRs where order of simple 2 
>> loops
>> is changed, causing wrong-code as there's a data dependence.
>>
>> Moreover, I know that Bin was thinking about selection whether to use 
>> classical loop
>> optimizations or Graphite (depending on options provided). This would 
>> simplify it ;)
>
> I don't think removing graphite is warranted; I still think it is the
> approach to use when handling non-perfect nests.
Hi,
IMHO, we should not be in a hurry to remove graphite, even though we are
introducing some traditional transformations.  It's a quite standalone
part of GCC and supports more transformations.  Also, as it gets more
attention, you never know whether somebody will find time to work on it.

Thanks,
bin
>
> Richard.
>
>> Martin


Re: Byte swapping support

2017-09-14 Thread David Brown
On 14/09/17 11:30, Eric Botcazou wrote:
>> I seem to remember it being able to attach a big-endian or little-endian
>> label to any individual variable (rather than a type), which could be a
> scalar rather than a struct.  So it was a bit more flexible than gcc.
> 
> Well, the only thing I see in the documentation for "Byte Ordering" is the 
> reference to pragma Pack and the __packed__ keyword for structures, which can 
> toggle byte ordering by means of the byte-swap argument:
> 
> "#pragma pack
>  [ ([[max_member_alignment] , [min_structure_alignment][, byte-swap ]] ) ]
> 
> The pack directive specifies that all subsequent structures..."
> 
> with the same limitation as GCC about taking the address:
> 
> "It is not possible to take the address of a byte-swapped member."
> 

It is, as I say, a /long/ time since I looked at it.  I could easily be
remembering incorrectly - and it could also be a difference in the versions.

Being unable to take the address of a byte-swapped member is a
reasonable and understandable limitation when byte-swapping is expressed
this way.  It would be one of the advantages in using a named address
space - then you /could/ take a pointer, but it would be a pointer
qualified by the address space name, and incompatible with normal pointers.



Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Martin Liška
On 09/14/2017 12:37 PM, Bin.Cheng wrote:
> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>  wrote:
>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
 On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  
> wrote:
>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>
>>> [...] As a result users are
>>> required to enable several additional optimizations by hand to get good
>>> code.
>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>>> was
>>> mentioned repeatedly) which GCC could/should do as well.
>>> [...]
>>>
>>> I'd welcome discussion and other proposals for similar improvements.
>>
>>
>> What's the status of graphite? It's been around for years. Isn't it 
>> mature
>> enough to enable these:
>>
>> -floop-interchange -ftree-loop-distribution -floop-strip-mine 
>> -floop-block
>>
>> by default for -O2? (And I'm not even sure those are the complete set of
>> graphite optimization flags, or just the "useful" ones.)
>
> It's not on by default at any optimization level.  The main issue is the
> lack of maintainance and a set of known common internal compiler errors
> we hit.  The other issue is that there's no benefit of turning those on 
> for
> SPEC CPU benchmarking as far as I remember but quite a bit of extra
> compile-time cost.

 Not to mention the numerous wrong-code bugs. IMHO graphite should be
 deprecated as soon as possible.

>>>
>>> For wrong-code bugs we've got and I recently went through, I fully agree 
>>> with this
>>> approach and I would do it for GCC 8. There are PRs where order of simple 2 
>>> loops
>>> is changed, causing wrong-code as there's a data dependence.
>>>
>>> Moreover, I know that Bin was thinking about selection whether to use 
>>> classical loop
>>> optimizations or Graphite (depending on options provided). This would 
>>> simplify it ;)
>>
>> I don't think removing graphite is warranted; I still think it is the
>> approach to use when handling non-perfect nests.
> Hi,
> IMHO, we should not be in a hurry to remove graphite, though we are
> introducing some traditional transformations.  It's a quite standalone
> part in GCC and supports more transformations.  Also as it gets more
> attention, never know if somebody will find time to work on it.

Ok. I just wanted to express that from a user's perspective I would not
recommend using it. Even if it improves some interesting (and, for classical
loop optimizations, hard) loop nests, it can still blow up on a quite simple
data dependence between loops. That said, it's quite risky to use.

Thanks,
Martin

> 
> Thanks,
> bin
>>
>> Richard.
>>
>>> Martin



Re: How to configure a bi-arch PowerPC GCC?

2017-09-14 Thread Sebastian Huber

On 13/09/17 15:11, Andreas Schwab wrote:


On Jul 20 2017, Sebastian Huber  wrote:


Ok, so why do I get a "error: unrecognizable insn:"? How can I debug a
message like this:

(insn 12 11 13 2 (set (reg:CCFP 126)
 (compare:CCFP (reg:TF 123)
 (reg:TF 124))) "test-v0.i":5 -1
  (nil))

This is supposed to be matched by the cmptf_internal1 pattern with
-mabi=ibmlongdouble.  Looks like your configuration defaults to
-mabi=ieeelongdouble.


Yes, originally I tried to enable the 128-bit IEEE float support. I use 
now the default settings.


--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

Diese Nachricht ist keine geschäftliche Mitteilung im Sinne des EHUG.



Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
> On 09/14/2017 12:37 PM, Bin.Cheng wrote:
>> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>>  wrote:
>>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
 On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  
>> wrote:
>>> On 12/09/17 16:57, Wilco Dijkstra wrote:

 [...] As a result users are
 required to enable several additional optimizations by hand to get good
 code.
 Other compilers enable more optimizations at -O2 (loop unrolling in 
 LLVM
 was
 mentioned repeatedly) which GCC could/should do as well.
 [...]

 I'd welcome discussion and other proposals for similar improvements.
>>>
>>>
>>> What's the status of graphite? It's been around for years. Isn't it 
>>> mature
>>> enough to enable these:
>>>
>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine 
>>> -floop-block
>>>
>>> by default for -O2? (And I'm not even sure those are the complete set of
>>> graphite optimization flags, or just the "useful" ones.)
>>
>> It's not on by default at any optimization level.  The main issue is the
>> lack of maintainance and a set of known common internal compiler errors
>> we hit.  The other issue is that there's no benefit of turning those on 
>> for
>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>> compile-time cost.
>
> Not to mention the numerous wrong-code bugs. IMHO graphite should be
> deprecated as soon as possible.
>

 For wrong-code bugs we've got and I recently went through, I fully agree 
 with this
 approach and I would do it for GCC 8. There are PRs where order of simple 
 2 loops
 is changed, causing wrong-code as there's a data dependence.

 Moreover, I know that Bin was thinking about selection whether to use 
 classical loop
 optimizations or Graphite (depending on options provided). This would 
 simplify it ;)
>>>
>>> I don't think removing graphite is warranted; I still think it is the
>>> approach to use when handling non-perfect nests.
>> Hi,
>> IMHO, we should not be in a hurry to remove graphite, though we are
>> introducing some traditional transformations.  It's a quite standalone
>> part in GCC and supports more transformations.  Also as it gets more
>> attention, never know if somebody will find time to work on it.
>
> Ok. I just wanted to express that from a user's perspective I would not
> recommend using it. Even if it improves some interesting (and, for classical
> loop optimizations, hard) loop nests, it can still blow up on a quite simple
> data dependence between loops. That said, it's quite risky to use.

We only have a single wrong-code bug in bugzilla with a testcase, and I
just fixed it (well, patch in testing).  We do have plenty of ICEs, yes.

Richard.

> Thanks,
> Martin
>
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
 Martin
>


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Markus Trippelsdorf
On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
> > [...]
> >
> > Ok. I just wanted to express that from a user's perspective I would not
> > recommend using it. Even if it improves some interesting (and, for classical
> > loop optimizations, hard) loop nests, it can still blow up on a quite simple
> > data dependence between loops. That said, it's quite risky to use.
> 
> We only have a single wrong-code bug in bugzilla with a testcase, and I
> just fixed it (well, patch in testing).  We do have plenty of ICEs, yes.

Even tramp3d-v4, which is cited in several graphite papers, gets
miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

-- 
Markus


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 3:08 PM, Markus Trippelsdorf
 wrote:
> On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
>> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
>> > On 09/14/2017 12:37 PM, Bin.Cheng wrote:
>> >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>> >>  wrote:
>> >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>>  On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>> > On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> >> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  
>> >> wrote:
>> >>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>> 
>>  [...] As a result users are
>>  required to enable several additional optimizations by hand to get 
>>  good
>>  code.
>>  Other compilers enable more optimizations at -O2 (loop unrolling in 
>>  LLVM
>>  was
>>  mentioned repeatedly) which GCC could/should do as well.
>>  [...]
>> 
>>  I'd welcome discussion and other proposals for similar improvements.
>> >>>
>> >>>
>> >>> What's the status of graphite? It's been around for years. Isn't it 
>> >>> mature
>> >>> enough to enable these:
>> >>>
>> >>> -floop-interchange -ftree-loop-distribution -floop-strip-mine 
>> >>> -floop-block
>> >>>
>> >>> by default for -O2? (And I'm not even sure those are the complete 
>> >>> set of
>> >>> graphite optimization flags, or just the "useful" ones.)
>> >>
>> >> It's not on by default at any optimization level.  The main issue is
>> >> the lack of maintenance and a set of known common internal compiler
>> >> errors we hit.  The other issue is that, as far as I remember, there's
>> >> no benefit to turning those on for SPEC CPU benchmarking, but quite a
>> >> bit of extra compile-time cost.
>> >
>> > Not to mention the numerous wrong-code bugs. IMHO graphite should be
>> > deprecated as soon as possible.
>> >
>> 
>>  Based on the wrong-code bugs we have, which I recently went through, I
>>  fully agree with this approach and would do it for GCC 8. There are PRs
>>  where the order of two simple loops is swapped, causing wrong code
>>  because there is a data dependence.
>> 
>>  Moreover, I know that Bin was thinking about selecting whether to use
>>  the classical loop optimizations or Graphite, depending on the options
>>  provided. This would simplify things ;)
>> >>>
>> >>> I don't think removing graphite is warranted, I still think it is the
>> >>> approach to use when
>> >>> handling non-perfect nests.
>> >> Hi,
>> >> IMHO, we should not be in a hurry to remove graphite, even though we
>> >> are introducing some traditional transformations.  It's a fairly
>> >> standalone part of GCC and supports more transformations.  Also, as it
>> >> gets more attention, you never know whether somebody will find time to
>> >> work on it.
>> >
>> > Ok. I just wanted to express that, from a user's perspective, I would
>> > not recommend using it. Even if it improves some interesting (and, for
>> > classical loop optimization, hard) loop nests, it can still blow up on
>> > quite simple data dependences between loops. That is, it's quite risky
>> > to use.
>>
>> We only have a single wrong-code bug in bugzilla with a testcase and I
>> just fixed it (well,
>> patch in testing).  We do have plenty of ICEs, yes.
>
> Even tramp3d-v4, which is cited in several graphite papers, gets
> miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

But unfortunately there isn't a self-contained testcase for that.  The
comments hint at something like

int a[][];
p = &a[1][0];
for(;;)
  a[i][j] = ...
  p[i] = ...

would get at it, that is, accessing memory via two-dim array and pointer.

Richard.

> --
> Markus


Inferring that the condition of a for loop is initially true?

2017-09-14 Thread Niels Möller
This is more of a question than a bug report, so I'm trying to send it
to the list rather than filing a bugzilla issue.

I think it's quite common to write for- and while-loops where the
condition is always initially true. A simple example might be

double average (const double *a, size_t n) 
{
  double sum;
  size_t i;

  assert (n > 0);
  for (i = 0, sum = 0; i < n; i++) 
sum += a[i];
  return sum / n;
}

The programmer could do the micro-optimization of rewriting it as a
do-while loop instead. It would be nice if gcc could infer that the
condition is initially true, and convert it to a do-while loop
automatically.

Converting to a do-while-loop should produce slightly better code,
omitting the typical jump to enter the loop at the end where the
condition is checked. It would also make analysis of where variables are
written more accurate, which is my main concern at the moment.
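For comparison, the hand-rotated do-while version of the example above would be (a sketch; the behavior is identical for n > 0):

```c
#include <assert.h>
#include <stddef.h>

/* Hand-rotated do-while version of the example: since n > 0 is
   asserted, the body runs at least once, so no entry test or jump
   to the condition at the end of the loop is needed. */
double average (const double *a, size_t n)
{
  double sum = 0;
  size_t i = 0;

  assert (n > 0);
  do
    sum += a[i++];
  while (i < n);
  return sum / n;
}
```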

My questions are:

1. Does gcc attempt to do this optimization? 

2. If it does, how often does it succeed on loops in real programs?

3. Can I help the compiler to do that inference?

The code I had some trouble with is at
https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c. A
simplified version with only the interesting code path would be

void
ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
{
  mp_limb_t hi;
  mp_size_t sn = mn - bn;
  mp_size_t rn = 2*mn;

  assert (bn < mn);

  while (rn >= 2 * mn - bn)
{
  rn -= sn;
  ... code which sets hi ...
}

  ... code which uses hi ...
}

The intention is that the loop will run at least once, but
nevertheless, in this context, rewriting as do-while would make the
code uglier, imo. 

It looks obvious at first glance that the initial value rn = 2*mn
makes the condition rn >= 2*mn - bn true. All values are unsigned, and
the assert backs up that mn - bn won't underflow.

This is also used with pretty small mn, so that 2*mn won't overflow,
but unfortunately, that's not obvious to the compiler. In theory, the
function could be called with, e.g., mn == (1 << 63) and bn == 1 on a
64-bit machine, and then the initial loop condition would be
0 >= ~0, which is false. So overflow cases seem to rule out the
optimization.

I've seen gcc warn about hi being used uninitialized in this function
(but only on sparc, and on a machine I no longer use, so I'm afraid I
can't provide details). I've also seen warnings from the clang static
analyzer (which I guess makes different tradeoffs than gcc -Wall when
it comes to false positives).

I would guess it's quite common that conditions which are always true
for relevant inputs may be false due to overflow with extreme inputs,
which in the case of unsigned variables doesn't leave any "undefined
behaviour"-freedom to the compiler.

What's the best way to tell the compiler, to promise that there will
be no overflows involving mn? I could try adding an assert, like

  rn = 2*mn; assert (rn > mn);

to rule out overflow, but that looks a bit unobvious to humans and
possibly unobvious to the compiler too.

Regards,
/Niels

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.


Re: Inferring that the condition of a for loop is initially true?

2017-09-14 Thread Marc Glisse

On Thu, 14 Sep 2017, Niels Möller wrote:


This is more of a question than a bug report, so I'm trying to send it
to the list rather than filing a bugzilla issue.

I think it's quite common to write for- and while-loops where the
condition is always initially true. A simple example might be

double average (const double *a, size_t n)
{
 double sum;
 size_t i;

 assert (n > 0);
 for (i = 0, sum = 0; i < n; i++)
   sum += a[i];
 return sum / n;
}

The programmer could do the micro-optimization of rewriting it as a
do-while loop instead. It would be nice if gcc could infer that the
condition is initially true, and convert it to a do-while loop
automatically.

Converting to a do-while-loop should produce slightly better code,
omitting the typical jump to enter the loop at the end where the
condition is checked. It would also make analysis of where variables are
written more accurate, which is my main concern at the moment.


Hello,

assert is not what you want, since it completely disappears with -DNDEBUG.
clang has __builtin_assume; with gcc you want a test and
__builtin_unreachable. Replacing your assert with

  if (n == 0) __builtin_unreachable ();

gcc does skip the first test of the loop, as can be seen in the dump
produced with -fdump-tree-optimized.
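Applied to the original example, the hint would look like the following sketch; with a recent GCC one can compare the -fdump-tree-optimized output with and without it:

```c
#include <stddef.h>

/* Niels' example with Marc's hint: the impossible n == 0 case is
   marked unreachable, so GCC may treat the loop condition as
   initially true and skip the entry test.
   __builtin_unreachable is a GCC extension. */
double average (const double *a, size_t n)
{
  double sum;
  size_t i;

  if (n == 0)
    __builtin_unreachable ();
  for (i = 0, sum = 0; i < n; i++)
    sum += a[i];
  return sum / n;
}
```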

--
Marc Glisse


Re: Inferring that the condition of a for loop is initially true?

2017-09-14 Thread Geza Herman

Hi,


On 09/14/2017 09:50 PM, Marc Glisse wrote:

On Thu, 14 Sep 2017, Niels Möller wrote:


This is more of a question than a bug report, so I'm trying to send it
to the list rather than filing a bugzilla issue.

I think it's quite common to write for- and while-loops where the
condition is always initially true. A simple example might be

double average (const double *a, size_t n)
{
 double sum;
 size_t i;

 assert (n > 0);
 for (i = 0, sum = 0; i < n; i++)
   sum += a[i];
 return sum / n;
}

The programmer could do the micro-optimization of rewriting it as a
do-while loop instead. It would be nice if gcc could infer that the
condition is initially true, and convert it to a do-while loop
automatically.

Converting to a do-while-loop should produce slightly better code,
omitting the typical jump to enter the loop at the end where the
condition is checked. It would also make analysis of where variables are
written more accurate, which is my main concern at the moment.


Hello,

assert is not what you want, since it completely disappears with
-DNDEBUG. clang has __builtin_assume; with gcc you want a test and
__builtin_unreachable. Replacing your assert with

  if (n == 0) __builtin_unreachable ();

gcc does skip the first test of the loop, as can be seen in the dump
produced with -fdump-tree-optimized.



Do you plan to add something like __builtin_assume to GCC? It would be
useful because of this idiom, which helps the compiler optimize better:


#ifdef MY_DEBUG
#define MY_ASSERT(assertion) do { if (!(assertion)) ... } while (0)
#else
#define MY_ASSERT(assertion) __builtin_assume(assertion)
#endif

It is not the same as "if (!(assertion)) __builtin_unreachable();" 
because __builtin_assume discards any side effects of "assertion".


Thanks,
Geza


Re: Inferring that the condition of a for loop is initially true?

2017-09-14 Thread Jeff Law
On 09/14/2017 01:28 PM, Niels Möller wrote:
> This is more of a question than a bug report, so I'm trying to send it
> to the list rather than filing a bugzilla issue.
> 
> I think it's quite common to write for- and while-loops where the
> condition is always initially true. A simple example might be
> 
> double average (const double *a, size_t n) 
> {
>   double sum;
>   size_t i;
> 
>   assert (n > 0);
>   for (i = 0, sum = 0; i < n; i++) 
> sum += a[i];
>   return sum / n;
> }
> 
> The programmer could do the micro-optimization of rewriting it as a
> do-while loop instead. It would be nice if gcc could infer that the
> condition is initially true, and convert it to a do-while loop
> automatically. 
> 
> Converting to a do-while-loop should produce slightly better code,
> omitting the typical jump to enter the loop at the end where the
> condition is checked. It would also make analysis of where variables are
> written more accurate, which is my main concern at the moment.
> 
> My questions are:
> 
> 1. Does gcc attempt to do this optimization? 
Yes.  It happens as a side effect of jump threading and there are also
dedicated passes to rotate the loop.

> 
> 2. If it does, how often does it succeed on loops in real programs?
Often.  The net benefit is actually small though and sometimes this kind
of loop rotation can impede vectorization.


> 
> 3. Can I help the compiler to do that inference?
In general, I'd advise against it.  You end up with ugly code which
works with specific versions of the compiler, but which needs regular
tweaking as the internal implementations of various optimizers change
over time.


> 
> The code I had some trouble with is at
> https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c. A
> simplified version with only the interesting code path would be
> 
> void
> ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
> {
>   mp_limb_t hi;
>   mp_size_t sn = mn - bn;
>   mp_size_t rn = 2*mn;
> 
>   assert (bn < mn);
> 
>   while (rn >= 2 * mn - bn)
In this particular case (ignoring the assert), what you want is better
jump threading exploiting range propagation.  But you have to be really
careful here due to the potential overflow.

I'd have to have a self-contained example to dig into what's really
going on, but my suspicion is either overflow or fairly weak range data
and simplification due to the symbolic ranges.

Jeff



gcc-7-20170914 is now available

2017-09-14 Thread gccadmin
Snapshot gcc-7-20170914 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20170914/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch 
revision 252775

You'll find:

 gcc-7-20170914.tar.xz   Complete GCC

  SHA256=38e05090c2b43dd11d710e729f4dbaf79bc29d8e5b43771cab4415267bc40d22
  SHA1=7c52a1ec855b2a954b5812fba2be50dcba58f2c2

Diffs from 7-20170907 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.