Re: Byte swapping support
On 12/09/17 20:56, Michael Meissner wrote:
> On Tue, Sep 12, 2017 at 05:26:29PM +0200, David Brown wrote:
>> On 12/09/17 16:15, paul.kon...@dell.com wrote:
>>> On Sep 12, 2017, at 5:32 AM, Jürg Billeter wrote:
>>>> Hi,
>>>>
>>>> To support applications that assume big-endian memory layout on
>>>> little-endian systems, I'm considering adding support for reversing
>>>> the storage order to GCC. In contrast to the existing scalar storage
>>>> order support for structs, the goal is to reverse the storage order
>>>> for all memory operations to achieve maximum compatibility with the
>>>> behavior on big-endian systems, as far as observable by the
>>>> application.
>>>
>>> I've done this in the past by C++ type magic. As a general setting
>>> it doesn't make sense that I can see. As an attribute applied to a
>>> particular data item, it does. But I'm not sure why you'd put this in
>>> the compiler when programmers can do it easily enough by defining a
>>> "big endian int32" class, etc.
>>
>> Some people use the compiler for C rather than C++ ...
>>
>> If someone wants to improve on the endianness support in gcc, I can
>> think of a few ideas that /I/ think might be useful. I have no idea how
>> difficult they might be to put in practice, and can't say if they would
>> be of interest to others.
>>
>> First, I would like to see endianness given as a named address space,
>> rather than as a type attribute. A key point here is that named address
>> spaces are effectively qualifiers, like "const" and "volatile" - and you
>> can then use them in pointers:
>
> When I gave the talk at the 2009 GCC summit on the named address support,
> I thought that it could be used to add endianness support. In fact at one
> time, I had a trial PowerPC compiler that added endianness support.
> Unfortunately, that was in 2009, and I lost the directory of the work. I
> tried again a few years ago, but I didn't get far enough into it to get a
> working compiler before being pulled back into work.
> Back when I worked at Cygnus Solutions, we used to get requests to add
> endian support every so often, but nobody wanted to pay the cost that we
> were then quoting to add the support. Now that named address support is
> in, it could be done better.
>
> I suspect however, you want to do this at the higher tree level, adding
> in the endianness bits in a separate area than the named address support.
> Or perhaps, growing the named address support, and adding several
> standard named addresses.
>
> The paper where I talked about the named address support was from the
> 2009 GCC summit. You can download the proceedings of the 2009 summit from
> here (my paper is pages 67-74):
> https://en.wikipedia.org/wiki/GCC_Summit

It's nice to know I am not the only one who thinks named address spaces
are a logical way to go for endianness support. I fully appreciate, of
course, that even if named address spaces are nicer for users, a balance
must be found for the practicality of the implementation. If today's type
attribute is easier to implement and maintain, then that is entirely
understandable. This is not a big issue as far as I am concerned - it does
not make sense to implement it unless someone has the time and
inclination, or someone is willing to pay for the time.

>> big_endian uint32_t be_buffer[20];
>> little_endian uint32_t le_buffer[20];
>>
>> void copy_buffers(const big_endian uint32_t * src,
>>                   little_endian uint32_t * dest)
>> {
>>     for (int i = 0; i < 20; i++) {
>>         dest[i] = src[i];    // Swaps endianness on copy
>>     }
>> }
>>
>> That would also let you use them for scalar types, not just structs,
>> and you could use typedefs:
>>
>> typedef big_endian uint32_t be_uint32_t;
>>
>> Secondly, I would add more endian types. As well as big_endian and
>> little_endian, I would add native_endian and reverse_endian. These
>> could let you write slightly clearer definitions sometimes.
>> And ideally I would like mixed endian with big-endian 16-bit ordering
>> and little-endian ordering for bigger types (i.e., 0x87654321 would be
>> stored 0x43, 0x21, 0x87, 0x65). That order matches some protocols, such
>> as Modbus.
>
> It depends, you can add so many different combinations, that in the end
> you don't add the support you want because of the 53 other variants.

I'd stick to the basic four, and perhaps add two more ("PDP endian", and a
sort of reverse PDP endian). I can't see there being scope for so very
many different endiannesses, but I suppose one could mix in bit endianness
to the pile.

> Note, if you use it in named addresses, you are currently limited to 15
> new keywords for adding named address support. This can be grown, but
> you should know about the limit ahead of time. Of course if you add it
> in a parallel, machine independent version, then you don't have to worry
> about the existing limits.

I am sure that if gcc started usi
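For reference, the "big endian int32" class that Paul mentions earlier in
the thread can be sketched in a few lines. This is an illustrative sketch
only - the name be_uint32 is made up here and not taken from any real
library:

```cpp
#include <cassert>
#include <cstdint>

// A 32-bit value stored big-endian in memory regardless of the host byte
// order - the "C++ type magic" approach. Illustrative sketch only.
class be_uint32 {
    uint8_t b[4];  // big-endian byte image of the value
public:
    be_uint32(uint32_t v = 0) { *this = v; }
    be_uint32 &operator=(uint32_t v) {
        b[0] = uint8_t(v >> 24);
        b[1] = uint8_t(v >> 16);
        b[2] = uint8_t(v >> 8);
        b[3] = uint8_t(v);
        return *this;
    }
    operator uint32_t() const {  // read back in native byte order
        return (uint32_t(b[0]) << 24) | (uint32_t(b[1]) << 16)
             | (uint32_t(b[2]) << 8)  |  uint32_t(b[3]);
    }
};
```

The conversion operators make assignments and reads byte-swap
transparently, which is exactly why this works in C++ but has no direct
equivalent in plain C, as David notes.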
Re: Byte swapping support
On 14/09/17 08:22, Eric Botcazou wrote:
>> And there are lots of other problems, I don't have time to document
>> them all, or even remember them all. Personally, I think you are better
>> off trying to fix the application to make it more portable. Fixing the
>> compiler is not a magic solution to the problem that is any easier than
>> fixing the application.
>
> Note that WRS' Diab compiler has got something equivalent to what GCC
> has got now, i.e. a way to tag a particular component in a structure as
> BE or LE.

It is over 20 years since I played with a Diab Data compiler, for the
m68k. I seem to remember it being able to attach a big-endian or
little-endian label to any individual variable (rather than a type),
which could be a scalar rather than a struct. So it was a bit more
flexible than gcc. It also had a more flexible inline assembly system
than gcc. And it had a /massive/ price tag, way beyond our reach at the
time - so I only had a demo version :-(

David
Re: Byte swapping support
> I seem to remember it being able to attach a big-endian or little-endian
> label to any individual variable (rather than a type), which could be a
> scalar rather than a struct. So it was a bit more flexible than gcc.

Well, the only thing I see in the documentation for "Byte Ordering" is the
reference to pragma Pack and the __packed__ keyword for structures, which
can toggle byte ordering by means of the byte-swap argument:

"#pragma pack
[ ([[max_member_alignment] , [min_structure_alignment][, byte-swap ]] ) ]

The pack directive specifies that all subsequent structures..."

with the same limitation as GCC about taking the address:

"It is not possible to take the address of a byte-swapped member."

--
Eric Botcazou
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen wrote: > On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote: >> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote: >> > On its own -O3 doesn't add much (some loop opts and slightly more >> > aggressive inlining/unrolling), so whatever it does we >> > should consider doing at -O2 eventually. >> >> Well, -O3 adds vectorization, which we don't enable at -O2 by default. >> > Would it be possible to enable basic block vectorization on -O2? I assume that > doesn't increase binary size since it doesn't unroll loops. Somebody needs to provide benchmarking looking at the compile-time cost vs. the runtime benefit and the code size effect. There's also room to tune aggressiveness of BB vectorization as it currently allows for cases where the scalar computation is not fully replaced by vector code. Richard. > 'Allan >
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote:
> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>
>> [...] As a result users are required to enable several additional
>> optimizations by hand to get good code. Other compilers enable more
>> optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
>> which GCC could/should do as well.
>> [...]
>>
>> I'd welcome discussion and other proposals for similar improvements.
>
> What's the status of graphite? It's been around for years. Isn't it
> mature enough to enable these:
>
> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>
> by default for -O2? (And I'm not even sure those are the complete set of
> graphite optimization flags, or just the "useful" ones.)

It's not on by default at any optimization level. The main issue is the
lack of maintenance and a set of known common internal compiler errors we
hit. The other issue is that there's no benefit of turning those on for
SPEC CPU benchmarking as far as I remember, but quite a bit of extra
compile-time cost.

Richard.
Re: RFC: Improving GCC8 default option settings
On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote:
> > On 12/09/17 16:57, Wilco Dijkstra wrote:
> >>
> >> [...] As a result users are required to enable several additional
> >> optimizations by hand to get good code. Other compilers enable more
> >> optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
> >> which GCC could/should do as well.
> >> [...]
> >>
> >> I'd welcome discussion and other proposals for similar improvements.
> >
> > What's the status of graphite? It's been around for years. Isn't it
> > mature enough to enable these:
> >
> > -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
> >
> > by default for -O2? (And I'm not even sure those are the complete set
> > of graphite optimization flags, or just the "useful" ones.)
>
> It's not on by default at any optimization level. The main issue is the
> lack of maintenance and a set of known common internal compiler errors
> we hit. The other issue is that there's no benefit of turning those on
> for SPEC CPU benchmarking as far as I remember but quite a bit of extra
> compile-time cost.

Not to mention the numerous wrong-code bugs. IMHO graphite should be
deprecated as soon as possible.

--
Markus
Re: RFC: Improving GCC8 default option settings
> On 14 Sep 2017, at 3:06 AM, Allan Sandfeld Jensen wrote:
>
> On Dienstag, 12. September 2017 23:27:22 CEST Michael Clark wrote:
>>> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra wrote:
>>>
>>> Hi all,
>>>
>>> At the GNU Cauldron I was inspired by several interesting talks about
>>> improving GCC in various ways. While GCC has many great optimizations,
>>> a common theme is that its default settings are rather conservative.
>>> As a result users are required to enable several additional
>>> optimizations by hand to get good code. Other compilers enable more
>>> optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
>>> which GCC could/should do as well.
>>
>> There are some nuances to -O2. Please consider -O2 users who wish to
>> use it like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>>
>> Clang/LLVM has an -Os that is like -O2, so adding optimisations that
>> increase code size can be skipped from -Os without drastically
>> affecting performance.
>>
>> This is not the case with GCC, where -Os is a size-at-all-costs
>> optimisation mode. GCC users' option for size not at the expense of
>> speed is to use -O2.
>>
>>   Clang    GCC
>>   -Oz   ~= -Os
>>   -Os   ~= -O2
>
> No. Clang's -Os is somewhat limited compared to gcc's, just like the
> clang -Og is just -O1. AFAIK -Oz is a proprietary Apple clang parameter,
> and not in clang proper.

It appears to be in mainline clang.

mclark@anarch128:~$ clang -Oz -c a.c -o a.o
mclark@anarch128:~$ clang -Ox -c a.c -o a.o
error: invalid integral value 'x' in '-Ox'
error: invalid integral value 'x' in '-Ox'
mclark@anarch128:~$ uname -a
Linux anarch128.org 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
mclark@anarch128:~$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I still think it would be unfortunate to lose the size/speed sweet spot of
-O2 by adding optimisations that increase code size, unless there was a
size optimisation option derived from -O2 at the point -O2 is souped up,
i.e. create an -O2s (or rename -Os to -Oz and derive the new -Os from the
current -O2).

I’m going to start looking at this point to see what’s involved in making
a patch. Distros that want a balance of size and speed might even pick it
up, even if it is not accepted in mainline.
Re: RFC: Improving GCC8 default option settings
On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote:
>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>> [...] As a result users are required to enable several additional
>>>> optimizations by hand to get good code. Other compilers enable more
>>>> optimizations at -O2 (loop unrolling in LLVM was mentioned
>>>> repeatedly) which GCC could/should do as well.
>>>> [...]
>>>>
>>>> I'd welcome discussion and other proposals for similar improvements.
>>>
>>> What's the status of graphite? It's been around for years. Isn't it
>>> mature enough to enable these:
>>>
>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>>>
>>> by default for -O2? (And I'm not even sure those are the complete set
>>> of graphite optimization flags, or just the "useful" ones.)
>>
>> It's not on by default at any optimization level. The main issue is the
>> lack of maintenance and a set of known common internal compiler errors
>> we hit. The other issue is that there's no benefit of turning those on
>> for SPEC CPU benchmarking as far as I remember but quite a bit of extra
>> compile-time cost.
>
> Not to mention the numerous wrong-code bugs. IMHO graphite should be
> deprecated as soon as possible.

Given the wrong-code bugs we've got, which I recently went through, I
fully agree with this approach and I would do it for GCC 8. There are PRs
where the order of two simple loops is changed, causing wrong code as
there's a data dependence.

Moreover, I know that Bin was thinking about selecting whether to use
classical loop optimizations or Graphite (depending on the options
provided). This would simplify it ;)

Martin
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: > On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: >> On 2017.09.14 at 11:57 +0200, Richard Biener wrote: >>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote: On 12/09/17 16:57, Wilco Dijkstra wrote: > > [...] As a result users are > required to enable several additional optimizations by hand to get good > code. > Other compilers enable more optimizations at -O2 (loop unrolling in LLVM > was > mentioned repeatedly) which GCC could/should do as well. > [...] > > I'd welcome discussion and other proposals for similar improvements. What's the status of graphite? It's been around for years. Isn't it mature enough to enable these: -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block by default for -O2? (And I'm not even sure those are the complete set of graphite optimization flags, or just the "useful" ones.) >>> >>> It's not on by default at any optimization level. The main issue is the >>> lack of maintainance and a set of known common internal compiler errors >>> we hit. The other issue is that there's no benefit of turning those on for >>> SPEC CPU benchmarking as far as I remember but quite a bit of extra >>> compile-time cost. >> >> Not to mention the numerous wrong-code bugs. IMHO graphite should >> deprecated as soon as possible. >> > > For wrong-code bugs we've got and I recently went through, I fully agree with > this > approach and I would do it for GCC 8. There are PRs where order of simple 2 > loops > is changed, causing wrong-code as there's a data dependence. > > Moreover, I know that Bin was thinking about selection whether to use > classical loop > optimizations or Graphite (depending on options provided). This would > simplify it ;) I don't think removing graphite is warranted, I still think it is the approach to use when handling non-perfect nests. Richard. > Martin
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener wrote: > On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: >> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: >>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote: On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote: > On 12/09/17 16:57, Wilco Dijkstra wrote: >> >> [...] As a result users are >> required to enable several additional optimizations by hand to get good >> code. >> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM >> was >> mentioned repeatedly) which GCC could/should do as well. >> [...] >> >> I'd welcome discussion and other proposals for similar improvements. > > > What's the status of graphite? It's been around for years. Isn't it mature > enough to enable these: > > -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block > > by default for -O2? (And I'm not even sure those are the complete set of > graphite optimization flags, or just the "useful" ones.) It's not on by default at any optimization level. The main issue is the lack of maintainance and a set of known common internal compiler errors we hit. The other issue is that there's no benefit of turning those on for SPEC CPU benchmarking as far as I remember but quite a bit of extra compile-time cost. >>> >>> Not to mention the numerous wrong-code bugs. IMHO graphite should >>> deprecated as soon as possible. >>> >> >> For wrong-code bugs we've got and I recently went through, I fully agree >> with this >> approach and I would do it for GCC 8. There are PRs where order of simple 2 >> loops >> is changed, causing wrong-code as there's a data dependence. >> >> Moreover, I know that Bin was thinking about selection whether to use >> classical loop >> optimizations or Graphite (depending on options provided). This would >> simplify it ;) > > I don't think removing graphite is warranted, I still think it is the > approach to use when > handling non-perfect nests. 
Hi,
IMHO, we should not be in a hurry to remove graphite, though we are
introducing some traditional transformations. It's quite a standalone part
of GCC and supports more transformations. Also, as it gets more attention,
you never know if somebody will find time to work on it.

Thanks,
bin

>
> Richard.
>
>> Martin
Re: Byte swapping support
On 14/09/17 11:30, Eric Botcazou wrote: >> I seem to remember it being able to attach a big-endian or little-endian >> label to any individual variable (rather than a type), which could be a >> scaler rather than a struct. So it was a bit more flexible than gcc. > > Well, the only thing I see in the documentation for "Byte Ordering" is the > reference to pragma Pack and the __packed__ keyword for structures, which can > toggle byte ordering by means of the byte-swap argument: > > "#pragma pack > [ ([[max_member_alignment] , [min_structure_alignment][, byte-swap ]] ) ] > > The pack directive specifies that all subsequent structures..." > > with the same limitation as GCC about taking the address: > > "It is not possible to take the address of a byte-swapped member." > It is, as I say, a /long/ time since I looked at it. I could easily be remembering incorrectly - and it could also be a difference in the versions. Being unable to take the address of a byte-swapped member is a reasonable and understandable limitation when byte-swapping is expressed this way. It would be one of the advantages in using a named address space - then you /could/ take a pointer, but it would be a pointer qualified by the address space name, and incompatible with normal pointers.
Re: RFC: Improving GCC8 default option settings
On 09/14/2017 12:37 PM, Bin.Cheng wrote: > On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener > wrote: >> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: >>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: On 2017.09.14 at 11:57 +0200, Richard Biener wrote: > On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras > wrote: >> On 12/09/17 16:57, Wilco Dijkstra wrote: >>> >>> [...] As a result users are >>> required to enable several additional optimizations by hand to get good >>> code. >>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM >>> was >>> mentioned repeatedly) which GCC could/should do as well. >>> [...] >>> >>> I'd welcome discussion and other proposals for similar improvements. >> >> >> What's the status of graphite? It's been around for years. Isn't it >> mature >> enough to enable these: >> >> -floop-interchange -ftree-loop-distribution -floop-strip-mine >> -floop-block >> >> by default for -O2? (And I'm not even sure those are the complete set of >> graphite optimization flags, or just the "useful" ones.) > > It's not on by default at any optimization level. The main issue is the > lack of maintainance and a set of known common internal compiler errors > we hit. The other issue is that there's no benefit of turning those on > for > SPEC CPU benchmarking as far as I remember but quite a bit of extra > compile-time cost. Not to mention the numerous wrong-code bugs. IMHO graphite should deprecated as soon as possible. >>> >>> For wrong-code bugs we've got and I recently went through, I fully agree >>> with this >>> approach and I would do it for GCC 8. There are PRs where order of simple 2 >>> loops >>> is changed, causing wrong-code as there's a data dependence. >>> >>> Moreover, I know that Bin was thinking about selection whether to use >>> classical loop >>> optimizations or Graphite (depending on options provided). 
>>> This would simplify it ;)
>>
>> I don't think removing graphite is warranted, I still think it is the
>> approach to use when handling non-perfect nests.
>
> Hi,
> IMHO, we should not be in a hurry to remove graphite, though we are
> introducing some traditional transformations. It's quite a standalone
> part of GCC and supports more transformations. Also, as it gets more
> attention, you never know if somebody will find time to work on it.

Ok. I just wanted to express that from a user's perspective I would not
recommend using it. Even if it improves some interesting (and for
classical loop optimization hard) loop nests, it can still blow up on a
quite simple data dependence between loops. That said, it's quite risky
to use.

Thanks,
Martin

>
> Thanks,
> bin
>>
>> Richard.
>>
>>> Martin
Re: How to configure a bi-arch PowerPC GCC?
On 13/09/17 15:11, Andreas Schwab wrote:
> On Jul 20 2017, Sebastian Huber wrote:
>> Ok, so why do I get an "error: unrecognizable insn:"? How can I debug a
>> message like this:
>>
>> (insn 12 11 13 2 (set (reg:CCFP 126)
>>         (compare:CCFP (reg:TF 123)
>>             (reg:TF 124))) "test-v0.i":5 -1
>>      (nil))
>
> This is supposed to be matched by the cmptf_internal1 pattern with
> -mabi=ibmlongdouble. Looks like your configuration defaults to
> -mabi=ieeelongdouble.

Yes, originally I tried to enable the 128-bit IEEE float support. I now
use the default settings.

--
Sebastian Huber, embedded brains GmbH
Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote: > On 09/14/2017 12:37 PM, Bin.Cheng wrote: >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener >> wrote: >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: > On 2017.09.14 at 11:57 +0200, Richard Biener wrote: >> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras >> wrote: >>> On 12/09/17 16:57, Wilco Dijkstra wrote: [...] As a result users are required to enable several additional optimizations by hand to get good code. Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should do as well. [...] I'd welcome discussion and other proposals for similar improvements. >>> >>> >>> What's the status of graphite? It's been around for years. Isn't it >>> mature >>> enough to enable these: >>> >>> -floop-interchange -ftree-loop-distribution -floop-strip-mine >>> -floop-block >>> >>> by default for -O2? (And I'm not even sure those are the complete set of >>> graphite optimization flags, or just the "useful" ones.) >> >> It's not on by default at any optimization level. The main issue is the >> lack of maintainance and a set of known common internal compiler errors >> we hit. The other issue is that there's no benefit of turning those on >> for >> SPEC CPU benchmarking as far as I remember but quite a bit of extra >> compile-time cost. > > Not to mention the numerous wrong-code bugs. IMHO graphite should > deprecated as soon as possible. > For wrong-code bugs we've got and I recently went through, I fully agree with this approach and I would do it for GCC 8. There are PRs where order of simple 2 loops is changed, causing wrong-code as there's a data dependence. Moreover, I know that Bin was thinking about selection whether to use classical loop optimizations or Graphite (depending on options provided). 
This would simplify it ;) >>> >>> I don't think removing graphite is warranted, I still think it is the >>> approach to use when >>> handling non-perfect nests. >> Hi, >> IMHO, we should not be in a hurry to remove graphite, though we are >> introducing some traditional transformations. It's a quite standalone >> part in GCC and supports more transformations. Also as it gets more >> attention, never know if somebody will find time to work on it. > > Ok. I just wanted to express that from user's perspective I would not > recommend it to use. > Even if it improves some interesting (and for classical loop optimization > hard) loop nests, > it can still blow up on a quite simple data dependence in between loops. That > said, it's quite > risky to use it. We only have a single wrong-code bug in bugzilla with a testcase and I just fixed it (well, patch in testing). We do have plenty of ICEs, yes. Richard. > Thanks, > Martin > >> >> Thanks, >> bin >>> >>> Richard. >>> Martin >
Re: RFC: Improving GCC8 default option settings
On 2017.09.14 at 14:48 +0200, Richard Biener wrote: > On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote: > > On 09/14/2017 12:37 PM, Bin.Cheng wrote: > >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener > >> wrote: > >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: > On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: > > On 2017.09.14 at 11:57 +0200, Richard Biener wrote: > >> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras > >> wrote: > >>> On 12/09/17 16:57, Wilco Dijkstra wrote: > > [...] As a result users are > required to enable several additional optimizations by hand to get > good > code. > Other compilers enable more optimizations at -O2 (loop unrolling in > LLVM > was > mentioned repeatedly) which GCC could/should do as well. > [...] > > I'd welcome discussion and other proposals for similar improvements. > >>> > >>> > >>> What's the status of graphite? It's been around for years. Isn't it > >>> mature > >>> enough to enable these: > >>> > >>> -floop-interchange -ftree-loop-distribution -floop-strip-mine > >>> -floop-block > >>> > >>> by default for -O2? (And I'm not even sure those are the complete set > >>> of > >>> graphite optimization flags, or just the "useful" ones.) > >> > >> It's not on by default at any optimization level. The main issue is > >> the > >> lack of maintainance and a set of known common internal compiler errors > >> we hit. The other issue is that there's no benefit of turning those > >> on for > >> SPEC CPU benchmarking as far as I remember but quite a bit of extra > >> compile-time cost. > > > > Not to mention the numerous wrong-code bugs. IMHO graphite should > > deprecated as soon as possible. > > > > For wrong-code bugs we've got and I recently went through, I fully agree > with this > approach and I would do it for GCC 8. There are PRs where order of > simple 2 loops > is changed, causing wrong-code as there's a data dependence. 
> > Moreover, I know that Bin was thinking about selection whether to use > classical loop > optimizations or Graphite (depending on options provided). This would > simplify it ;) > >>> > >>> I don't think removing graphite is warranted, I still think it is the > >>> approach to use when > >>> handling non-perfect nests. > >> Hi, > >> IMHO, we should not be in a hurry to remove graphite, though we are > >> introducing some traditional transformations. It's a quite standalone > >> part in GCC and supports more transformations. Also as it gets more > >> attention, never know if somebody will find time to work on it. > > > > Ok. I just wanted to express that from user's perspective I would not > > recommend it to use. > > Even if it improves some interesting (and for classical loop optimization > > hard) loop nests, > > it can still blow up on a quite simple data dependence in between loops. > > That said, it's quite > > risky to use it. > > We only have a single wrong-code bug in bugzilla with a testcase and I > just fixed it (well, > patch in testing). We do have plenty of ICEs, yes. Even tramp3d-v4, which is cited in several graphite papers, gets miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823. -- Markus
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 3:08 PM, Markus Trippelsdorf wrote: > On 2017.09.14 at 14:48 +0200, Richard Biener wrote: >> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote: >> > On 09/14/2017 12:37 PM, Bin.Cheng wrote: >> >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener >> >> wrote: >> >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote: >> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote: >> > On 2017.09.14 at 11:57 +0200, Richard Biener wrote: >> >> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras >> >> wrote: >> >>> On 12/09/17 16:57, Wilco Dijkstra wrote: >> >> [...] As a result users are >> required to enable several additional optimizations by hand to get >> good >> code. >> Other compilers enable more optimizations at -O2 (loop unrolling in >> LLVM >> was >> mentioned repeatedly) which GCC could/should do as well. >> [...] >> >> I'd welcome discussion and other proposals for similar improvements. >> >>> >> >>> >> >>> What's the status of graphite? It's been around for years. Isn't it >> >>> mature >> >>> enough to enable these: >> >>> >> >>> -floop-interchange -ftree-loop-distribution -floop-strip-mine >> >>> -floop-block >> >>> >> >>> by default for -O2? (And I'm not even sure those are the complete >> >>> set of >> >>> graphite optimization flags, or just the "useful" ones.) >> >> >> >> It's not on by default at any optimization level. The main issue is >> >> the >> >> lack of maintainance and a set of known common internal compiler >> >> errors >> >> we hit. The other issue is that there's no benefit of turning those >> >> on for >> >> SPEC CPU benchmarking as far as I remember but quite a bit of extra >> >> compile-time cost. >> > >> > Not to mention the numerous wrong-code bugs. IMHO graphite should >> > deprecated as soon as possible. >> > >> >> For wrong-code bugs we've got and I recently went through, I fully >> agree with this >> approach and I would do it for GCC 8. 
There are PRs where order of >> simple 2 loops >> is changed, causing wrong-code as there's a data dependence. >> >> Moreover, I know that Bin was thinking about selection whether to use >> classical loop >> optimizations or Graphite (depending on options provided). This would >> simplify it ;) >> >>> >> >>> I don't think removing graphite is warranted, I still think it is the >> >>> approach to use when >> >>> handling non-perfect nests. >> >> Hi, >> >> IMHO, we should not be in a hurry to remove graphite, though we are >> >> introducing some traditional transformations. It's a quite standalone >> >> part in GCC and supports more transformations. Also as it gets more >> >> attention, never know if somebody will find time to work on it. >> > >> > Ok. I just wanted to express that from user's perspective I would not >> > recommend it to use. >> > Even if it improves some interesting (and for classical loop optimization >> > hard) loop nests, >> > it can still blow up on a quite simple data dependence in between loops. >> > That said, it's quite >> > risky to use it. >> >> We only have a single wrong-code bug in bugzilla with a testcase and I >> just fixed it (well, >> patch in testing). We do have plenty of ICEs, yes. > > Even tramp3d-v4, which is cited in several graphite papers, gets > miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823. But unfortunately there isn't a self-contained testcase for that. The comments hint at sth like int a[][]; p = &a[1][0]; for(;;) a[i][j] = ... p[i] = ... would get at it, that is, accessing memory via two-dim array and pointer. Richard. > -- > Markus
Inferring that the condition of a for loop is initially true?
This is more of a question than a bug report, so I'm trying to send it
to the list rather than filing a bugzilla issue.

I think it's quite common to write for- and while-loops where the
condition is always initially true. A simple example might be

  double average (const double *a, size_t n)
  {
    double sum;
    size_t i;

    assert (n > 0);
    for (i = 0, sum = 0; i < n; i++)
      sum += a[i];
    return sum / n;
  }

The programmer could do the micro-optimization of rewriting it as a
do-while loop instead. It would be nice if gcc could infer that the
condition is initially true, and convert to a do-while loop
automatically.

Converting to a do-while loop should produce slightly better code,
omitting the typical jump to enter the loop at the end where the
condition is checked. It would also make analysis of where variables are
written more accurate, which is my main concern at the moment.

My questions are:

1. Does gcc attempt to do this optimization?

2. If it does, how often does it succeed on loops in real programs?

3. Can I help the compiler to do that inference?

The code I had some trouble with is at
https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c. A
simplified version with only the interesting code path would be

  void
  ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
  {
    mp_limb_t hi;
    mp_size_t sn = mn - bn;
    mp_size_t rn = 2*mn;

    assert (bn < mn);

    while (rn >= 2 * mn - bn)
      {
        rn -= sn;
        ... code which sets hi ...
      }
    ... code which uses hi ...
  }

The intention is that the loop will run at least once, but nevertheless,
in this context, rewriting it as a do-while would make the code uglier,
imo.

It looks obvious at first glance that the initial value rn = 2*mn makes
the condition rn >= 2*mn - bn true. All values are unsigned, and the
assert backs up that mn - bn won't underflow. This is also used with
pretty small mn, so that 2*mn won't overflow, but unfortunately, that's
not obvious to the compiler.
In theory, the function could be called with, e.g., on a 64-bit machine,
mn == (1 << 63) and bn == 1, and then the initial loop condition would
be 0 >= ~0, which is false. So overflow cases seem to rule out the
optimization.

I've seen gcc warn about hi possibly being used uninitialized in this
function (but only on sparc, and on a machine I no longer use, so I'm
afraid I can't provide details). I've also seen warnings from the clang
static analyzer (which I guess makes different tradeoffs than gcc -Wall
when it comes to false positives).

I would guess it's quite common that conditions which are always true
for relevant inputs may be false due to overflow with extreme inputs,
which in the case of unsigned variables doesn't leave any "undefined
behaviour" freedom to the compiler.

What's the best way to tell the compiler, to promise that there will be
no overflows involving mn? I could try adding an assert, like

  rn = 2*mn;
  assert (rn > mn);

to rule out overflow, but that looks a bit unobvious to humans and
possibly unobvious to the compiler too.

Regards,
/Niels

--
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
Re: Inferring that the condition of a for loop is initially true?
On Thu, 14 Sep 2017, Niels Möller wrote:

> I think it's quite common to write for- and while-loops where the
> condition is always initially true. [...] It would be nice if gcc could
> infer that the condition is initially true, and convert to a do-while
> loop automatically. Converting to a do-while-loop should produce
> slightly better code, omitting the typical jump to enter the loop at
> the end where the condition is checked.

Hello,

assert is not what you want, since it completely disappears with
-DNDEBUG. clang has __builtin_assume; with gcc you want a test and
__builtin_unreachable. Replacing your assert with

  if (n == 0) __builtin_unreachable ();

gcc does skip the first test of the loop, as can be seen in the dump
produced with -fdump-tree-optimized.

--
Marc Glisse
Re: Inferring that the condition of a for loop is initially true?
Hi,

On 09/14/2017 09:50 PM, Marc Glisse wrote:

> assert is not what you want, since it completely disappears with
> -DNDEBUG. clang has __builtin_assume, with gcc you want a test and
> __builtin_unreachable. Replacing your assert with
> if (n == 0) __builtin_unreachable();
> gcc does skip the first test of the loop, as can be seen in the dump
> produced with -fdump-tree-optimized.

Do you plan on adding something like __builtin_assume to GCC? It is
useful because of this idiom to help the compiler optimize better:

  #ifdef MY_DEBUG
  #define MY_ASSERT(assertion) do { if (!(assertion)) ... } while (0)
  #else
  #define MY_ASSERT(assertion) __builtin_assume(assertion)
  #endif

It is not the same as "if (!(assertion)) __builtin_unreachable();"
because __builtin_assume discards any side effects of "assertion".

Thanks,
Geza
Re: Inferring that the condition of a for loop is initially true?
On 09/14/2017 01:28 PM, Niels Möller wrote:
> This is more of a question than a bug report, so I'm trying to send it
> to the list rather than filing a bugzilla issue.
>
> I think it's quite common to write for- and while-loops where the
> condition is always initially true. A simple example might be
>
> double average (const double *a, size_t n)
> {
>   double sum;
>   size_t i;
>
>   assert (n > 0);
>   for (i = 0, sum = 0; i < n; i++)
>     sum += a[i];
>   return sum / n;
> }
>
> The programmer could do the micro-optimization of rewriting it as a
> do-while loop instead. It would be nice if gcc could infer that the
> condition is initially true, and convert to a do-while loop
> automatically.
>
> Converting to a do-while loop should produce slightly better code,
> omitting the typical jump to enter the loop at the end where the
> condition is checked. It would also make analysis of where variables
> are written more accurate, which is my main concern at the moment.
>
> My questions are:
>
> 1. Does gcc attempt to do this optimization?

Yes. It happens as a side effect of jump threading and there are also
dedicated passes to rotate the loop.

> 2. If it does, how often does it succeed on loops in real programs?

Often. The net benefit is actually small though, and sometimes this kind
of loop rotation can impede vectorization.

> 3. Can I help the compiler to do that inference?

In general, I'd advise against it. You end up with ugly code which works
with specific versions of the compiler, but which needs regular tweaking
as the internal implementations of various optimizers change over time.

> The code I had some trouble with is at
> https://git.lysator.liu.se/nettle/nettle/blob/master/ecc-mod.c.
> A simplified version with only the interesting code path would be
>
> void
> ecc_mod (mp_size_t mn, mp_size_t bn, mp_limb_t *rp)
> {
>   mp_limb_t hi;
>   mp_size_t sn = mn - bn;
>   mp_size_t rn = 2*mn;
>
>   assert (bn < mn);
>
>   while (rn >= 2 * mn - bn)

In this particular case (ignoring the assert), what you want is better
jump threading exploiting range propagation. But you have to be real
careful here due to the potential overflow. I'd have to have a
self-contained example to dig into what's really going on, but my
suspicion is either overflow or fairly weak range data and
simplification due to the symbolic ranges.

Jeff
gcc-7-20170914 is now available
Snapshot gcc-7-20170914 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20170914/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch with the
following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch
revision 252775

You'll find:

 gcc-7-20170914.tar.xz   Complete GCC
   SHA256=38e05090c2b43dd11d710e729f4dbaf79bc29d8e5b43771cab4415267bc40d22
   SHA1=7c52a1ec855b2a954b5812fba2be50dcba58f2c2

Diffs from 7-20170907 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.