Re: GCC support for PowerPC VLE
James Lemke codesourcery.com> writes: > I have completed the binutils submission for VLE. > I am working on the gcc submission. The test results are looking good > now. Patches will be posted very soon. Do you have any update on the VLE support work? Thanks for any feedback you can provide!
Re: [RFC][AArch64] function prologue analyzer in linux kernel
On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote: > On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote: > >On 07/01/16 14:22, Will Deacon wrote: > >>On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote: > >>>So I'd like to introduce a function prologue analyzer to determine > >>>a size allocated by a function's prologue and deduce it from "Depth". > >>>My implementation of this analyzer has been submitted to > >>>linux-arm-kernel mailing list[1]. > >>>I borrowed some ideas from gdb's analyzer[2], especially a loop of > >>>instruction decoding as well as stop of decoding at exiting a basic block, > >>>but implemented my own simplified one because gdb version seems to do > >>>a bit more than what we expect here. > >>>Anyhow, since it is somewhat heuristic (and may not be maintainable for > >>>a long term), could you review it from a broader viewpoint of toolchain, > >>>please? > >>> > >>My main issue with this is that we cannot rely on the frame layout > >>generated by the compiler and there's little point in asking for > >>commitment here. Therefore, the heuristics will need updating as and > >>when we identify new frames that we can't handle. That's pretty fragile > >>and puts us on the back foot when faced with newer compilers. This might > >>be sustainable if we don't expect to encounter much variation, but even > >>that would require some sort of "buy-in" from the various toolchain > >>communities. > >> > >>GCC already has an option (-fstack-usage) to determine the stack usage > >>on a per-function basis and produce a report at build time. Why can't > >>we use that to provide the information we need, rather than attempt to > >>compute it at runtime based on your analyser? > >> > >>If -fstack-usage is not sufficient, understanding why might allow us to > >>propose a better option. > > > >Can you not use the dwarf frame unwind data? 
That's always sufficient > >to recover the CFA (canonical frame address - the value in SP when > >executing the first instruction in a function). It seems to me it's > >unlikely you're going to need something that's an exceedingly high > >performance operation. > > Thank you for your comment. > Yeah, but we need some utility routines to handle unwind data (.debug_frame). > In fact, some guy has already attempted to merge (part of) libunwind into > the kernel[1], but it was rejected by the kernel community (including Linus > if I remember correctly). It seems that they thought the code was still buggy. The ARC guys seem to have sneaked something in for their architecture: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c so it might not be impossible if we don't require all the bells and whistles of libunwind. > That is one of the reasons that I wanted to implement my own analyzer. I still don't understand why you can't use -fstack-usage. Can you please tell me why that doesn't work? Am I missing something? Will
Re: [RFC][AArch64] function prologue analyzer in linux kernel
On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote: > Will, > > On 01/09/2016 12:53 AM, Will Deacon wrote: > >On Fri, Jan 08, 2016 at 02:36:32PM +0900, AKASHI Takahiro wrote: > >>On 01/07/2016 11:56 PM, Richard Earnshaw (lists) wrote: > >>>On 07/01/16 14:22, Will Deacon wrote: > >>>>On Thu, Dec 24, 2015 at 04:57:54PM +0900, AKASHI Takahiro wrote: > >>>>>So I'd like to introduce a function prologue analyzer to determine > >>>>>a size allocated by a function's prologue and deduce it from "Depth". > >>>>>My implementation of this analyzer has been submitted to > >>>>>linux-arm-kernel mailing list[1]. > >>>>>I borrowed some ideas from gdb's analyzer[2], especially a loop of > >>>>>instruction decoding as well as stop of decoding at exiting a basic > >>>>>block, > >>>>>but implemented my own simplified one because gdb version seems to do > >>>>>a bit more than what we expect here. > >>>>>Anyhow, since it is somewhat heuristic (and may not be maintainable for > >>>>>a long term), could you review it from a broader viewpoint of toolchain, > >>>>>please? > >>>>> > >>>>My main issue with this is that we cannot rely on the frame layout > >>>>generated by the compiler and there's little point in asking for > >>>>commitment here. Therefore, the heuristics will need updating as and > >>>>when we identify new frames that we can't handle. That's pretty fragile > >>>>and puts us on the back foot when faced with newer compilers. This might > >>>>be sustainable if we don't expect to encounter much variation, but even > >>>>that would require some sort of "buy-in" from the various toolchain > >>>>communities. > >>>> > >>>>GCC already has an option (-fstack-usage) to determine the stack usage > >>>>on a per-function basis and produce a report at build time. Why can't > >>>>we use that to provide the information we need, rather than attempt to > >>>>compute it at runtime based on your analyser? 
> >>>> > >>>>If -fstack-usage is not sufficient, understanding why might allow us to > >>>>propose a better option. > >>> > >>>Can you not use the dwarf frame unwind data? That's always sufficient > >>>to recover the CFA (canonical frame address - the value in SP when > >>>executing the first instruction in a function). It seems to me it's > >>>unlikely you're going to need something that's an exceedingly high > >>>performance operation. > >> > >>Thank you for your comment. > >>Yeah, but we need some utility routines to handle unwind data (.debug_frame). > >>In fact, some guy has already attempted to merge (part of) libunwind into > >>the kernel[1], but it was rejected by the kernel community (including Linus > >>if I remember correctly). It seems that they thought the code was still > >>buggy. > > > >The ARC guys seem to have sneaked something in for their architecture: > > > > > > http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arc/kernel/unwind.c > > > >so it might not be impossible if we don't require all the bells and > >whistles of libunwind. > > Thanks. I didn't notice this code. > > >>That is one of the reasons that I wanted to implement my own analyzer. > > > >I still don't understand why you can't use -fstack-usage. Can you please > >tell me why that doesn't work? Am I missing something? > > I don't know how gcc calculates the usage here, but I guess it would be more > robust than my analyzer. > > The issues that come to my mind are > - -fstack-usage generates a separate output file, *.su, and so we have to > manage them to be incorporated in the kernel binary. That doesn't sound too bad to me. How much data are we talking about here? > This implies that (common) kernel makefiles might have to be changed a bit. > - worse still, what about the kernel module case? We will have no way to let the > kernel > know the stack usage without adding an extra step at loading.
We can easily add a new __init section to modules, which is a table representing the module functions and their stack sizes (like we do for other things like alternatives). We'd just then need to slurp this information at load time and throw it into an rbtree or something. Will
Re: [RFC][AArch64] function prologue analyzer in linux kernel
On Wed, Jan 13, 2016 at 05:13:29PM +0900, AKASHI Takahiro wrote: > On 01/13/2016 03:04 AM, Will Deacon wrote: > >On Tue, Jan 12, 2016 at 03:11:29PM +0900, AKASHI Takahiro wrote: > >>On 01/09/2016 12:53 AM, Will Deacon wrote: > >>>I still don't understand why you can't use fstack-usage. Can you please > >>>tell me why that doesn't work? Am I missing something? > >> > >>I don't know how gcc calculates the usage here, but I guess it would be more > >>robust than my analyzer. > >> > >>The issues, that come up to my mind, are > >>- -fstack-usage generates a separate output file, *.su and so we have to > >> manage them to be incorporated in the kernel binary. > > > >That doesn't sound too bad to me. How much data are we talking about here? > > > >> This implies that (common) kernel makefiles might have to be a bit > >> changed. > >>- more worse, what if kernel module case? We will have no way to let the > >>kernel > >> know the stack usage without adding an extra step at loading. > > > >We can easily add a new __init section to modules, which is a table > >representing the module functions and their stack sizes (like we do > >for other things like alternatives). We'd just then need to slurp this > >information at load time and throw it into an rbtree or something. > > I found another issue. > Let's think about 'dynamic storage' case like: > $ cat stack.c > extern long fooX(long a); > extern long fooY(long b[]); > > long foo1(long a) { > > if (a > 1) { > long b[a]; <== Here > > return a + fooY(b); > } else { > return a + fooX(a); > } > } > > Then, -fstack-usage returns 48 for foo1(): > $ aarch64-linux-gnu-gcc -fno-omit-frame-pointer -fstack-usage main.c stack.c \ > -pg -O2 -fasynchronous-unwind-tables > $ cat stack.su > stack.c:4:6:foo1 48 dynamic > > This indicates that foo1() may use 48 bytes or more depending on a condition. 
> But in my case (ftrace-based stack tracer), I always expect 32 whether we're > backtracing from fooY() or from fooX() because my stack tracer estimates: >(stack pointer) = (callee's frame pointer) + (callee's stack usage) > (in my previous e-mail, '-(minus)' was wrong.) > > where (callee's stack usage) is, as I described in my previous e-mail, the size > of > memory initially allocated on the stack in a function's prologue, and > should not > include the size of any dynamically allocated area. According to who? What's the use in reporting only the prologue size? Will
History of GCC
Hello everyone! My name is Will Hawkins and I am a longtime user of gcc and admirer of the project. I hope that this is the proper forum for the question I am going to ask. If it isn't, please accept my apology and ignore me. I am a real geek and I love the history behind open source projects. I've found several good resources about the history of "famous" open source projects and organizations (including, but definitely not limited to, the very interesting Free as in Freedom 2.0). Unfortunately there does not appear to be a good history of the awesome and fundamental GCC project. I know that there is a page on the wiki (https://gcc.gnu.org/wiki/History) but that is really the best that I can find. Am I missing something? Are there good anecdotes about the history of the development of GCC that you think I might find interesting? Any pointers would be really great! Thanks for taking the time to read my questions. Thanks in advance for any information that you have to offer. I really appreciate everyone's effort to make such a great compiler suite. It's only with such a great compiler that all our other open source projects are able to succeed! Thank you! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 9:07 AM, Ian Lance Taylor wrote: > On Tue, Oct 25, 2016 at 10:53 PM, Will Hawkins wrote: >> >> My name is Will Hawkins and I am a longtime user of gcc and admirer of >> the project. I hope that this is the proper forum for the question I >> am going to ask. If it isn't, please accept my apology and ignore me. >> >> I am a real geek and I love the history behind open source projects. >> I've found several good resources about the history of "famous" open >> source projects and organizations (including, but definitely not >> limited to, the very interesting Free as in Freedom 2.0). >> >> Unfortunately there does not appear to be a good history of the >> awesome and fundamental GCC project. I know that there is a page on >> the wiki (https://gcc.gnu.org/wiki/History) but that is really the >> best that I can find. >> >> Am I missing something? Are there good anecdotes about the history of >> the development of GCC that you think I might find interesting? Any >> pointers would be really great! >> >> Thanks for taking the time to read my questions. Thanks in advance for >> any information that you have to offer. I really appreciate everyone's >> effort to make such a great compiler suite. It's only with such a >> great compiler that all our other open source projects are able to >> succeed! > > There is some history and links at > https://en.wikipedia.org/wiki/GNU_Compiler_Collection . > > In my opinion, the history of GCC is not really one of drama or even > anecdotes, except for the EGCS split. There are plenty of people who > work on GCC out of personal interest, but for decades now the majority > of work on GCC has been by people paid to work on it. I expect that > the result is less interesting as history and more interesting as > software. > > Ian Ian, Thank you for your response! I don't think that there has to be controversy to be interesting. 
Obviously that split/reunification was important, but I think that there might even be some value in documenting the minutiae of the project's growth. In other words, what was the process for incorporating each new version of the C++ standard? Who started a GCC frontend for a given language, and why? Things like that. Thanks again for your response! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 11:55 AM, Jeff Law wrote: > On 10/26/2016 07:07 AM, Ian Lance Taylor wrote: >> >> On Tue, Oct 25, 2016 at 10:53 PM, Will Hawkins wrote: >>> >>> >>> My name is Will Hawkins and I am a longtime user of gcc and admirer of >>> the project. I hope that this is the proper forum for the question I >>> am going to ask. If it isn't, please accept my apology and ignore me. >>> >>> I am a real geek and I love the history behind open source projects. >>> I've found several good resources about the history of "famous" open >>> source projects and organizations (including, but definitely not >>> limited to, the very interesting Free as in Freedom 2.0). >>> >>> Unfortunately there does not appear to be a good history of the >>> awesome and fundamental GCC project. I know that there is a page on >>> the wiki (https://gcc.gnu.org/wiki/History) but that is really the >>> best that I can find. >>> >>> Am I missing something? Are there good anecdotes about the history of >>> the development of GCC that you think I might find interesting? Any >>> pointers would be really great! >>> >>> Thanks for taking the time to read my questions. Thanks in advance for >>> any information that you have to offer. I really appreciate everyone's >>> effort to make such a great compiler suite. It's only with such a >>> great compiler that all our other open source projects are able to >>> succeed! >> >> >> There is some history and links at >> https://en.wikipedia.org/wiki/GNU_Compiler_Collection . >> >> In my opinion, the history of GCC is not really one of drama or even >> anecdotes, except for the EGCS split. There are plenty of people who >> work on GCC out of personal interest, but for decades now the majority >> of work on GCC has been by people paid to work on it. I expect that >> the result is less interesting as history and more interesting as >> software. > > Agreed. Speaking for myself, I got interested in GCC to solve a problem, > then another, then another... 
Hacking GCC made for an interesting hobby for > a few years. I never imagined it would turn into a career, but 20 years > later, here I am. I wouldn't be surprised if others have followed a similar > path. > > jeff Thank you Ian, Joel and Jeff for your responses! I really appreciate it. As I said in my first message, I am a real geek and I certainly did not mean to imply that I thought many people would find the history interesting. That said, I think that there might be some people who do find it so. As an example of the type of software history that I find interesting, consider these: https://www.youtube.com/watch?v=TMjgShRuYbg or https://www.youtube.com/watch?v=2kEJoWfobpA or https://www.usenix.org/system/files/login/articles/03_lu_010-017_final.pdf or as a final example https://www.youtube.com/watch?v=69edOm889V4 To answer the question you are probably asking, "No, I have no idea why I enjoy this type of stuff as much as I do!" Can any of you recall a turning point where development went from being driven by hobbyists to being driven by career developers? As a result of that shift, has there been a change in the project's priorities? Have there been conflicts between the employer's interests and those of the project (in terms of project goals, licensing issues, code quality, etc)? In any event, I really appreciate your answers. If you have any information that you think I might find interesting, please feel free to pass it along. Thanks again! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 1:06 PM, Jakub Jelinek wrote: > On Wed, Oct 26, 2016 at 06:57:31PM +0200, Marek Polacek wrote: >> I think you can learn a lot if you follow the Changes pages, so e.g. >> <https://gcc.gnu.org/gcc-6/changes.html>, and go back down the history until >> you reach the ancient <https://gcc.gnu.org/gcc-3.1/changes.html>. > > Even older releases, while they don't have changes.html, have changes/news > etc. written in the pages referenced from > https://gcc.gnu.org/releases.html#timeline > Also see https://gcc.gnu.org/develop.html#timeline > For questions like who has added feature XYZ, the best source is just the > source repository's history or ChangeLog files. > > Jakub Jakub, Marek, Great suggestions! In fact, I had just thought of the same thing! Thanks for your response! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 1:15 PM, Ian Lance Taylor wrote: > On Wed, Oct 26, 2016 at 9:31 AM, Will Hawkins wrote: >> >> Thank you for your response! I don't think that there has to be >> controversy to be interesting. Obviously that split/reunification was >> important, but I think that there might even be some value in >> documenting the minutia of the project's growth. In other words, what >> was the process for incorporating each new version of the C++ >> standard? Who and why did GCC start a frontend for X language? Things >> like that. > > It is easier to answer specific questions. > > There have always been GCC developers that have tracked the evolution > of C++. The first C++ standard was of course in 1998, at which point > the language was over 10 years old, so there were a lot of C++ > language changes before then. GCC has generally acquired new language > features as they were being adopted into the standard, usually > controlled by options like the current -std=c++1z. This of course > means that the new features have shifted as the standard has shifted, > but as far as I know that hasn't happened too often. > > GCC started as a C compiler. The C++ frontend was started by Michael > Tiemann around 1987 or so. It started as a patch and was later > incorporated into the mainline. > > The Objective C frontend was started at NeXT. They originally > intended to keep it proprietary, but when they understood that the GPL > made that impossible they contributed it back. I forget when the > Objective C++ frontend came in. > > Cygnus Support developed the Chill and, later, Java frontends. The > Chill frontend was removed later, and in fact the Java frontend was > removed just recently. > > As I recall Fortran was a hobbyist project that eventually made it in. > There were two competing forks, I think. I don't remember too much > about that off the top of my head. > > The Ada frontend was developed at AdaCore. 
> > The Go frontend was written by me, mostly because I like Go and I've > been working on GCC for a long time. I work at Google, and Go was > developed at Google, but there wouldn't be a GCC Go frontend if I > hadn't decided to write one. > > There is a Modula frontend that is always close to getting in. I > think there is a Pascal frontend out there too, somewhere. And a D > frontend. > > Ian Wow, thanks Ian! This is awesome stuff! As I read through it, I may have some additional questions. If I do, would you mind if I emailed you directly? Thanks again for taking the time to write all this down! Fascinating! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 1:28 PM, Richard Kenner wrote: >> The Ada frontend was developed at AdaCore. > > The Ada frontend was developed at NYU, as an Air Force-funded project > to show that Ada95 (then called Ada9X) was implementable. AdaCore was > later formed once that was complete to provide commercial support for > the Ada compiler. The members of that NYU project were the initial > team at AdaCore. Such great information, Richard! Thanks so much! Will
Re: History of GCC
On Wed, Oct 26, 2016 at 2:23 PM, Eric Gallager wrote: > On 10/26/16, Ian Lance Taylor wrote: >> On Wed, Oct 26, 2016 at 9:31 AM, Will Hawkins wrote: >>> >>> Thank you for your response! I don't think that there has to be >>> controversy to be interesting. Obviously that split/reunification was >>> important, but I think that there might even be some value in >>> documenting the minutia of the project's growth. In other words, what >>> was the process for incorporating each new version of the C++ >>> standard? Who and why did GCC start a frontend for X language? Things >>> like that. >> >> It is easier to answer specific questions. >> >> There have always been GCC developers that have tracked the evolution >> of C++. The first C++ standard was of course in 1998, at which point >> the language was over 10 years old, so there were a lot of C++ >> language changes before then. GCC has generally acquired new language >> features as they were being adopted into the standard, usually >> controlled by options like the current -std=c++1z. This of course >> means that the new features have shifted as the standard has shifted, >> but as far as I know that hasn't happened too often. >> >> GCC started as a C compiler. The C++ frontend was started by Michael >> Tiemann around 1987 or so. It started as a patch and was later >> incorporated into the mainline. >> >> The Objective C frontend was started at NeXT. They originally >> intended to keep it proprietary, but when they understood that the GPL >> made that impossible they contributed it back. I forget when the >> Objective C++ frontend came in. > > > The Objective C++ frontend was contributed by Apple.
The earliest > proposal I can find for adding it was in 2001 for GCC 3.x: > https://gcc.gnu.org/ml/gcc/2001-11/msg00609.html > However, it didn't actually make it in to the FSF version until 4.1: > https://gcc.gnu.org/ml/gcc-patches/2005-05/msg01781.html > https://gcc.gnu.org/ml/gcc-patches/2005-12/msg01812.html > Personally, I think one of the interesting stories of GCC history is > how Apple used to be really involved in GCC development until 2007, at > which point the GPL3 and iPhone came out, and Apple abandoned GCC for > llvm/clang. If you read through the mailing list archives on > gcc.gnu.org, you can find all sorts of emails from people with "at > apple dot com" email addresses in the early 2000s, until they just > sort of stopped later that decade. Even llvm/clang was originally just > another branch of gcc, and Chris Lattner was even going to contribute > it and keep it part of gcc, but then he never got around to getting > his copyright assignment paperwork filed, and then Apple turned it > into a separate project: > https://gcc.gnu.org/ml/gcc/2005-11/msg00888.html > https://gcc.gnu.org/ml/gcc/2006-03/msg00706.html > > >> >> Cygnus Support developed the Chill and, later, Java frontends. The >> Chill frontend was removed later, and in fact the Java frontend was >> removed just recently. >> >> As I recall Fortran was a hobbyist project that eventually made it in. >> There were two competing forks, I think. I don't remember too much >> about that off the top of my head. >> >> The Ada frontend was developed at AdaCore. >> >> The Go frontend was written by me, mostly because I like Go and I've >> been working on GCC for a long time. I work at Google, and Go was >> developed at Google, but there wouldn't be a GCC Go frontend if I >> hadn't decided to write one. >> >> There is a Modula frontend that is always close to getting in. I >> think there is a Pascal frontend out there too, somewhere. And a D >> frontend. 
>> >> Ian >> I want to thank each individual for his/her reply, but I don't want to SPAM the list. So, I will do it in one email! Thanks! This is so much more information than I expected to get and it's just amazing. Thanks again! Will
Re: Compilers and RCU readers: Once more unto the breach!
Hi Paul, On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote: > > On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds > > wrote: > > So I think you're better off just saying that operations designed to > > drop significant bits break the dependency chain, and give things like > > "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. > > > > Making that just an extension of your existing "& 0" language would > > seem to be natural. > > Works for me! I added the following bullet to the list of things > that break dependencies: > > If a pointer is part of a dependency chain, and if the values > added to or subtracted from that pointer cancel the pointer > value so as to allow the compiler to precisely determine the > resulting value, then the resulting value will not be part of > any dependency chain. For example, if p is part of a dependency > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > Seem reasonable? Whilst I understand what you're saying (the ARM architecture makes these sorts of distinctions when calling out dependency-based ordering), it feels like we're dangerously close to defining the difference between a true and a false dependency. If we want to do this in the context of the C language specification, you run into issues because you need to evaluate the program in order to determine data values in order to determine the nature of the dependency. You tackle this above by saying "to allow the compiler to precisely determine the resulting value", but I can't see how that can be cleanly fitted into something like the C language specification. Even if it can, then we'd need to reword the "?:" treatment that you currently have: "If a pointer is part of a dependency chain, and that pointer appears in the entry of a ?: expression selected by the condition, then the chain extends to the result." 
which I think requires the state of the condition to be known statically if we only want to extend the chain from the selected expression. In the general case, wouldn't a compiler have to assume that the chain is extended from both? Additionally, what about the following code? char *x = y ? z : z; Does that extend a dependency chain from z to x? If so, I can imagine a CPU breaking that in practice. > > Humans will understand, and compiler writers won't care. They will > > either depend on hardware semantics anyway (and argue that your > > language is tight enough that they don't need to do anything special) > > or they will turn the consume into an acquire (on platforms that have > > too weak hardware). > > Agreed. Plus Core Working Group will hammer out the exact wording, > should this approach meet their approval. For the avoidance of doubt, I'm completely behind any attempts to tackle this problem, but I anticipate an uphill struggle getting this text into the C standard. Is your intention to change the carries-a-dependency relation to encompass this change? Cheers, Will
Re: Compilers and RCU readers: Once more unto the breach!
On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > On Wed, May 20, 2015 at 12:47:45PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 03:41:48AM +0100, Paul E. McKenney wrote: > > > If a pointer is part of a dependency chain, and if the values > > > added to or subtracted from that pointer cancel the pointer > > > value so as to allow the compiler to precisely determine the > > > resulting value, then the resulting value will not be part of > > > any dependency chain. For example, if p is part of a dependency > > > chain, then ((char *)p-(uintptr_t)p)+65536 will not be. > > > > > > Seem reasonable? > > > > Whilst I understand what you're saying (the ARM architecture makes these > > sorts of distinctions when calling out dependency-based ordering), it > > feels like we're dangerously close to defining the difference between a > > true and a false dependency. If we want to do this in the context of the > > C language specification, you run into issues because you need to evaluate > > the program in order to determine data values in order to determine the > > nature of the dependency. > > Indeed, something like this does -not- carry a dependency from the > memory_order_consume load to q: > > char *p, q; > > p = atomic_load_explicit(&gp, memory_order_consume); > q = gq + (intptr_t)p - (intptr_t)p; > > If this was compiled with -O0, ARM and Power might well carry a > dependency, but given any optimization, the assembly language would have > no hint of any such dependency. So I am not seeing any particular danger. The above is a welcome relaxation over C11, since ARM doesn't even give you ordering based off false data dependencies. My concern is more to do with how this can be specified precisely without prohibiting honest compiler and hardware optimisations.
Out of interest, how do you tackle examples (4) and (5) of the following (assuming the reads are promoted to consume loads)?: http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html my understanding is that you permit both outcomes (I appreciate you're not directly tackling out-of-thin-air, but treatment of dependencies is heavily related). > > You tackle this above by saying "to allow the compiler to precisely > > determine the resulting value", but I can't see how that can be cleanly > > fitted into something like the C language specification. > > I am sure that there will be significant rework from where this document > is to language appropriate for the standard. Which is why I am glad > that Jens is taking an interest in this, as he is particularly good at > producing standards language. Ok. I'm curious to see how that comes along. > > Even if it can, > > then we'd need to reword the "?:" treatment that you currently have: > > > > "If a pointer is part of a dependency chain, and that pointer appears > >in the entry of a ?: expression selected by the condition, then the > >chain extends to the result." > > > > which I think requires the state of the condition to be known statically > > if we only want to extend the chain from the selected expression. In the > > general case, wouldn't a compiler have to assume that the chain is > > extended from both? > > In practice, yes, if the compiler cannot determine which expression is > selected, it must arrange for the dependency to be carried from either, > depending on the run-time value of the condition. But you would have > to work pretty hard to create code that did not carry the dependencies > as required, no?
If it can't prove otherwise, it would have to assume that a dependency *is* carried, and it's not clear to me how it would use this information to restrict any subsequent dependency removing optimisations. I guess that's one for the GCC folks. > > Additionally, what about the following code? > > > > char *x = y ? z : z; > > > > Does that extend a dependency chain from z to x? If so, I can imagine a > > CPU breaking that in practice. > > I am not seeing this. I would expect the compiler to optimize to > something like this: > > char *x = z; > > How does this avoid carrying the dependency? Or are you saying that > ARM loses the dependency via a store to memory and a later reload? > That would be a bi
Re: Compilers and RCU readers: Once more unto the breach!
On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > On Wed, May 20, 2015 at 04:46:17PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 01:15:22PM +0100, Paul E. McKenney wrote: > > > Indeed, something like this does -not- carry a dependency from the > > > memory_order_consume load to q: > > > > > > char *p, *q; > > > > > > p = atomic_load_explicit(&gp, memory_order_consume); > > > q = gq + (intptr_t)p - (intptr_t)p; > > > > > > If this was compiled with -O0, ARM and Power might well carry a > > > dependency, but given any optimization, the assembly language would have > > > no hint of any such dependency. So I am not seeing any particular danger. > > > > The above is a welcome relaxation over C11, since ARM doesn't even give > > you ordering based off false data dependencies. My concern is more to do > > with how this can be specified precisely without prohibiting honest compiler > > and hardware optimisations. > > That last is the challenge. I believe that I am pretty close, but I am > sure that additional adjustment will be required. Especially given that > we also need the memory model to be amenable to formal analysis. Well, there's still the whole thin-air problem which unfortunately doesn't go away with your proposal... (I was hoping that differentiating between true and false dependencies would solve that, but your set of rules isn't broad enough and I don't blame you at all for that!). > > Out of interest, how do you tackle examples (4) and (5) of (assuming the > > reads are promoted to consume loads)?: > > > > http://www.cl.cam.ac.uk/~pes20/cpp/notes42.html > > > > my understanding is that you permit both outcomes (I appreciate you're > > not directly tackling out-of-thin-air, but treatment of dependencies > > is heavily related). Thanks for taking the time to walk these two examples through. > Let's see... 
#4 is as follows, given promotion to memory_order_consume > and (I am guessing) memory_order_relaxed: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > -- > r2 = atomic_load_explicit(&y, memory_order_consume); > if (r2 == 42) > atomic_store_explicit(&x, 42, memory_order_relaxed); > else > atomic_store_explicit(&x, 42, memory_order_relaxed); > > The second thread does not have a proper control dependency, even with > the memory_order_consume load because both branches assign the same > value to "x". This means that the compiler is within its rights to > optimize this into the following: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > -- > r2 = atomic_load_explicit(&y, memory_order_consume); > atomic_store_explicit(&x, 42, memory_order_relaxed); > > There is no dependency between the second thread's pair of statements, > so both the compiler and the CPU are within their rights to optimize > further as follows: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > -- > atomic_store_explicit(&x, 42, memory_order_relaxed); > r2 = atomic_load_explicit(&y, memory_order_consume); > > If the compiler makes this final optimization, even mythical SC hardware > is within its rights to end up with (r1 == 42 && r2 == 42). Which is > fine, as far as I am concerned. Or at least something that can be > lived with. Agreed. > On to #5: > > r1 = atomic_load_explicit(&x, memory_order_consume); > if (r1 == 42) > atomic_store_explicit(&y, r1, memory_order_relaxed); > > r2 = atomic_load_explicit(&y, memory_order_consume); > if (r2 == 42) > atomic_store_explicit(&x, 42, memory_order_relaxed); > > The first thread's accesses are dependency ordered. The second thread's > ordering is in a corner case that memory-barriers.txt does not cover. 
> You are supposed to start control dependencies with READ_ONCE_CTRL(), not > a memory_order_consume load (AKA rcu_dereference and friends). However, > Alpha would have a full barrier as part of the memory_order_consume load, > and the rest of the processors would (one way or another) respect the > control dependency. And the compiler would have some fun trying to > break it.
Re: Compilers and RCU readers: Once more unto the breach!
Hi Paul, On Thu, May 21, 2015 at 09:02:12PM +0100, Paul E. McKenney wrote: > On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote: > > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote: > > > On to #5: > > > > > > r1 = atomic_load_explicit(&x, memory_order_consume); > > > if (r1 == 42) > > > atomic_store_explicit(&y, r1, memory_order_relaxed); > > > > > > r2 = atomic_load_explicit(&y, memory_order_consume); > > > if (r2 == 42) > > > atomic_store_explicit(&x, 42, memory_order_relaxed); > > > > > > The first thread's accesses are dependency ordered. The second thread's > > > ordering is in a corner case that memory-barriers.txt does not cover. > > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not > > > a memory_order_consume load (AKA rcu_dereference and friends). However, > > > Alpha would have a full barrier as part of the memory_order_consume load, > > > and the rest of the processors would (one way or another) respect the > > > control dependency. And the compiler would have some fun trying to > > > break it. > > > > But this is interesting because the first thread is ordered whilst the > > second is not, so doesn't that effectively forbid the compiler from > > constant-folding values if it can't prove that there is no dependency > > chain? > > You lost me on this one. Are you suggesting that the compiler > speculate the second thread's atomic store? That would be very > bad regardless of dependency chains. > > So what constant-folding optimization are you thinking of here? > If the above example is not amenable to such an optimization, could > you please give me an example where constant folding would apply > in a way that is sensitive to dependency chains? Unless I'm missing something, I can't see what would prevent a compiler from looking at the code in thread1 and transforming it into the code in thread2 (i.e. constant folding r1 with 42 given that the taken branch must mean that r1 == 42). 
However, such an optimisation breaks the dependency chain, which means that a compiler needs to walk backwards to see if there is a dependency chain extending to r1. > > > So the current Linux memory model would allow (r1 == 42 && r2 == 42), > > > but I don't know of any hardware/compiler combination that would > > > allow it. And no, I am -not- going to update memory-barriers.txt for > > > this litmus test, its theoretical interest notwithstanding! ;-) Of course, I'm not asking for that at all! I'm just trying to see how your proposal holds up with the example. Will
Pretty print of C++11 scoped enums - request help towards a proper fix
Re: "Pretty print of enumerator never prints the id, always falls back to C-style cast output" https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87364 The bug report gives a one-line 'fix' to enable output of enum id but, for C++11 scoped enums, it fails to qualify as enum type::id. The code is located in c-pretty-print.c It has not been updated to deal with C++11 scoped enumerations. 'Separation of responsibilities' between c and cxx-pretty-print seems fairly lax - it's convenient to push some c++ printing to c (there are a few comments like /* This C++ bit is handled here...*/) I have not quite managed to make a fix confined to c-pretty-print.c I have a fix which duplicates the code in pp_c_enumeration_constant to pp_cxx_enumeration_constant in cxx-pretty print, with modification if (value != NULL_TREE) { if (ENUM_IS_SCOPED (type)) pp_cxx_nested_name_specifier (pp, type); pp->id_expression (TREE_PURPOSE (value)); } This works in my testing so far, but - It duplicates code from c to cxx (not DRY 'Don't Repeat Yourself) - I didn't find a single function to print full nested, scoped id so had to check if ENUM_IS_SCOPED to output nested specifiers. I'm learning by hacking but would like guidance on a proper fix from anyone more familiar with gcc pretty print and/or grammar - the guideline comment, at the top of the file, states: /* The pretty-printer code is primarily designed to closely follow (GNU) C and C++ grammars... */ I'd appreciate any recommendations towards a proper fix, or pointers for how to write unit tests for the fix. Thanks, Will
Re: Pretty print of C++11 scoped enums - request help towards a proper fix
Thanks Nathan, In fact, after testing with enums nested in namespaces or structs, or function local, I realised nested specifiers should be printed for both scoped and unscoped enums, but for unscoped enums one level of nested specifier (the enum type) needs to be stripped. So I inverted the IS_SCOPED test and used get_containing_scope:

  if (value != NULL_TREE)
    {
      if (!ENUM_IS_SCOPED (type))
        type = get_containing_scope (type);
      pp_cxx_nested_name_specifier (pp, type);
      pp->id_expression (TREE_PURPOSE (value));
    }

I submitted this fix as a patch to the bug report, with tests. With this fix GCC now has similar output to both Clang and MSVC for enumerated values. For non-enumerated values GCC continues to print a C-style cast while Clang & MSVC print plain digits. Yay! GCC is winning! (gives type info for non-enumerated values). A downside of nested specifiers is that output gets verbose. Richard Smith suggests to use less verbose output for known types compared to auto deduced types. Discussion starts here http://lists.llvm.org/pipermail/cfe-dev/2018-September/059229.html For enum args, I guess that this would mean distinguishing whether the corresponding template parameter was auto or a given enum type and only printing a simple id with no nested specs for given type. I don't know yet if that info is available at the leaf level here. Similarly, type info should be added to deduced Integral values. I may start to investigate how to do this in GCC pretty print. I submitted the related request to MSVC devs: https://developercommunity.visualstudio.com/content/problem/339663/improve-pretty-print-of-integral-non-type-template.html > given the code base ... GCC pretty-print code was committed by GDR mid 2002, K&R style C, updated to C90 'prototype' in mid 2003, untouched since then, not for C++11 or C++17 enum updates. I found this corner of the code base fairly easy to hack, thanks perhaps to GDR's attempts to follow the grammar. 
On Mon, Sep 24, 2018 at 3:53 PM Nathan Sidwell wrote: > On 9/19/18 7:41 AM, will wray wrote: > > Re: "Pretty print of enumerator never prints the id, > > always falls back to C-style cast output" > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87364 > > > > > I have a fix which duplicates the code in pp_c_enumeration_constant > > to pp_cxx_enumeration_constant in cxx-pretty print, with modification > > > >if (value != NULL_TREE) > >{ > > if (ENUM_IS_SCOPED (type)) > >pp_cxx_nested_name_specifier (pp, type); > > pp->id_expression (TREE_PURPOSE (value)); > >} > > > > This works in my testing so far, but > > - It duplicates code from c to cxx (not DRY 'Don't Repeat Yourself) > > - I didn't find a single function to print full nested, scoped id > > so had to check if ENUM_IS_SCOPED to output nested specifiers. > > This seems a fine approach, given the code base. > > nathan > > -- > Nathan Sidwell >
Re: Pretty print of C++11 scoped enums - request help towards a proper fix
BTW The bug is still UNCONFIRMED https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87364 It is easy to CONFIRM: Follow this Compiler Explorer link https://godbolt.org/z/P4ejiy or paste this code into a file and compile with g++:

  template <auto v> struct wauto;
  enum e { a };
  wauto<a> v;  // error

Note that GCC reports error: ... 'wauto<(e)0> v' => It should report error: ... 'wauto<a> v' This is a bug; the intent of the code is to print the enumerator id (clang prints the enumerator id and so do recent MSVC previews). There is also test code linked to the bug covering more cases. I'd appreciate it if someone would confirm the bug. Thanks, Will On Mon, Sep 24, 2018 at 5:23 PM will wray wrote: [...]
Re: Pretty print of C++11 scoped enums - request help towards a proper fix
Patch submitted: https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00452.html [C++ PATCH] Fix pretty-print of enumerator ids (PR c++/87364) My first GCC patch attempt, so more eyes would be good. Cheers, Will On Tue, Sep 25, 2018 at 4:25 PM will wray wrote: [...]
Basic Block Statistics
Hello everyone! I apologize if this is not the right venue to ask this question and/or this is a waste of your time. I was just wondering if there are statistics that gcc can emit that includes either a) the average number of instructions per basic block and/or b) the average size (in bytes) per basic block in a compilation unit. If nothing like this exists, I am more than happy to code something up if people besides me think that it might be interesting. I promise that I googled for information before asking, but I can't guarantee that I didn't miss anything. Again, I apologize if I just needed to RTFM better. Thanks in advance for any responses! Will
Re: Basic Block Statistics
On Tue, May 16, 2017 at 2:33 PM, Jeff Law wrote: > On 05/16/2017 12:24 PM, Will Hawkins wrote: >> Hello everyone! >> >> I apologize if this is not the right venue to ask this question and/or >> this is a waste of your time. >> >> I was just wondering if there are statistics that gcc can emit that >> includes either a) the average number of instructions per basic block >> and/or b) the average size (in bytes) per basic block in a compilation >> unit. >> >> If nothing like this exists, I am more than happy to code something up >> if people besides me think that it might be interesting. >> >> I promise that I googled for information before asking, but I can't >> guarantee that I didn't miss anything. Again, I apologize if I just >> needed to RTFM better. > I don't think we have anything which inherently will give you this > information. > > It'd be a useful thing to have though. Implementation may be made more > difficult by insns that generate > 1 instruction. > > Jeff Thank you, Mr. Law. I think that this is something I'd really like to work on. As I start to take a peek into how hard/easy this is to implement, I may circle back and ask some additional technical questions. Thanks for your quick response! Will
Re: Basic Block Statistics
On Tue, May 16, 2017 at 2:45 PM, David Malcolm wrote: > On Tue, 2017-05-16 at 14:24 -0400, Will Hawkins wrote: >> Hello everyone! >> >> I apologize if this is not the right venue to ask this question >> and/or >> this is a waste of your time. >> >> I was just wondering if there are statistics that gcc can emit that >> includes either a) the average number of instructions per basic block >> and/or b) the average size (in bytes) per basic block in a >> compilation >> unit. >> >> If nothing like this exists, I am more than happy to code something >> up >> if people besides me think that it might be interesting. >> >> I promise that I googled for information before asking, but I can't >> guarantee that I didn't miss anything. Again, I apologize if I just >> needed to RTFM better. >> >> Thanks in advance for any responses! >> Will > > I don't think anything like this currently exists, but it's probably > doable via a plugin, e.g. by hooking up a new RTL pass somewhere > towards the end of the pass pipeline. > > That said, IIRC basic blocks aren't used in the final passes; > presumably they're not meaningful after the "free_cfg" pass. > > Hope this is helpful Very helpful, thank you Mr. Malcolm! Will > Dave
Re: Basic Block Statistics
As I started looking into this, it seems like PLUGIN_FINISH is where my plugin will go. Everything is great so far. However, when plugins at that event are invoked, they get no data. That means I will have to look into global structures for information regarding the compilation. Are there pointers to the documentation that describe the relevant global data structures that are accessible at this point? I am looking through the source code and documentation and can't find what I am looking for. I am happy to continue working, but thought I'd ask just in case I was missing something silly. Thanks again for all your help getting me started on this! Will On Tue, May 16, 2017 at 2:54 PM, Jeff Law wrote: > On 05/16/2017 12:37 PM, Will Hawkins wrote: >> On Tue, May 16, 2017 at 2:33 PM, Jeff Law wrote: >>> On 05/16/2017 12:24 PM, Will Hawkins wrote: >>>> Hello everyone! >>>> >>>> I apologize if this is not the right venue to ask this question and/or >>>> this is a waste of your time. >>>> >>>> I was just wondering if there are statistics that gcc can emit that >>>> includes either a) the average number of instructions per basic block >>>> and/or b) the average size (in bytes) per basic block in a compilation >>>> unit. >>>> >>>> If nothing like this exists, I am more than happy to code something up >>>> if people besides me think that it might be interesting. >>>> >>>> I promise that I googled for information before asking, but I can't >>>> guarantee that I didn't miss anything. Again, I apologize if I just >>>> needed to RTFM better. >>> I don't think we have anything which inherently will give you this >>> information. >>> >>> It'd be a useful thing to have though. Implementation may be made more >>> difficult by insns that generate > 1 instruction. >>> >>> Jeff >> >> Thank you, Mr. Law. I think that this is something I'd really like to >> work on. 
As I start to take a peek into how hard/easy this is to >> implement, I may circle back and ask some additional technical >> questions. > Sure. On-list is best. > Jeff
Re: Basic Block Statistics
On Wed, May 17, 2017 at 1:02 PM, Jeff Law wrote: > On 05/17/2017 10:36 AM, Will Hawkins wrote: >> As I started looking into this, it seems like PLUGIN_FINISH is where >> my plugin will go. Everything is great so far. However, when plugins >> at that event are invoked, they get no data. That means I will have to >> look into global structures for information regarding the compilation. >> Are there pointers to the documentation that describe the relevant >> global data structures that are accessible at this point? >> >> I am looking through the source code and documentation and can't find >> what I am looking for. I am happy to continue working, but thought I'd >> ask just in case I was missing something silly. >> >> Thanks again for all your help getting me started on this! > FOR_EACH_BB (bb) is what you're looking for. That will iterate over the > basic blocks. Thank you so much for your response! I just found this as soon as you sent it. Sorry for wasting your time! > > Assuming you're running late, you'll then want to walk each insn within > the bb. So something like this:
>
>   basic_block bb;
>   FOR_EACH_BB (bb)
>     {
>       rtx_insn *insn;
>       FOR_BB_INSNS (bb, insn)
>         {
>           /* Do something with INSN.  */
>         }
>     }
>
> Note that if you're running too late the CFG may have been released, in > which case this code wouldn't do anything. I will just have to experiment to see exactly when the right time to invoke this plugin is, in order to get the best data. Thanks again! Will > > jeff
Re: Basic Block Statistics
On Wed, May 17, 2017 at 1:04 PM, Will Hawkins wrote: > On Wed, May 17, 2017 at 1:02 PM, Jeff Law wrote: [...] >> Note that if you're running too late the CFG may have been released, in >> which case this code wouldn't do anything.
Since PLUGIN_FINISH is the place where diagnostics are supposed to be printed, I was wondering if there was an equivalent iterator for all translation units (from which I could derive functions, from which I could derive basic blocks) that just "FINISH"ed compiling? The other way to approach the problem, I suppose, is to just accumulate those stats at the end of each pass execution phase and then simply print them when PLUGIN_FINISH is invoked. I'm sorry to make this so difficult. I am just wondering about the way that the community expects the plugins to be written in the most modular fashion. Thanks again for walking me through all this! Will > > I will just have to experiment to see exactly when the right time to > invoke this plugin to get the best data. > > Thanks again! > Will > >> >> jeff
Re: Basic Block Statistics
On Wed, May 17, 2017 at 2:41 PM, Will Hawkins wrote: > On Wed, May 17, 2017 at 1:04 PM, Will Hawkins wrote: [...] > This macro seems to require that there be a valid cfun. This seems to > imply that the macro will work only where the plugin callback is > invoked before/after a pass that does some optimization for a > particular function. In particular, at PLUGIN_FINISH, cfun is NULL. > This makes perfect sense. 
> > Since PLUGIN_FINISH is the place where diagnostics are supposed to be > printed, I was wondering if there was an equivalent iterator for all > translation units (from which I could derive functions, from which I > could derive basic blocks) that just "FINISH"ed compiling? Answering my own question for historical purposes and anyone else who might need this: FOR_EACH_VEC_ELT(*all_translation_units, i, t) is exactly what I was looking for! Sorry for the earlier spam and thank you for your patience! Will > > The other way to approach the problem, I suppose, is to just > accumulate those stats at the end of each pass execution phase and > then simply print them when PLUGIN_FINISH is invoked. > > I'm sorry to make this so difficult. I am just wondering about the way > that the community expects the plugins to be written in the most > modular fashion. > > Thanks again for walking me through all this! > Will > >> >> I will just have to experiment to see exactly when the right time to >> invoke this plugin to get the best data. >> >> Thanks again! >> Will >> >>> >>> jeff
Re: Basic Block Statistics
On Wed, May 17, 2017 at 2:59 PM, Will Hawkins wrote: > On Wed, May 17, 2017 at 2:41 PM, Will Hawkins wrote: [...] 
>> >> Since PLUGIN_FINISH is the place where diagnostics are supposed to be >> printed, I was wondering if there was an equivalent iterator for all >> translation units (from which I could derive functions, from which I >> could derive basic blocks) that just "FINISH"ed compiling? > > > Answering my own question for historical purposes and anyone else who > might need this: > > FOR_EACH_VEC_ELT(*all_translation_units, i, t) > > is exactly what I was looking for! > > Sorry for the earlier spam and thank you for your patience! > Will Well, I thought that this was what I wanted, but it turns out perhaps I was wrong. So, I am turning back for some help. Again, I apologize for the incessant emails. I would have thought that a translation unit tree node's chain would point to all the nested tree nodes. This does not seem to be the case, however. Am I missing something? Or is this the intended behavior? Again, thank you for your patience! Will > >> >> The other way to approach the problem, I suppose, is to just >> accumulate those stats at the end of each pass execution phase and >> then simply print them when PLUGIN_FINISH is invoked. >> >> I'm sorry to make this so difficult. I am just wondering about the way >> that the community expects the plugins to be written in the most >> modular fashion. >> >> Thanks again for walking me through all this! >> Will >> >>> >>> I will just have to experiment to see exactly when the right time to >>> invoke this plugin to get the best data. >>> >>> Thanks again! >>> Will >>> >>>> >>>> jeff
Re: Basic Block Statistics
On Fri, May 19, 2017 at 4:40 PM, Jeff Law wrote: > On 05/17/2017 08:22 PM, Will Hawkins wrote: >> On Wed, May 17, 2017 at 2:59 PM, Will Hawkins wrote: >>> On Wed, May 17, 2017 at 2:41 PM, Will Hawkins wrote: >>>> On Wed, May 17, 2017 at 1:04 PM, Will Hawkins wrote: >>>>> On Wed, May 17, 2017 at 1:02 PM, Jeff Law wrote: >>>>>> On 05/17/2017 10:36 AM, Will Hawkins wrote: >>>>>>> As I started looking into this, it seems like PLUGIN_FINISH is where >>>>>>> my plugin will go. Everything is great so far. However, when plugins >>>>>>> at that event are invoked, they get no data. That means I will have to >>>>>>> look into global structures for information regarding the compilation. >>>>>>> Are there pointers to the documentation that describe the relevant >>>>>>> global data structures that are accessible at this point? >>>>>>> >>>>>>> I am looking through the source code and documentation and can't find >>>>>>> what I am looking for. I am happy to continue working, but thought I'd >>>>>>> ask just in case I was missing something silly. >>>>>>> >>>>>>> Thanks again for all your help getting me started on this! >>>>>> FOR_EACH_BB (bb) is what you're looking for. That will iterate over the >>>>>> basic blocks. >>>>> >>>>> Thank you so much for your response! >>>>> >>>>> I just found this as soon as you sent it. Sorry for wasting your time! >>>>> >>>>> >>>>>> >>>>>> Assuming you're running late, you'll then want to walk each insn within >>>>>> the bb. So something like this >>>>>> >>>>>> basic_block bb >>>>>> FOR_EACH_BB (bb) >>>>>> { >>>>>> rtx_insn *insn; >>>>>> FOR_BB_INSNS (bb, insn) >>>>>> { >>>>>> /* Do something with INSN. */ >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> Note that if you're running too late the CFG may have been released, in >>>>>> which case this code wouldn't do anything. >>>> >>>> This macro seems to require that there be a valid cfun. 
This seems to >>>> imply that the macro will work only where the plugin callback is >>>> invoked before/after a pass that does some optimization for a >>>> particular function. In particular, at PLUGIN_FINISH, cfun is NULL. >>>> This makes perfect sense. >>>> >>>> Since PLUGIN_FINISH is the place where diagnostics are supposed to be >>>> printed, I was wondering if there was an equivalent iterator for all >>>> translation units (from which I could derive functions, from which I >>>> could derive basic blocks) that just "FINISH"ed compiling? >>> >>> >>> Answering my own question for historical purposes and anyone else who >>> might need this: >>> >>> FOR_EACH_VEC_ELT(*all_translation_units, i, t) >>> >>> is exactly what I was looking for! >>> >>> Sorry for the earlier spam and thank you for your patience! >>> Will >> >> >> Well, I thought that this was what I wanted, but it turns out perhaps >> I was wrong. So, I am turning back for some help. Again, i apologize >> for the incessant emails. >> >> I would have thought that a translation unit tree node's chain would >> point to all the nested tree nodes. This does not seem to be the case, >> however. Am I missing something? Or is this the intended behavior? > I think there's a fundamental misunderstanding. You are right, Mr. Law. I'm really sorry for the confusion. I got things straightened out in my head and now I am making great progress. > > We don't hold the RTL IR for all the functions in a translation unit in > memory at the same time. You have to look at the RTL IR for each as its > generated. Thank you, as ever, for your continued input. I am going to continue to work and I will keep everyone on the list posted and let you know when it is complete. Thanks again and have a great rest of the weekend! Will > > jeff
Re: Basic Block Statistics
I just wanted to send a quick follow up. Thanks to the incredible support on this list from Mr. Law and support in IRC from segher, djgpp and dmalcolm, I was able to put together a serviceable little plugin that does some very basic statistic generation on basic blocks. Here is a link to the source with information about how to build/run: https://github.com/whh8b/bb_stats If you are interested in more information, just send me an email. Thanks again for everyone's help! Will On Sat, May 20, 2017 at 11:29 PM, Will Hawkins wrote: > On Fri, May 19, 2017 at 4:40 PM, Jeff Law wrote: >> On 05/17/2017 08:22 PM, Will Hawkins wrote: >>> On Wed, May 17, 2017 at 2:59 PM, Will Hawkins wrote: >>>> On Wed, May 17, 2017 at 2:41 PM, Will Hawkins wrote: >>>>> On Wed, May 17, 2017 at 1:04 PM, Will Hawkins wrote: >>>>>> On Wed, May 17, 2017 at 1:02 PM, Jeff Law wrote: >>>>>>> On 05/17/2017 10:36 AM, Will Hawkins wrote: >>>>>>>> As I started looking into this, it seems like PLUGIN_FINISH is where >>>>>>>> my plugin will go. Everything is great so far. However, when plugins >>>>>>>> at that event are invoked, they get no data. That means I will have to >>>>>>>> look into global structures for information regarding the compilation. >>>>>>>> Are there pointers to the documentation that describe the relevant >>>>>>>> global data structures that are accessible at this point? >>>>>>>> >>>>>>>> I am looking through the source code and documentation and can't find >>>>>>>> what I am looking for. I am happy to continue working, but thought I'd >>>>>>>> ask just in case I was missing something silly. >>>>>>>> >>>>>>>> Thanks again for all your help getting me started on this! >>>>>>> FOR_EACH_BB (bb) is what you're looking for. That will iterate over the >>>>>>> basic blocks. >>>>>> >>>>>> Thank you so much for your response! >>>>>> >>>>>> I just found this as soon as you sent it. Sorry for wasting your time! 
>>>>>> >>>>>> >>>>>>> >>>>>>> Assuming you're running late, you'll then want to walk each insn within >>>>>>> the bb. So something like this >>>>>>> >>>>>>> basic_block bb >>>>>>> FOR_EACH_BB (bb) >>>>>>> { >>>>>>> rtx_insn *insn; >>>>>>> FOR_BB_INSNS (bb, insn) >>>>>>> { >>>>>>> /* Do something with INSN. */ >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> >>>>>>> Note that if you're running too late the CFG may have been released, in >>>>>>> which case this code wouldn't do anything. >>>>> >>>>> This macro seems to require that there be a valid cfun. This seems to >>>>> imply that the macro will work only where the plugin callback is >>>>> invoked before/after a pass that does some optimization for a >>>>> particular function. In particular, at PLUGIN_FINISH, cfun is NULL. >>>>> This makes perfect sense. >>>>> >>>>> Since PLUGIN_FINISH is the place where diagnostics are supposed to be >>>>> printed, I was wondering if there was an equivalent iterator for all >>>>> translation units (from which I could derive functions, from which I >>>>> could derive basic blocks) that just "FINISH"ed compiling? >>>> >>>> >>>> Answering my own question for historical purposes and anyone else who >>>> might need this: >>>> >>>> FOR_EACH_VEC_ELT(*all_translation_units, i, t) >>>> >>>> is exactly what I was looking for! >>>> >>>> Sorry for the earlier spam and thank you for your patience! >>>> Will >>> >>> >>> Well, I thought that this was what I wanted, but it turns out perhaps >>> I was wrong. So, I am turning back for some help. Again, i apologize >>> for the incessant emails. >>> >>> I would have thought that a translation unit tree node's chain would >>> point to all the nested tree nodes. This does not seem to be the case, >>> however. Am I missing something? Or is this the intended behavior? >> I think there's a fundamental misunderstanding. > > You are right, Mr. Law. I'm really sorry for the confusion. I got > things straightened out in my head and now I am making great progress. 
>> >> We don't hold the RTL IR for all the functions in a translation unit in >> memory at the same time. You have to look at the RTL IR for each as its >> generated. > > Thank you, as ever, for your continued input. I am going to continue > to work and I will keep everyone on the list posted and let you know > when it is complete. > > Thanks again and have a great rest of the weekend! > > Will >> >> jeff
Re: GCC and Meltdown and Spectre vulnerabilities
On Thu, Jan 4, 2018 at 10:10 PM, Eric Gallager wrote: > Is there anything GCC could be doing at the compiler level to mitigate > the recently-announced Meltdown and Spectre vulnerabilities? From > reading about them, it seems like they involve speculative execution > and indirect branch prediction, and those are the domain of things the > compiler deals with, right? (For reference, Meltdown is CVE-2017-5754, > and Spectre is CVE-2017-5753 and CVE-2017-5715) > > Just wondering, > Eric Check out https://support.google.com/faqs/answer/7625886 and especially http://git.infradead.org/users/dwmw2/gcc-retpoline.git/shortlog/refs/heads/gcc-7_2_0-retpoline-20171219 I'd love to hear what other people have heard! Will
Re: About Bug 52485
Thanks to your brand new Bugzilla account, you may now comment! :-) You will receive instructions on how to reset your default password and access your account. Please let me know if you have any questions or trouble gaining access. I'd be happy to help in any way that I can! Thanks for contributing to GCC! Will On Wed, May 9, 2018 at 4:08 AM, SHIH YEN-TE wrote: > Want to comment on "Bug 52485 - [c++11] add an option to disable c++11 > user-defined literals" > > > It's a pity GCC doesn't support this, which forces me to give up introducing > newer C++ standard into my project. I know it is ridiculous, but we must know > the real world is somehow ridiculous as well as nothing is perfect. > >
Re: [RFC][PATCH 0/5] arch: atomic rework
On Thu, Feb 06, 2014 at 06:55:01PM +0000, Ramana Radhakrishnan wrote: > On 02/06/14 18:25, David Howells wrote: > > > > Is it worth considering a move towards using C11 atomics and barriers and > > compiler intrinsics inside the kernel? The compiler _ought_ to be able to > > do > > these. > > > It sounds interesting to me, if we can make it work properly and > reliably. + gcc@gcc.gnu.org for others in the GCC community to chip in. Given my (albeit limited) experience playing with the C11 spec and GCC, I really think this is a bad idea for the kernel. It seems that nobody really agrees on exactly how the C11 atomics map to real architectural instructions on anything but the trivial architectures. For example, should the following code fire the assert?

  extern atomic<int> foo, bar, baz;

  void thread1(void)
  {
    foo.store(42, memory_order_relaxed);
    bar.fetch_add(1, memory_order_seq_cst);
    baz.store(42, memory_order_relaxed);
  }

  void thread2(void)
  {
    while (baz.load(memory_order_seq_cst) != 42) {
      /* do nothing */
    }

    assert(foo.load(memory_order_seq_cst) == 42);
  }

To answer that question, you need to go and look at the definitions of synchronises-with, happens-before, dependency_ordered_before and a whole pile of vaguely written waffle to realise that you don't know. Certainly, the code that arm64 GCC currently spits out would allow the assertion to fire on some microarchitectures. There are also so many ways to blow your head off it's untrue. For example, cmpxchg takes a separate memory model parameter for failure and success, but then there are restrictions on the sets you can use for each. It's not hard to find well-known memory-ordering experts shouting "Just use memory_model_seq_cst for everything, it's too hard otherwise".
Then there's the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume atm and optimises all of the data dependencies away) as well as the definition of "data races", which seem to be used as an excuse to miscompile a program at the earliest opportunity. Trying to introduce system concepts (writes to devices, interrupts, non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd just rather stick to the semantics we have and the asm volatile barriers. That's not to say there's no room for improvement in what we have in the kernel. Certainly, I'd welcome allowing more relaxed operations on architectures that support them, but it needs to be something that at least the different architecture maintainers can understand how to implement efficiently behind an uncomplicated interface. I don't think that interface is C11. Just my thoughts on the matter... Will
Re: [RFC][PATCH 0/5] arch: atomic rework
Hello Torvald, It looks like Paul clarified most of the points I was trying to make (thanks Paul!), so I won't go back over them here. On Thu, Feb 06, 2014 at 09:09:25PM +0000, Torvald Riegel wrote: > Are you familiar with the formalization of the C11/C++11 model by Batty > et al.? > http://www.cl.cam.ac.uk/~mjb220/popl085ap-sewell.pdf > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf > > They also have a nice tool that can run condensed examples and show you > all allowed (and forbidden) executions (it runs in the browser, so is > slow for larger examples), including nice annotated graphs for those: > http://svr-pes20-cppmem.cl.cam.ac.uk/cppmem/ Thanks for the link, that's incredibly helpful. I've used ppcmem and armmem in the past, but I didn't realise they have a version for C++11 too. Actually, the armmem backend doesn't implement our atomic instructions or the acquire/release accessors, so it's not been as useful as it could be. I should probably try to learn OCaml... > IMHO, one thing worth considering is that for C/C++, the C11/C++11 is > the only memory model that has widespread support. So, even though it's > a fairly weak memory model (unless you go for the "only seq-cst" > beginners advice) and thus comes with a higher complexity, this model is > what likely most people will be familiar with over time. Deviating from > the "standard" model can have valid reasons, but it also has a cost in > that new contributors are more likely to be familiar with the "standard" > model. Indeed, I wasn't trying to write off the C11 memory model as something we can never use in the kernel. I just don't think the current situation is anywhere close to usable for a project such as Linux.
If a greater understanding of the memory model does eventually manifest amongst C/C++ developers (by which I mean, the beginners advice is really treated as such and there is a widespread intuition about ordering guarantees, as opposed to the need to use formal tools), then surely the tools and libraries will stabilise and provide uniform semantics across the 25+ architectures that Linux currently supports. If *that* happens, this discussion is certainly worth having again. Will
Re: [RFC][PATCH 0/5] arch: atomic rework
On Fri, Feb 07, 2014 at 05:06:54PM +0000, Peter Zijlstra wrote: > On Fri, Feb 07, 2014 at 04:55:48PM +0000, Will Deacon wrote: > > Hi Paul, > > > > On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote: > > > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote: > > > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote: > > > > > Hopefully some discussion of out-of-thin-air values as well. > > > > > > > > Yes, absolutely shoot store speculation in the head already. Then drive > > > > a wooden stake through its hart. > > > > > > > > C11/C++11 should not be allowed to claim itself a memory model until > > > > that > > > > is sorted. > > > > > > There actually is a proposal being put forward, but it might not make ARM > > > and Power people happy because it involves adding a compare, a branch, > > > and an ISB/isync after every relaxed load... Me, I agree with you, > > > much preferring the no-store-speculation approach. > > > > Can you elaborate a bit on this please? We don't permit speculative stores > > in the ARM architecture, so it seems counter-intuitive that GCC needs to > > emit any additional instructions to prevent that from happening. > > > > Stores can, of course, be observed out-of-order but that's a lot more > > reasonable :) > > This is more about the compiler speculating on stores; imagine:
>
>   if (x)
>     y = 1;
>   else
>     y = 2;
>
> The compiler is allowed to change that into:
>
>   y = 2;
>   if (x)
>     y = 1;
>
> Which is of course a big problem when you want to rely on the ordering.

Understood, but that doesn't explain why Paul wants to add ISB/isync instructions which affect the *CPU* rather than the compiler! Will
Re: [RFC][PATCH 0/5] arch: atomic rework
Hi Paul, On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote: > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote: > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote: > > > Hopefully some discussion of out-of-thin-air values as well. > > > > Yes, absolutely shoot store speculation in the head already. Then drive > > a wooden stake through its hart. > > > > C11/C++11 should not be allowed to claim itself a memory model until that > > is sorted. > > There actually is a proposal being put forward, but it might not make ARM > and Power people happy because it involves adding a compare, a branch, > and an ISB/isync after every relaxed load... Me, I agree with you, > much preferring the no-store-speculation approach. Can you elaborate a bit on this please? We don't permit speculative stores in the ARM architecture, so it seems counter-intuitive that GCC needs to emit any additional instructions to prevent that from happening. Stores can, of course, be observed out-of-order but that's a lot more reasonable :) Will
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:48:13AM +0000, Peter Zijlstra wrote: > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote: > > As near as I can tell, compiler writers hate the idea of prohibiting > > speculative-store optimizations because it requires them to introduce > > both control and data dependency tracking into their compilers. Many of > > them seem to hate dependency tracking with a purple passion. At least, > > such a hatred would go a long way towards explaining the incomplete > > and high-overhead implementations of memory_order_consume, the long > > and successful use of idioms based on the memory_order_consume pattern > > notwithstanding [*]. ;-) > > Just tell them that because the hardware provides control dependencies > we actually use and rely on them. s/control/address/ ? Will
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 03:04:43PM +0000, Paul E. McKenney wrote: > On Mon, Feb 10, 2014 at 11:49:29AM +0000, Will Deacon wrote: > > On Mon, Feb 10, 2014 at 11:48:13AM +0000, Peter Zijlstra wrote: > > > On Fri, Feb 07, 2014 at 10:02:16AM -0800, Paul E. McKenney wrote: > > > > As near as I can tell, compiler writers hate the idea of prohibiting > > > > speculative-store optimizations because it requires them to introduce > > > > both control and data dependency tracking into their compilers. Many of > > > > them seem to hate dependency tracking with a purple passion. At least, > > > > such a hatred would go a long way towards explaining the incomplete > > > > and high-overhead implementations of memory_order_consume, the long > > > > and successful use of idioms based on the memory_order_consume pattern > > > > notwithstanding [*]. ;-) > > > > > > Just tell them that because the hardware provides control dependencies > > > we actually use and rely on them. > > > > s/control/address/ ? > > Both are important, but as Peter's reply noted, it was control > dependencies under discussion. Data dependencies (which include the > ARM/PowerPC notion of address dependencies) are called out by the standard > already, but control dependencies are not. I am not all that satisfied > by current implementations of data dependencies, admittedly. Should > be an interesting discussion. ;-) Ok, but since you can't use control dependencies to order LOAD -> LOAD, it's a pretty big ask of the compiler to make use of them for things like consume, where a data dependency will suffice for any combination of accesses. Will
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 17, 2014 at 06:59:31PM +0000, Joseph S. Myers wrote: > On Sat, 15 Feb 2014, Torvald Riegel wrote: > > > glibc is a counterexample that comes to mind, although it's a smaller > > code base. (It's currently not using C11 atomics, but transitioning > > there makes sense, and something I want to get to eventually.) > > glibc is using C11 atomics (GCC builtins rather than _Atomic / > <stdatomic.h>, but using __atomic_* with explicitly specified memory model > rather than the older __sync_*) on AArch64, plus in certain cases on ARM > and MIPS. Hmm, actually that results in a change in behaviour for the __sync_* primitives on AArch64. The documentation for those states that: `In most cases, these built-in functions are considered a full barrier. That is, no memory operand is moved across the operation, either forward or backward. Further, instructions are issued as necessary to prevent the processor from speculating loads across the operation and from queuing stores after the operation.' which is stronger than simply mapping them to memory_model_seq_cst, which seems to be what the AArch64 compiler is doing (so you get acquire + release instead of a full fence). Will
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote: > Don't try instrumenting functions in arch/arm64/kernel/vdso/vgettimeofday.c. > Otherwise that can cause issues if the cleanup pass of stackleak gcc plugin > is disabled. > > Signed-off-by: Alexander Popov > --- > arch/arm64/kernel/vdso/Makefile | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile > index 3862cad2410c..9b84cafbd2da 100644 > --- a/arch/arm64/kernel/vdso/Makefile > +++ b/arch/arm64/kernel/vdso/Makefile > @@ -32,7 +32,8 @@ UBSAN_SANITIZE := n > OBJECT_FILES_NON_STANDARD:= y > KCOV_INSTRUMENT := n > > -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables > +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \ > + $(DISABLE_STACKLEAK_PLUGIN) I can pick this one up via arm64, thanks. Are there any other plugins we should be wary of? It looks like x86 filters out $(GCC_PLUGINS_CFLAGS) when building the vDSO. Will
Re: [PATCH 5/5] gcc-plugins/stackleak: Don't instrument vgettimeofday.c in arm64 VDSO
On Tue, Jun 09, 2020 at 12:09:27PM -0700, Kees Cook wrote: > On Thu, Jun 04, 2020 at 02:58:06PM +0100, Will Deacon wrote: > > On Thu, Jun 04, 2020 at 04:49:57PM +0300, Alexander Popov wrote: > > > Don't try instrumenting functions in > > > arch/arm64/kernel/vdso/vgettimeofday.c. > > > Otherwise that can cause issues if the cleanup pass of stackleak gcc > > > plugin > > > is disabled. > > > > > > Signed-off-by: Alexander Popov > > > --- > > > arch/arm64/kernel/vdso/Makefile | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/arch/arm64/kernel/vdso/Makefile > > > b/arch/arm64/kernel/vdso/Makefile > > > index 3862cad2410c..9b84cafbd2da 100644 > > > --- a/arch/arm64/kernel/vdso/Makefile > > > +++ b/arch/arm64/kernel/vdso/Makefile > > > @@ -32,7 +32,8 @@ UBSAN_SANITIZE := n > > > OBJECT_FILES_NON_STANDARD:= y > > > KCOV_INSTRUMENT := n > > > > > > -CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables > > > +CFLAGS_vgettimeofday.o = -O2 -mcmodel=tiny -fasynchronous-unwind-tables \ > > > + $(DISABLE_STACKLEAK_PLUGIN) > > > > I can pick this one up via arm64, thanks. Are there any other plugins we > > should be wary of? It looks like x86 filters out $(GCC_PLUGINS_CFLAGS) > > when building the vDSO. > > I didn't realize/remember that arm64 retained the kernel build flags for > vDSO builds. (I'm used to x86 throwing all its flags away for its vDSO.) > > How does 32-bit ARM do its vDSO? > > My quick run-through on plugins: > > arm_ssp_per_task_plugin.c > 32-bit ARM only (but likely needs disabling for 32-bit ARM vDSO?) On arm64, the 32-bit toolchain is picked up via CC_COMPAT -- does that still get the plugins? > cyc_complexity_plugin.c > compile-time reporting only > > latent_entropy_plugin.c > this shouldn't get triggered for the vDSO (no __latent_entropy > nor __init attributes in vDSO), but perhaps explicitly disabling > it would be a sensible thing to do, just for robustness? 
> > randomize_layout_plugin.c > this shouldn't get triggered (again, lacking attributes), but > should likely be disabled too. > > sancov_plugin.c > This should be tracking the KCOV directly (see > scripts/Makefile.kcov), which is already disabled here. > > structleak_plugin.c > This should be fine in the vDSO, but there's not security > boundary here, so it wouldn't be important to KEEP it enabled. Thanks for going through these. In general though, it seems like an opt-in strategy would make more sense, as it doesn't make an awful lot of sense to me for the plugins to be used to build the vDSO. So I would prefer that this patch filters out $(GCC_PLUGINS_CFLAGS). Will
Re: [PATCH v2 3/5] arm64: vdso: Don't use gcc plugins for building vgettimeofday.c
On Wed, Jun 24, 2020 at 03:33:28PM +0300, Alexander Popov wrote: > Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c > to avoid unneeded instrumentation. > > Signed-off-by: Alexander Popov > --- > arch/arm64/kernel/vdso/Makefile | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile > index 556d424c6f52..0f1ad63b3326 100644 > --- a/arch/arm64/kernel/vdso/Makefile > +++ b/arch/arm64/kernel/vdso/Makefile > @@ -29,7 +29,7 @@ ldflags-y := -shared -nostdlib -soname=linux-vdso.so.1 > --hash-style=sysv \ > ccflags-y := -fno-common -fno-builtin -fno-stack-protector -ffixed-x18 > ccflags-y += -DDISABLE_BRANCH_PROFILING > > -CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) > +CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) > $(GCC_PLUGINS_CFLAGS) > KBUILD_CFLAGS+= $(DISABLE_LTO) > KASAN_SANITIZE := n > UBSAN_SANITIZE := n > -- > 2.25.4 I'll pick this one up as a fix for 5.8, please let me know if that's a problem. Will
Re: [PATCH v2 0/5] Improvements of the stackleak gcc plugin
On Wed, 24 Jun 2020 15:33:25 +0300, Alexander Popov wrote: > This is the v2 of the patch series with various improvements of the > stackleak gcc plugin. > > The first three patches disable unneeded gcc plugin instrumentation for > some files. > > The fourth patch is the main improvement. It eliminates an unwanted > side-effect of kernel code instrumentation performed by stackleak gcc > plugin. This patch is a deep reengineering of the idea described on > grsecurity blog: > https://grsecurity.net/resolving_an_unfortunate_stackleak_interaction > > [...] Applied to arm64 (for-next/fixes), thanks! [1/1] arm64: vdso: Don't use gcc plugins for building vgettimeofday.c https://git.kernel.org/arm64/c/e56404e8e475 Cheers, -- Will https://fixes.arm64.dev https://next.arm64.dev https://will.arm64.dev
Re: Re: typeof and operands in named address spaces
On Tue, Nov 17, 2020 at 11:31:57AM -0800, Linus Torvalds wrote: > On Tue, Nov 17, 2020 at 11:25 AM Jakub Jelinek wrote: > > > > It would need to be typeof( (typeof(type)) (type) ) to not be that > > constrained on what kind of expressions it accepts as arguments. > > Yup. > > > Anyway, it won't work with array types at least, > > int a[10]; > > typeof ((typeof (a)) (a)) b; > > is an error (in both gcc and clang), while typeof (a) b; will work > > (but not drop the qualifiers). Don't know if the kernel cares or not. > > Well, the kernel already doesn't allow that, because our existing > horror only handles simple integer scalar types. > > So that macro is a clear improvement - if it actually works (local > testing says it does, but who knows about random compiler versions > etc) I'll give it a go now, although if it works I honestly won't know whether to laugh or cry. Will
Re: Re: typeof and operands in named address spaces
On Tue, Nov 17, 2020 at 09:10:53PM +0000, Will Deacon wrote: > On Tue, Nov 17, 2020 at 11:31:57AM -0800, Linus Torvalds wrote: > > On Tue, Nov 17, 2020 at 11:25 AM Jakub Jelinek wrote: > > > > > > It would need to be typeof( (typeof(type)) (type) ) to not be that > > > constrained on what kind of expressions it accepts as arguments. > > > > Yup. > > > > > Anyway, it won't work with array types at least, > > > int a[10]; > > > typeof ((typeof (a)) (a)) b; > > > is an error (in both gcc and clang), while typeof (a) b; will work > > > (but not drop the qualifiers). Don't know if the kernel cares or not. > > > > Well, the kernel already doesn't allow that, because our existing > > horror only handles simple integer scalar types. > > > > So that macro is a clear improvement - if it actually works (local > > testing says it does, but who knows about random compiler versions > > etc) > > I'll give it a go now, although if it works I honestly won't know whether > to laugh or cry. GCC 9 and Clang 11 both seem to generate decent code for aarch64 defconfig with: #define __unqual_scalar_typeof(x) typeof( (typeof(x)) (x)) replacing the current monstrosity. allnoconfig and allmodconfig build fine too. However, GCC 4.9.0 goes mad and starts spilling to the stack when dealing with a pointer to volatile, as though we were just using typeof(). I tried GCC 5.4.0 and that looks ok, so I think if anybody cares about the potential performance regression with 4.9 then perhaps they should consider upgrading their toolchain. In other words, let's do it. Will
Kick-starting P1997 implementation, array copy semantics
P1997 Relaxing Restrictions on Arrays https://wg21.link/p1997 proposes copy semantics for C arrays: initialization and assignment of arrays from arrays, and arrays as function return types. For C++, a new placeholder deduction syntax is proposed. The paper was seen for the first time on Friday by SG22, The Joint C and C++ Liaison Study Group. It was well received. The next step is implementation experience... I'm looking to kick-start a gcc branch for this work. My gcc dev-fu is padawan level (two small patches ~ two years ago) so I may need a kick-start myself. If anyone has an interest or experience in array mechanics, C or C++, I could do with a mentor / lifeline / phone-a-friend. I've joined the gcc developer IRC as doodle. At some point, ABI specifications for array return types will be needed. Thanks, Will
Re: [PATCH] arm64/io: Remind compiler that there is a memory side effect
On Sun, Apr 03, 2022 at 09:47:47AM +0200, Ard Biesheuvel wrote: > On Sun, 3 Apr 2022 at 09:47, Ard Biesheuvel wrote: > > On Sun, 3 Apr 2022 at 09:38, Andrew Pinski wrote: > > > It might not be the most restricted fix but it is a fix. > > > The best fix is to tell that you are writing to that location of memory. > > > volatile asm does not do what you think it does. > > > You didn't read further down about memory clobbers: > > > https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers > > > Specifically this part: > > > The "memory" clobber tells the compiler that the assembly code > > > performs memory reads or writes to items other than those listed in > > > the input and output operands > > > > > > > So should we be using "m"(*addr) instead of "r"(addr) here? > > > > (along with the appropriately sized casts) > > I mean "=m" not "m" That can generate writeback addressing modes, which I think breaks MMIO virtualisation. We usually end up using "Q" instead but the codegen tends to be worse iirc. In any case, for this specific problem I think we either need a fixed compiler or some kbuild magic to avoid using it / disable the new behaviour. We rely on 'asm volatile' not being elided in other places too. Will
Re: htsearch broken?
Hans-Peter Nilsson-2 wrote: > > If you mean "latest" instead of "earliest", it's because the > search engine has stopped indexing, permanently. No ETA; I'm > not sure it'll be fixed at all. > Try searching Nabble; the gcc user list is archived here: http://www.nabble.com/gcc---General-f1157.html Posts from the list are updated to the minute, and new posts become searchable almost immediately. Nabble also has a combined gcc archive covering all the gcc lists, including gcc-fortran, gcc-help, gcc-java... Instead of searching every list individually, you can search them all in one place: http://www.nabble.com/gcc-f1154.html Regards, Will L Nabble.com -- Sent from the gcc - General forum at Nabble.com: http://www.nabble.com/htsearch-broken--t704225.html#a1879186
Re: htsearch broken?
Jonathan Wakely wrote: > > Please note this is NOT, I repeat NOT, a GCC users list - this is a GCC > developers list. There have been several mails sent via nabble.com > to this list that should have been sent to gcc-help instead. > > jon > Jon, Sorry for the confusion. I just corrected the info on Nabble, and I quoted your remarks in the description. The url is now: http://www.nabble.com/gcc---Dev-f1157.html Nabble will develop a feature that lets a user like you correct this type of mistake yourself, somewhat like a wiki. But for now, thanks again for pointing this out. Regards, Will L Nabble.com -- Sent from the gcc - Dev forum at Nabble.com: http://www.nabble.com/htsearch-broken--t704225.html#a1884523
Re: GCC mailing list archive search omits results after May 2005
> Re: GCC mailing list archive search omits results after May 2005 I have been following this thread of discussion, and I am a little puzzled. Google and Gmail are both free of charge, but they are not "free" software by the FSF definition. Does that matter? We still use them for work. Gmane is completely free and non-commercial, yet its free-ness is still somehow questioned. How free is free? Don't get me wrong, I know what real freedom is and I appreciate it, but I also want to be practical. I am a member of the Nabble project (similar to Gmane), so I have a selfish interest in discussing this with you. Below is my view of the pros and cons of the alternatives:
1. Google - not free, has ads. The real problem is that Google does not index all the posts, and we don't know what the criteria for indexing are. One thing is certain: recent posts take days or weeks to get crawled into the index, so searching via Google is unreliable. Some people in this thread have noticed this.
2. Gmail - not free, has ads. The real problem is that it is not open to the public. Yes, I can search my own Gmail, but what about newcomers to this list? Where do they search?
3. Gmane - free, no ads. I don't see a problem with using Gmane to search this list.
4. Nabble - not free; no ads now, but there will be ads eventually. Gmane is the pioneer; Nabble tries to do better. The main improvement is cross-searching and browsing of multiple lists. You can search or browse all GCC lists here: http://www.nabble.com/gcc-f1154.html You can also drill down to the child node for this list: http://www.nabble.com/gcc---Dev-f1157.html Nabble allows many more parameters for fine-tuning a search; try a search, then click the 'Search Tips' link. The problem with Nabble is that it only started archiving the GCC lists half a year ago, so the data is incomplete. But if the mbox files are still available, we can probably do a custom import.
I am not involved in any of the gcc work, but I hope this helps your cause. Regards, Will L Nabble.com -- Sent from the gcc - Dev forum at Nabble.com: http://www.nabble.com/GCC-mailing-list-archive-search-omits-results-after-May-2005-t738227.html#a1963126