Re: [MIPS] Avoiding FP operations/register usage
Matthew Fortune writes:
> I'm still interested in how successfully the MIPS backend is managing to
> avoid floating point but I am also convinced there are bugs in ld.so
> entry points for MIPS.

It uses the standard mechanism to avoid it, which is marking uses of FP registers for integer moves, loads and stores with "*". This tells the register allocator to ignore those alternatives. AFAIK it is effective, and I think any cases where it doesn't work would be fair bug reports.

It becomes a lot more difficult to define with things like the Loongson extensions though, since some of those are also useful as scalar integer operations. And of course the same goes for MSA.

Thanks,
Richard
Re: Fwd: LLVM collaboration?
Hi Jan,

I think this is a very good example where we could all collaborate (including binutils).

I'll leave your reply intact, so that Chandler (CC'd) can get a bit more context. I'm copying him because he (and I believe Diego) had more contact with LTO than I had.

If I got it right, LTO today:

- needs the drivers to explicitly declare the plugin
- needs the library available somewhere
- may have to change the library loading semantics (via LD_PRELOAD)

Since both toolchains do the magic, binutils has no incentive to create any automatic detection of objects.

The part that I didn't get is what you said about backward compatibility. Would LTO work on a newer binutils with the liblto but on an older compiler that knew nothing about LTO?

Your proposal is, then, to get binutils:

- recognizing LTO logic in the objects
- automatically loading liblto if recognized
- warning if not

I'm assuming the extra symbols would be discarded if no library is found, together with the warning, right? Maybe an error if -Wall or whatever.

Can we get someone from the binutils community to opine on that?

cheers,
--renato

On 11 February 2014 02:29, Jan Hubicka wrote:
> One practical experience I have with LLVM developers is sharing
> experiences about getting Firefox to work with LTO with Rafael Espindola,
> and I think it was useful for both of us. I am definitely open to more
> discussion.
>
> Let's try a specific topic that has been on my TODO list for some time.
>
> I would like to make it possible for multiple compilers to be used to LTO
> a single binary. As we are all making LTO more useful, I think it is a
> matter of time until people start shipping LTO object files by default
> and users end up feeding them into different compilers or incompatible
> versions of the same compiler. We probably want to make this work, even
> though the cross-module optimization will not happen in this case.
>
> The plugin interface in binutils seems to do its job well both for GCC
> and LLVM, and I hope that open64 and ICC will eventually join, too.
>
> The trouble however is that one needs to pass an explicit --plugin
> argument specifying the particular plugin to load, and so GCC ships with
> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
> LLVM does a similar thing.
>
> It may be smoother if binutils was able to load multiple plugins at once
> and grab plugins from system and user installed compilers without an
> explicit --plugin argument.
>
> Binutils probably should also have a way to detect LTO object files and
> produce more useful diagnostics than they do now, when there is no plugin
> claiming them.
>
> There are some PRs filed on the topic
> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
> but not much progress on them.
>
> I wonder if we can get this designed and implemented.
>
> On the other hand, GCC currently maintains a non-plugin path for LTO that
> is now only used by the darwin port due to lack of a plugin-enabled LD
> there. It seems that the liblto used by darwin is loosely compatible with
> the plugin API, but it makes it harder to have different compilers share
> it (one has to LD_PRELOAD liblto to a different one prior to executing
> the linker?)
>
> I wonder, is there a chance to implement linker plugin API to libLTO glue
> or add plugin support to native Darwin tools?
>
> Honza
RE: [MIPS] Avoiding FP operations/register usage
> Matthew Fortune writes:
> > I'm still interested in how successfully the MIPS backend is managing
> > to avoid floating point but I am also convinced there are bugs in
> > ld.so entry points for MIPS.
>
> It uses the standard mechanism to avoid it, which is marking uses of FP
> registers for integer moves, loads and stores with "*". This tells the
> register allocator to ignore those alternatives. AFAIK it is effective
> and I think any cases where it doesn't work would be fair bug reports.

I understand that '*' has no effect on whether reload/LRA will use the alternative, though, so I take that to mean they could still allocate FP regs as part of an integer move?

> It becomes a lot more difficult to define with things like the Loongson
> extensions though, since some of those are also useful as scalar integer
> operations. And of course the same goes for MSA.

Indeed. Avoiding FP registers 99.9% of the time is fine for performance; it's the potential 0.1% I'm concerned about for correctness. I'm tending towards accounting for potential FPU usage even from integer-only source, just to be safe. I don't want to ever be the one debugging something like ld.so in the face of this kind of bug.

I'll move the discussion to glibc regarding ld.so.

Regards,
Matthew
Re: Fwd: LLVM collaboration?
On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
> Hi Jan,
>
> I think this is a very good example where we could all collaborate
> (including binutils).
>
> I'll leave your reply intact, so that Chandler (CC'd) can get a bit more
> context. I'm copying him because he (and I believe Diego) had more
> contact with LTO than I had.
>
> If I got it right, LTO today:
>
> - needs the drivers to explicitly declare the plugin
> - needs the library available somewhere
> - may have to change the library loading semantics (via LD_PRELOAD)

There is another need that I have felt in LTO for quite some time. Currently, it has a non-partitioned mode or a partitioned mode, but this decision is taken before the compilation begins. It would be nice to have a mode that allows dynamic loading of function bodies, so that a flow- and context-sensitive IPA could load function bodies on demand and unload them when they are not needed.

Uday.

> Since both toolchains do the magic, binutils has no incentive to create
> any automatic detection of objects.
>
> The part that I didn't get is what you said about backward compatibility.
> Would LTO work on a newer binutils with the liblto but on an older
> compiler that knew nothing about LTO?
>
> Your proposal is, then, to get binutils:
>
> - recognizing LTO logic in the objects
> - automatically loading liblto if recognized
> - warning if not
>
> I'm assuming the extra symbols would be discarded if no library is found,
> together with the warning, right? Maybe an error if -Wall or whatever.
>
> Can we get someone from the binutils community to opine on that?
>
> cheers,
> --renato
>
> On 11 February 2014 02:29, Jan Hubicka wrote:
>> One practical experience I have with LLVM developers is sharing
>> experiences about getting Firefox to work with LTO with Rafael
>> Espindola, and I think it was useful for both of us. I am definitely
>> open to more discussion.
>>
>> Let's try a specific topic that has been on my TODO list for some time.
>>
>> I would like to make it possible for multiple compilers to be used to
>> LTO a single binary. As we are all making LTO more useful, I think it is
>> a matter of time until people start shipping LTO object files by default
>> and users end up feeding them into different compilers or incompatible
>> versions of the same compiler. We probably want to make this work, even
>> though the cross-module optimization will not happen in this case.
>>
>> The plugin interface in binutils seems to do its job well both for GCC
>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships with
>> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
>> LLVM does a similar thing.
>>
>> It may be smoother if binutils was able to load multiple plugins at once
>> and grab plugins from system and user installed compilers without an
>> explicit --plugin argument.
>>
>> Binutils probably should also have a way to detect LTO object files and
>> produce more useful diagnostics than they do now, when there is no
>> plugin claiming them.
>>
>> There are some PRs filed on the topic
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>> but not much progress on them.
>>
>> I wonder if we can get this designed and implemented.
>>
>> On the other hand, GCC currently maintains a non-plugin path for LTO
>> that is now only used by the darwin port due to lack of a plugin-enabled
>> LD there. It seems that the liblto used by darwin is loosely compatible
>> with the plugin API, but it makes it harder to have different compilers
>> share it (one has to LD_PRELOAD liblto to a different one prior to
>> executing the linker?)
>>
>> I wonder, is there a chance to implement linker plugin API to libLTO
>> glue or add plugin support to native Darwin tools?
>>
>> Honza
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
>>
>> Intuitively, this is wrong because this lets the program take a step
>> the abstract machine wouldn't do. This is different to the sequential
>> code that Peter posted because it uses atomics, and thus one can't
>> easily assume that the difference is not observable.
>
> Btw, what is the definition of "observable" for the atomics?
>
> Because I'm hoping that it's not the same as for volatiles, where
> "observable" is about the virtual machine itself, and as such volatile
> accesses cannot be combined or optimized at all.
>
> Now, I claim that atomic accesses cannot be done speculatively for
> writes, and not re-done for reads (because the value could change),
> but *combining* them would be possible and good.
>
> For example, we often have multiple independent atomic accesses that
> could certainly be combined: testing the individual bits of an atomic
> value with helper functions, causing things like "load atomic, test
> bit, load same atomic, test another bit". The two atomic loads could
> be done as a single load without possibly changing semantics on a real
> machine, but if "visibility" is defined in the same way it is for
> "volatile", that wouldn't be a valid transformation. Right now we use
> "volatile" semantics for these kinds of things, and they really can
> hurt.
>
> Same goes for multiple writes (possibly due to setting bits):
> combining multiple accesses into a single one is generally fine, it's
> *adding* write accesses speculatively that is broken by design.
>
> At the same time, you can't combine atomic loads or stores infinitely
> - "visibility" on a real machine definitely is about timeliness.
> Removing all but the last write when there are multiple consecutive
> writes is generally fine, even if you unroll a loop to generate those
> writes. But if what remains is a loop, it might be a busy-loop
> basically waiting for something, so it would be wrong ("untimely") to
> hoist a store in a loop entirely past the end of the loop, or hoist a
> load in a loop to before the loop.
>
> Does the standard allow for that kind of behavior?

You asked! ;-)

So the current standard allows merging of both loads and stores, unless of course ordering constraints prevent the merging. Volatile semantics may be used to prevent this merging, if desired, for example, for real-time code. Infinite merging is intended to be prohibited, but I am not certain that the current wording is bullet-proof (1.10p24 and 1.10p25).

The only prohibition against speculative stores that I can see is in a non-normative note, and it can be argued to apply only to things that are not atomics (1.10p22). I don't see any prohibition against reordering a store to precede a load preceding a conditional branch -- which would not be speculative if the branch was known to be taken and the load hit in the store buffer. In a system where stores could be reordered, some other CPU might perceive the store as happening before the load that controlled the conditional branch. This needs to be addressed.

Why this hole? At the time, the current formalizations of popular CPU architectures did not exist, and it was not clear that all popular hardware avoided speculative stores.

There is also fun with "out of thin air" values, which everyone agrees should be prohibited, but where there is no agreement on how to prohibit them in a mathematically constructive manner. The current draft contains a clause simply stating that out-of-thin-air values are prohibited, which doesn't help someone constructing tools to analyze C++ code. One proposal requires that subsequent atomic stores never be reordered before prior atomic loads, which requires useless ordering code to be emitted on ARM and PowerPC (you may have seen Will Deacon's and Peter Zijlstra's reaction to this proposal a few days ago). Note that Itanium already pays this price in order to provide full single-variable cache coherence. This out-of-thin-air discussion is also happening in the Java community in preparation for a new rev of the Java memory model.

There will also be some discussions on memory_order_consume, which is intended to (eventually) implement rcu_dereference(). The compiler writers don't like tracking dependencies, but there may be some ways of constraining optimizations to preserve the common dependencies, while providing some syntax to force preservation of dependencies that would normally be optimized out. One example of this is where you have an RCU-protected array that might sometimes contain only a single element. In the single-element case, the compiler knows a priori which element will be used, and will therefore optimize the dependency away, so that the reader might see pre-initialization state. But this is rare, so if syntax needs to be added in this case, I believe we should be OK with it. (If syntax is
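To make the merging Linus describes concrete, here is a minimal C11 sketch (illustrative only, not code from the thread): two relaxed loads of the same atomic that test different bits, which a compiler may legally fold into a single load without changing what any real machine can observe -- exactly the transformation volatile semantics would forbid.

  #include <stdatomic.h>
  #include <stdbool.h>

  static atomic_uint flags;

  /* "load atomic, test bit, load same atomic, test another bit":
     no ordering constraint separates the two loads below, so a
     compiler could merge them into one.  Under volatile semantics
     both loads would have to be emitted.  */
  bool ready_and_valid (void)
  {
    bool ready = atomic_load_explicit (&flags, memory_order_relaxed) & 1u;
    bool valid = atomic_load_explicit (&flags, memory_order_relaxed) & 2u;
    return ready && valid;
  }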
Re: Fwd: LLVM collaboration?
> On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
>> Hi Jan,
>>
>> I think this is a very good example where we could all collaborate
>> (including binutils).
>>
>> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
>> more context. I'm copying him because he (and I believe Diego) had
>> more contact with LTO than I had.
>>
>> If I got it right, LTO today:
>>
>> - needs the drivers to explicitly declare the plugin

Yes.

>> - needs the library available somewhere
>> - may have to change the library loading semantics (via LD_PRELOAD)

Not in the binutils implementation (I believe it is the case for darwin's libLTO). With binutils you only need to pass an explicit --plugin argument into all tools that care (ld/ar/nm/ranlib).

> There is another need that I have felt in LTO for quite some time.
> Currently, it has a non-partitioned mode or a partitioned mode but
> this decision is taken before the compilation begins. It would be
> nice to have a mode that allows dynamic loading of function bodies
> so that a flow and context sensitive IPA could load function bodies
> on demand, and unload them when they are not needed.

I implemented on-demand loading of function bodies into GCC-4.8, if I recall correctly. Currently I think only Martin Liska's code unification pass uses it, to verify that two function bodies it thinks are equivalent are actually equivalent. Hopefully it will be merged into 4.10.

> Uday.
>
>> Since both toolchains do the magic, binutils has no incentive to
>> create any automatic detection of objects.
>>
>> The part that I didn't get is what you said about backward
>> compatibility. Would LTO work on a newer binutils with the liblto but
>> on an older compiler that knew nothing about LTO?
>>
>> Your proposal is, then, to get binutils:
>>
>> - recognizing LTO logic in the objects
>> - automatically loading liblto if recognized
>> - warning if not

I basically think that binutils should have a way for an installed compiler to register a plugin, and load all plugins by default (or perhaps, for performance, upon detecting a compatible LTO object file in some way, perhaps also by information given in the config file) and let them claim the LTO objects they understand.

With the backward compatibility I mean that if we release a new version of the compiler that can no longer read the LTO objects of an older compiler, one can just install both versions and have their plugins claim only the LTO objects they understand. Just as if they were two different compilers.

Finally, I think we can make binutils recognize GCC/LLVM LTO objects as a special case and produce a friendly message when users try to handle them without a plugin, as opposed to today's strange errors about file formats or missing symbols.

Honza

>> I'm assuming the extra symbols would be discarded if no library is
>> found, together with the warning, right? Maybe an error if -Wall or
>> whatever.
>>
>> Can we get someone from the binutils community to opine on that?
>>
>> cheers,
>> --renato
>>
>> On 11 February 2014 02:29, Jan Hubicka wrote:
>>> One practical experience I have with LLVM developers is sharing
>>> experiences about getting Firefox to work with LTO with Rafael
>>> Espindola, and I think it was useful for both of us. I am definitely
>>> open to more discussion.
>>>
>>> Let's try a specific topic that has been on my TODO list for some time.
>>>
>>> I would like to make it possible for multiple compilers to be used to
>>> LTO a single binary. As we are all making LTO more useful, I think it
>>> is a matter of time until people start shipping LTO object files by
>>> default and users end up feeding them into different compilers or
>>> incompatible versions of the same compiler. We probably want to make
>>> this work, even though the cross-module optimization will not happen
>>> in this case.
>>>
>>> The plugin interface in binutils seems to do its job well both for GCC
>>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>>
>>> The trouble however is that one needs to pass an explicit --plugin
>>> argument specifying the particular plugin to load, and so GCC ships
>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>>> while LLVM does a similar thing.
>>>
>>> It may be smoother if binutils was able to load multiple plugins at
>>> once and grab plugins from system and user installed compilers without
>>> an explicit --plugin argument.
>>>
>>> Binutils probably should also have a way to detect LTO object files
>>> and produce more useful diagnostics than they do now, when there is no
>>> plugin claiming them.
>>>
>>> There are some PRs filed on the topic
>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>>> but not much progress on them.
>>>
>>> I wonder if we can get this designed and implemented.
>>>
>>> On the other hand, GCC currently maintains a non-plugin path for LT
Re: Fwd: LLVM collaboration?
On Tuesday 11 February 2014 09:30 PM, Jan Hubicka wrote:
>> On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
>>> Hi Jan,
>>>
>>> I think this is a very good example where we could all collaborate
>>> (including binutils).
>>>
>>> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
>>> more context. I'm copying him because he (and I believe Diego) had
>>> more contact with LTO than I had.
>>>
>>> If I got it right, LTO today:
>>>
>>> - needs the drivers to explicitly declare the plugin
>
> Yes.
>
>>> - needs the library available somewhere
>>> - may have to change the library loading semantics (via LD_PRELOAD)
>
> Not in the binutils implementation (I believe it is the case for
> darwin's libLTO). With binutils you only need to pass an explicit
> --plugin argument into all tools that care (ld/ar/nm/ranlib).
>
>> There is another need that I have felt in LTO for quite some time.
>> Currently, it has a non-partitioned mode or a partitioned mode but
>> this decision is taken before the compilation begins. It would be
>> nice to have a mode that allows dynamic loading of function bodies
>> so that a flow and context sensitive IPA could load function bodies
>> on demand, and unload them when they are not needed.
>
> I implemented on-demand loading of function bodies into GCC-4.8, if I
> recall correctly. Currently I think only Martin Liska's code
> unification pass uses it, to verify that two function bodies it thinks
> are equivalent are actually equivalent. Hopefully it will be merged
> into 4.10.

Great. We will experiment with it.

Uday.

>> Uday.
>>
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>>>
>>> The part that I didn't get is what you said about backward
>>> compatibility. Would LTO work on a newer binutils with the liblto but
>>> on an older compiler that knew nothing about LTO?
>>>
>>> Your proposal is, then, to get binutils:
>>>
>>> - recognizing LTO logic in the objects
>>> - automatically loading liblto if recognized
>>> - warning if not
>
> I basically think that binutils should have a way for an installed
> compiler to register a plugin, and load all plugins by default (or
> perhaps, for performance, upon detecting a compatible LTO object file
> in some way, perhaps also by information given in the config file) and
> let them claim the LTO objects they understand.
>
> With the backward compatibility I mean that if we release a new version
> of the compiler that can no longer read the LTO objects of an older
> compiler, one can just install both versions and have their plugins
> claim only the LTO objects they understand. Just as if they were two
> different compilers.
>
> Finally, I think we can make binutils recognize GCC/LLVM LTO objects as
> a special case and produce a friendly message when users try to handle
> them without a plugin, as opposed to today's strange errors about file
> formats or missing symbols.
>
> Honza
>
>>> I'm assuming the extra symbols would be discarded if no library is
>>> found, together with the warning, right? Maybe an error if -Wall or
>>> whatever.
>>>
>>> Can we get someone from the binutils community to opine on that?
>>>
>>> cheers,
>>> --renato
>>>
>>> On 11 February 2014 02:29, Jan Hubicka wrote:
>>>> One practical experience I have with LLVM developers is sharing
>>>> experiences about getting Firefox to work with LTO with Rafael
>>>> Espindola, and I think it was useful for both of us. I am definitely
>>>> open to more discussion.
>>>>
>>>> Let's try a specific topic that has been on my TODO list for some
>>>> time.
>>>>
>>>> I would like to make it possible for multiple compilers to be used
>>>> to LTO a single binary. As we are all making LTO more useful, I
>>>> think it is a matter of time until people start shipping LTO object
>>>> files by default and users end up feeding them into different
>>>> compilers or incompatible versions of the same compiler. We probably
>>>> want to make this work, even though the cross-module optimization
>>>> will not happen in this case.
>>>>
>>>> The plugin interface in binutils seems to do its job well both for
>>>> GCC and LLVM, and I hope that open64 and ICC will eventually join,
>>>> too.
>>>>
>>>> The trouble however is that one needs to pass an explicit --plugin
>>>> argument specifying the particular plugin to load, and so GCC ships
>>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver
>>>> itself) while LLVM does a similar thing.
>>>>
>>>> It may be smoother if binutils was able to load multiple plugins at
>>>> once and grab plugins from system and user installed compilers
>>>> without an explicit --plugin argument.
>>>>
>>>> Binutils probably should also have a way to detect LTO object files
>>>> and produce more useful diagnostics than they do now, when there is
>>>> no plugin claiming them.
>>>>
>>>> There are some PRs filed on the topic
>>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>>>> but not much progress on them.
>>>>
>>>> I wonder if we can get this designed and implemented.
>>>>
>>>> On the other hand, GCC currently maintains a non-plugin path for LTO
>>>> that is now only used by the darwin port due to lack of a
>>>> plugin-enabled LD there. It seems that the liblto used by darwin is
>>>> loosely compatible with the plugin API, but it makes it harder to
>>>> have different
Re: Fwd: LLVM collaboration?
On 11 February 2014 16:00, Jan Hubicka wrote:
> I basically think that binutils should have a way for an installed
> compiler to register a plugin, and load all plugins by default (or
> perhaps, for performance, upon detecting a compatible LTO object file in
> some way, perhaps also by information given in the config file) and let
> them claim the LTO objects they understand.

Right, so this would not necessarily be related to LTO, but to the binutils plugin system. In my very limited experience with LTO and binutils, I can't see how that would be different from just adding a --plugin option on the compiler, unless it's something that the linker would detect automatically without the interference of any compiler.

> With the backward compatibility I mean that if we release a new version
> of the compiler that can no longer read the LTO objects of an older
> compiler, one can just install both versions and have their plugins
> claim only the LTO objects they understand. Just as if they were two
> different compilers.

Yes, this makes total sense.

> Finally, I think we can make binutils recognize GCC/LLVM LTO objects
> as a special case and produce a friendly message when users try to
> handle them without a plugin, as opposed to today's strange errors
> about file formats or missing symbols.

Yes, that as well seems pretty obvious, and mostly orthogonal to the other two proposals.

cheers,
--renato

PS: Removing Chandler, as he was not the right person to look at this. I'll check with others on the LLVM list to chime in on this thread.
Re: i370 port
Hello all. I have previously succeeded in getting configure to work for gcc 3.4.6. Unfortunately gcc 3.4.6 is too buggy to use and needs to wait for Dave Pitts or someone to fix it. gcc 3.2.3 has no known bugs for the i370 target, but it has never been built using "configure". I am now trying to get gcc 3.2.3 to build via configure using the same technique I used for gcc 3.4.6.

Some differences I have found so far: I needed to define the sizes of short etc., which I didn't need to do with 3.4.6:

export ac_cv_func_strncmp_works=yes
export ac_cv_c_bigendian=yes
export ac_cv_c_compile_endian=big-endian
export ac_cv_sizeof_short=2
export ac_cv_sizeof_int=4
export ac_cv_sizeof_long=4
export ac_cv_c_float_format='IBM 370 hex'

And "make", after this configure:

./configure --build=x86_64-unknown-linux-gnu --host=i370-mvspdp --target=i370-mvspdp --prefix=/devel/mvshost --enable-languages=c --disable-nls

is failing here:

make[2]: Leaving directory `/home/users/k/ke/kerravon86/devel/gcc/x86_64-unknown-linux-gnu/libiberty'
rm -f *~ Makefile config.status xhost-mkfrag TAGS multilib.out
rm -f config.log
rmdir testsuite 2>/dev/null
make[1]: [distclean] Error 1 (ignored)
make[1]: Leaving directory `/home/users/k/ke/kerravon86/devel/gcc/x86_64-unknown-linux-gnu/libiberty'
loading cache ../config.cache
configure: error: can not find install-sh or install.sh in ./.. ././..
make: *** [configure-build-libiberty] Error 1

The file in question seems to exist:

~/devel/gcc>find . -name install-sh
./boehm-gc/install-sh
./install-sh
./fastjar/install-sh
~/devel/gcc>find . -name install.sh
~/devel/gcc>

and is executable. Any suggestions? Thanks. Paul.
Re: Fwd: LLVM collaboration?
Now copying Rafael, who can give us some more insight on the LLVM LTO side.

cheers,
--renato

On 11 February 2014 09:55, Renato Golin wrote:
> Hi Jan,
>
> I think this is a very good example where we could all collaborate
> (including binutils).
>
> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
> more context. I'm copying him because he (and I believe Diego) had
> more contact with LTO than I had.
>
> If I got it right, LTO today:
>
> - needs the drivers to explicitly declare the plugin
> - needs the library available somewhere
> - may have to change the library loading semantics (via LD_PRELOAD)
>
> Since both toolchains do the magic, binutils has no incentive to
> create any automatic detection of objects.
>
> The part that I didn't get is what you said about backward
> compatibility. Would LTO work on a newer binutils with the liblto but
> on an older compiler that knew nothing about LTO?
>
> Your proposal is, then, to get binutils:
>
> - recognizing LTO logic in the objects
> - automatically loading liblto if recognized
> - warning if not
>
> I'm assuming the extra symbols would be discarded if no library is
> found, together with the warning, right? Maybe an error if -Wall or
> whatever.
>
> Can we get someone from the binutils community to opine on that?
>
> cheers,
> --renato
>
> On 11 February 2014 02:29, Jan Hubicka wrote:
>> One practical experience I have with LLVM developers is sharing
>> experiences about getting Firefox to work with LTO with Rafael
>> Espindola, and I think it was useful for both of us. I am definitely
>> open to more discussion.
>>
>> Let's try a specific topic that has been on my TODO list for some time.
>>
>> I would like to make it possible for multiple compilers to be used to
>> LTO a single binary. As we are all making LTO more useful, I think it
>> is a matter of time until people start shipping LTO object files by
>> default and users end up feeding them into different compilers or
>> incompatible versions of the same compiler. We probably want to make
>> this work, even though the cross-module optimization will not happen
>> in this case.
>>
>> The plugin interface in binutils seems to do its job well both for GCC
>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>>
>> It may be smoother if binutils was able to load multiple plugins at
>> once and grab plugins from system and user installed compilers without
>> an explicit --plugin argument.
>>
>> Binutils probably should also have a way to detect LTO object files
>> and produce more useful diagnostics than they do now, when there is no
>> plugin claiming them.
>>
>> There are some PRs filed on the topic
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>> but not much progress on them.
>>
>> I wonder if we can get this designed and implemented.
>>
>> On the other hand, GCC currently maintains a non-plugin path for LTO
>> that is now only used by the darwin port due to lack of a
>> plugin-enabled LD there. It seems that the liblto used by darwin is
>> loosely compatible with the plugin API, but it makes it harder to have
>> different compilers share it (one has to LD_PRELOAD liblto to a
>> different one prior to executing the linker?)
>>
>> I wonder, is there a chance to implement linker plugin API to libLTO
>> glue or add plugin support to native Darwin tools?
>>
>> Honza
Re: Fwd: LLVM collaboration?
On 11 February 2014 12:28, Renato Golin wrote:
> Now copying Rafael, who can give us some more insight on the LLVM LTO
> side.

Thanks.

> On 11 February 2014 09:55, Renato Golin wrote:
>> Hi Jan,
>>
>> I think this is a very good example where we could all collaborate
>> (including binutils).

It is. Both LTO models (LLVM and GCC) were considered from the start of the API design, and I think we got a better plugin model as a result.

>> If I got it right, LTO today:
>>
>> - needs the drivers to explicitly declare the plugin
>> - needs the library available somewhere

True.

>> - may have to change the library loading semantics (via LD_PRELOAD)

That depends on the library being loaded. RPATH works just fine too.

>> Since both toolchains do the magic, binutils has no incentive to
>> create any automatic detection of objects.

It is mostly a historical decision. At the time, the design was for the plugin to be matched to the compiler, so the compiler could pass that information down to the linker.

> The trouble however is that one needs to pass an explicit --plugin
> argument specifying the particular plugin to load, and so GCC ships with
> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
> LLVM does a similar thing.

These wrappers should not be necessary. While the linker currently requires a command line option, bfd has support for searching for a plugin. It will search /lib/bfd-plugin. See for example the instructions at http://llvm.org/docs/GoldPlugin.html. This was done because ar and nm are not normally bound to any compiler. Had we realized this issue earlier, we would probably have supported searching for plugins in the linker too.

So it seems that what you want could be done by:

* having bfd-ld and gold search bfd-plugins (maybe rename the directory?)
* supporting loading multiple plugins, and asking each to see if it supports a given file. That way we could LTO a build that is part GCC and part LLVM.
* maybe being smart about versions and loading new ones first (libLLVM-3.4 before libLLVM-3.3, for example). Probably the first one should always be the one given on the command line.

For OS X the situation is a bit different. There, instead of a plugin, the linker loads a library: libLTO.dylib. When doing LTO with a newer llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting that from clang some time ago, but I don't remember the outcome.

In theory GCC could implement a libLTO.dylib and set DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an API mapping the other way, so the job would be inverting it. The LTO model of ld64 is a bit stricter about knowing all symbol definitions and uses (including inline asm), so there would be work to be done to cover that, but the simple cases shouldn't be too hard.

Cheers,
Rafael
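For reference, the linker plugin entry point being discussed looks roughly like the sketch below, written against the API declared in binutils' include/plugin-api.h. This is a minimal illustration: error handling is elided, and how a plugin recognizes "its" LTO objects (the body of the claim handler) is up to each compiler.

  #include <plugin-api.h>   /* from the binutils source tree */
  #include <stddef.h>

  static ld_plugin_register_claim_file register_claim_file;

  /* Called by the linker for every input file.  Setting *claimed = 0
     leaves the file for the linker (or, in the multi-plugin world
     discussed in this thread, for the next plugin) to handle.  */
  static enum ld_plugin_status
  claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
  {
    *claimed = 0;   /* ...unless our own LTO marker is found in *file */
    return LDPS_OK;
  }

  /* The linker dlopens the plugin and calls onload() with a transfer
     vector; the plugin walks it and registers its hooks.  */
  enum ld_plugin_status
  onload (struct ld_plugin_tv *tv)
  {
    for (; tv->tv_tag != LDPT_NULL; tv++)
      if (tv->tv_tag == LDPT_REGISTER_CLAIM_FILE_HOOK)
        register_claim_file = tv->tv_u.tv_register_claim_file;

    if (register_claim_file)
      register_claim_file (claim_file_handler);
    return LDPS_OK;
  }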
Re: Fwd: LLVM collaboration?
On 2014.02.11 at 13:02 -0500, Rafael Espíndola wrote:
> On 11 February 2014 12:28, Renato Golin wrote:
>> Now copying Rafael, who can give us some more insight on the LLVM LTO
>> side.
>
> Thanks.
>
>> On 11 February 2014 09:55, Renato Golin wrote:
>>> Hi Jan,
>>>
>>> I think this is a very good example where we could all collaborate
>>> (including binutils).
>
> It is. Both LTO models (LLVM and GCC) were considered from the start of
> the API design, and I think we got a better plugin model as a result.
>
>>> If I got it right, LTO today:
>>>
>>> - needs the drivers to explicitly declare the plugin
>>> - needs the library available somewhere
>
> True.
>
>>> - may have to change the library loading semantics (via LD_PRELOAD)
>
> That depends on the library being loaded. RPATH works just fine too.
>
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>
> It is mostly a historical decision. At the time, the design was for the
> plugin to be matched to the compiler, so the compiler could pass that
> information down to the linker.
>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>
> These wrappers should not be necessary. While the linker currently
> requires a command line option, bfd has support for searching for a
> plugin. It will search /lib/bfd-plugin. See for example the
> instructions at http://llvm.org/docs/GoldPlugin.html.

Please note that this automatic loading of the plugin only happens for non-ELF files. So the LLVM GoldPlugin gets loaded fine, but automatic loading of gcc's liblto_plugin.so doesn't work at the moment.

A basic implementation to support both plugins seamlessly should be pretty straightforward, because LLVM's bitstream file format (non-ELF) is easily distinguishable from gcc's output (standard ELF with special sections).

--
Markus
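To illustrate why the distinction is easy (a sketch, not actual binutils code): raw LLVM bitcode begins with the magic bytes 'B', 'C', 0xC0, 0xDE, while a GCC LTO object is plain ELF (0x7F 'E' 'L' 'F') whose .gnu.lto_* sections still have to be checked before claiming it for GCC.

  #include <stdint.h>
  #include <string.h>

  enum lto_kind { LTO_UNKNOWN, LTO_LLVM_BITCODE, LTO_MAYBE_GCC_ELF };

  /* Classify a file from its first four bytes.  For the ELF case a
     real implementation would still scan the section table for
     .gnu.lto_* sections before claiming the file for GCC.  */
  static enum lto_kind
  classify_header (const uint8_t h[4])
  {
    static const uint8_t bc_magic[4]  = { 'B', 'C', 0xC0, 0xDE };
    static const uint8_t elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (memcmp (h, bc_magic, 4) == 0)
      return LTO_LLVM_BITCODE;
    if (memcmp (h, elf_magic, 4) == 0)
      return LTO_MAYBE_GCC_ELF;
    return LTO_UNKNOWN;
  }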
Zero-cost toolchain "standardization" process
Hi Folks,

First of all, I'd like to thank everyone for their great responses and heart-warming encouragement for such an enterprise. This will be my last email about this subject on these lists, so I'd like to just let everyone know what (and where) I'll be heading next with this topic. Feel free to reply to me personally; I don't want to spawn an ugly two-list thread.

As many of you noted, not everyone is actively interested in this, and for good reasons. The last thing we want is yet another standard getting in the way of actually implementing features and innovating, which both LLVM and GCC are good at. Following the comments on the GCC list, Slashdot and the Phoronix forums, I think the only sensible thing is to do what everyone said we should: talk.

Also, just this week, we got GCC developers having patches accepted in LLVM (sanitizers) and LLVM developers discussing LTO strategies on the GCC list. Both interactions have already shown the need for improvements on both sides. This is a *really* good start!

The proposal, then, is to have a zero-cost process, where only the interested parties need to take action: a reactive system where standards are agreed *after* implementation.

1. A new feature request / implementation on one of the toolchains outlines what's being done in a shared place. Basically, copy and paste from the active thread's summary into a shared area.

2. Interested parties (pre-registered) get a reminder that new content is available. From here, two things can happen:

2.1. The other toolchain has the feature, in which case developers should:
2.1.1. Agree that this is, indeed, the way they have implemented it, and check the box: "standard agreed".
2.1.2. Disagree on the implementation and describe what they've done instead.

2.2. The other toolchain doesn't have it:
2.2.1. Agree with the implementation and mark it as "standard agreed" and "future work".
2.2.2. Disagree on the implementation and mark it as "to discuss".

In both disagreement cases, pre-registered developers of both toolchains would receive emails outlining the conflict, and they can discuss as much as they want until common ground is decided, or not. It's perfectly fine to "agree to disagree" when no "common standard" is reached.

Some important notes:

* No toolchain should be forced to accommodate the standard, but it would be good to *at least* describe what they do instead, so that users don't get surprised.
* No toolchain should be forced to keep to the agreed standard, and discussions about migrating to a better implementation would naturally happen on a cross-toolchain forum.
* No toolchain should be forced to implement a feature just because the other toolchain did. It's perfectly fine to never implement it, if the need never arises.
* No developer should be forced to follow the emails or even care about the process. Other developers in their own communities should, if necessary, enforce their own standards, at their own pace, which could, or not, agree with the shared one.

How is that different from doing nothing?

First, and most important, it will log our cross-toolchain actions. Today, we only have two good examples of cross-interactions, and neither is visible from the other side. When (if) we start having more, it'd be good to be able to search through them, or contribute to them on an ad-hoc basis, if a new feature is proposed. We'll also have a documented record of the non-standard things that we're doing, before they go into other standards.

Second, it'll expose what we already have as "standard", and enable a common channel for like-minded people to solve problems on both toolchains. It'll also off-load both lists from having to follow any development, but will still give those interested a way to discuss and agree on a common standard.

Finally, once this "database" of implementation details is big enough, it will be easy to spot the conflicts, and it'll serve as a good TODO list for commoning up implementation details, or even for future compilers to choose one or the other. Entire projects or theses could be written based on them, fostering more innovation in the process.

What now? Well, people really interested in building such a system should (for now) email me directly. If I get enough feedback, we can start discussing in private (or on another list) how we're going to progress.

During the brainstorm phase, or if not enough people are interested, I still think we shouldn't stop talking. The interaction is already happening and it's really good; I think we should just continue and see where this takes us. Maybe by the GNU Cauldron enough people will want to contribute, maybe later, maybe never. Whatever works! To be honest, I'm already really happy with the outcome, so for me, my target was achieved!

I will report what happens during the next few months at the GCC+LLVM BoF, so if you're at least mildly interested, please do attend. For those looking for a few more answers to all the
Re: [buildrobot] spu / avr: Fallout from r207335
Hi Marek,

On Sun, 2014-02-02 23:59:16 +0100, Jan-Benedict Glaw wrote:
> Hi Marek,
>
> it seems your patch produced some fallout for
>
> avr: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=111296
> spu-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=111360

Current build logs:

avr: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=132848
spu: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=135159

> Just some missed calls:
>
> g++ -c -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC
> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti
> -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual
> -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros
> -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c: In function ‘tree_node*
> avr_resolve_overloaded_builtin(unsigned int, tree_node*, void*)’:
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:118: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:184: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:241: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> make[1]: *** [avr-c.o] Error 1
> make[1]: Leaving directory `/home/jbglaw/build/avr/build-gcc/gcc'
> make: *** [all-gcc] Error 2
>
> g++ -c -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC
> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti
> -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual
> -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros
> -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace \
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c: In function ‘tree_node*
> spu_resolve_overloaded_builtin(location_t, tree_node*, void*)’:
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c:184: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> make[1]: *** [spu-c.o] Error 1
> make[1]: Leaving directory `/home/jbglaw/build/spu-elf/build-gcc/gcc'
> make: *** [all-gcc] Error 2

This still isn't fixed. Do you intend to work on the fallout?

MfG, JBG

--
Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481
Signature of: http://perl.plover.com/Questions.html the second
Re: Fwd: LLVM collaboration?
> On 2014.02.11 at 13:02 -0500, Rafael Espíndola wrote:
>> On 11 February 2014 12:28, Renato Golin wrote:
>>> Now copying Rafael, who can give us some more insight on the LLVM LTO
>>> side.
>>
>> Thanks.
>>
>>> On 11 February 2014 09:55, Renato Golin wrote:
>>>> Hi Jan,
>>>>
>>>> I think this is a very good example where we could all collaborate
>>>> (including binutils).
>>
>> It is. Both LTO models (LLVM and GCC) were considered from the start
>> of the API design, and I think we got a better plugin model as a
>> result.
>>
>>>> If I got it right, LTO today:
>>>>
>>>> - needs the drivers to explicitly declare the plugin
>>>> - needs the library available somewhere
>>
>> True.
>>
>>>> - may have to change the library loading semantics (via LD_PRELOAD)
>>
>> That depends on the library being loaded. RPATH works just fine too.
>>
>>>> Since both toolchains do the magic, binutils has no incentive to
>>>> create any automatic detection of objects.
>>
>> It is mostly a historical decision. At the time, the design was for
>> the plugin to be matched to the compiler, so the compiler could pass
>> that information down to the linker.
>>
>>> The trouble however is that one needs to pass an explicit --plugin
>>> argument specifying the particular plugin to load, and so GCC ships
>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver
>>> itself) while LLVM does a similar thing.
>>
>> These wrappers should not be necessary. While the linker currently
>> requires a command line option, bfd has support for searching for a
>> plugin. It will search /lib/bfd-plugin. See for example the
>> instructions at http://llvm.org/docs/GoldPlugin.html.
>
> Please note that this automatic loading of the plugin only happens for
> non-ELF files. So the LLVM GoldPlugin gets loaded fine, but automatic
> loading of gcc's liblto_plugin.so doesn't work at the moment.

Hmm, something that ought to be fixed. Binutils could probably use GCC's LTO symbols as a distinguisher. Is there a PR about this?

> A basic implementation to support both plugins seamlessly should be
> pretty straightforward, because LLVM's bitstream file format (non-ELF)
> is easily distinguishable from gcc's output (standard ELF with special
> sections).

I think it is easy even with two plugins for the same file format - all ld needs to do is load the plugins and then do the file claiming for each of them. The GCC plugin then should not claim files from LLVM or an incompatible GCC version, and vice versa.

Honza

> --
> Markus
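The claiming behavior Honza describes could look roughly like the loop below. This is purely illustrative: struct plugin, plugin_list and the rewind step are hypothetical names, not actual bfd/ld internals, and the point is only the control flow (every loaded plugin gets a chance to claim, instead of stopping at the first plugin whose onload() succeeds as bfd/plugin.c does today).

  #include <plugin-api.h>   /* for enum ld_plugin_status et al. */
  #include <stddef.h>

  /* Hypothetical registry of loaded plugins.  */
  struct plugin
  {
    struct plugin *next;
    enum ld_plugin_status (*claim_file) (const struct ld_plugin_input_file *,
                                         int *);
  };
  static struct plugin *plugin_list;

  static int
  try_claim_with_all_plugins (const struct ld_plugin_input_file *file)
  {
    int claimed = 0;
    struct plugin *p;

    for (p = plugin_list; p != NULL; p = p->next)
      {
        if (p->claim_file (file, &claimed) != LDPS_OK)
          continue;          /* this plugin errored out; ask the next one */
        if (claimed)
          return 1;          /* exactly one plugin owns the file */
        /* rewind the file offset here before offering it to the next
           plugin */
      }
    return 0;  /* unclaimed: a friendly "no plugin recognizes this LTO
                  object" diagnostic would go here */
  }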
Re: Conditional execution over emit_move_insn
Hello,

> I'd like to hardcode conditional execution of emit_move_insn based on
> the predicate checking that the address in the destination argument is
> non-NULL. The platform supports conditional execution, but doesn't have
> explicitly defined conditional moves (target=tic6x). I have already
> tried to find any look-alike pieces in the gcc code tree, but without
> success - I am new here. As for the background - I am trying to work
> around the bug I submitted (id=60123) before there's an official patch
> for it available.

I have figured this out on my own. Please disregard.

With best,
Wojciech
Re: Fwd: LLVM collaboration?
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>
> It is mostly a historical decision. At the time, the design was for the
> plugin to be matched to the compiler, so the compiler could pass that
> information down to the linker.
>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>
> These wrappers should not be necessary. While the linker currently
> requires a command line option, bfd has support for searching for a
> plugin. It will search /lib/bfd-plugin. See for example the
> instructions at http://llvm.org/docs/GoldPlugin.html.

My reading of bfd/plugin.c is that it basically walks the directory and looks for the first plugin that returns OK for onload (which is always the case for GCC/LLVM plugins). So if I install the GCC and LLVM plugins there, it will depend on which one ends up being first, and only that plugin will be used.

We need multiple plugin support, as suggested by the directory name ;)

Also it seems that currently the plugin is not used by ar/nm/ranlib if the file is ELF (as mentioned by Markus), and GNU ld seems to choke on LLVM object files even if it has a plugin.

This probably needs to be sanitized.

> This was done because ar and nm are not normally bound to any
> compiler. Had we realized this issue earlier we would probably have
> supported searching for plugins in the linker too.
>
> So it seems that what you want could be done by
>
> * having bfd-ld and gold search bfd-plugins (maybe rename the
> directory?)
> * support loading multiple plugins, and asking each to see if it
> supports a given file. That way we could LTO a build that is part GCC
> and part LLVM.

Yes, that is what I have in mind. Plus perhaps an additional configuration file to avoid loading everything. Say a user installs 3 versions of LLVM, open64 and ICC. If all of them load as shared libraries, like LLVM's does, it will probably slow down the tools measurably.

> * maybe be smart about versions and load new ones first? (libLLVM-3.4
> before libLLVM-3.3, for example). Probably the first one should always
> be the one given on the command line.

Yes, I think we may want to prioritize the list, so a user can have his own version of GCC prevail over the system one, for example.

> For OS X the situation is a bit different. There, instead of a plugin,
> the linker loads a library: libLTO.dylib. When doing LTO with a newer
> llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting
> that from clang some time ago, but I don't remember the outcome.
>
> In theory GCC could implement a libLTO.dylib and set DYLD_LIBRARY_PATH.
> The gold/bfd plugin that LLVM uses is basically an API mapping the
> other way, so the job would be inverting it. The LTO model of ld64 is a
> bit stricter about knowing all symbol definitions and uses (including
> inline asm), so there would be work to be done to cover that, but the
> simple cases shouldn't be too hard.

I would not care that much about symbols in asm definitions to start with. Even if we force users to non-LTO those object files, it would be an improvement over what we have now.

One problem is that we need a volunteer to implement the reverse glue (libLTO->plugin API), since I do not have an OS X box (well, I have an old G5, but even that is quite far from me right now).

Why are complete symbol tables required? Can't ld64 be changed to ignore unresolved symbols in the first stage, just like gold/gnu-ld does?

Honza

> Cheers,
> Rafael
Re: Fwd: LLVM collaboration?
> My reading of bfd/plugin.c is that it basically walks the directory and
> looks for the first plugin that returns OK for onload (which is always
> the case for GCC/LLVM plugins). So if I install the GCC and LLVM
> plugins there, it will depend on which one ends up being first, and
> only that plugin will be used.
>
> We need multiple plugin support, as suggested by the directory name ;)
>
> Also it seems that currently the plugin is not used by ar/nm/ranlib if
> the file is ELF (as mentioned by Markus), and GNU ld seems to choke on
> LLVM object files even if it has a plugin.
>
> This probably needs to be sanitized.

CCing Hal Finkel. He got this to work some time ago. Not sure if he ever ported the patches to bfd trunk.

>> For OS X the situation is a bit different. There, instead of a plugin,
>> the linker loads a library: libLTO.dylib. When doing LTO with a newer
>> llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting
>> that from clang some time ago, but I don't remember the outcome.
>>
>> In theory GCC could implement a libLTO.dylib and set
>> DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an
>> API mapping the other way, so the job would be inverting it. The LTO
>> model of ld64 is a bit stricter about knowing all symbol definitions
>> and uses (including inline asm), so there would be work to be done to
>> cover that, but the simple cases shouldn't be too hard.
>
> I would not care that much about symbols in asm definitions to start
> with. Even if we force users to non-LTO those object files, it would be
> an improvement over what we have now.
>
> One problem is that we need a volunteer to implement the reverse glue
> (libLTO->plugin API), since I do not have an OS X box (well, I have an
> old G5, but even that is quite far from me right now).
>
> Why are complete symbol tables required? Can't ld64 be changed to
> ignore unresolved symbols in the first stage, just like gold/gnu-ld
> does?

I am not sure about this. My *guess* is that it does the dead-stripping computation before asking libLTO for the object file. I noticed the issue while trying to LTO Firefox some time ago.

Cheers,
Rafael
Re: LLVM collaboration?
- Original Message -
> From: "Rafael Espíndola"
> To: "Jan Hubicka"
> Cc: "Renato Golin" , "gcc" , "Hal Finkel"
> Sent: Tuesday, February 11, 2014 3:38:40 PM
> Subject: Re: Fwd: LLVM collaboration?
>
>> My reading of bfd/plugin.c is that it basically walks the directory
>> and looks for the first plugin that returns OK for onload (which is
>> always the case for GCC/LLVM plugins). So if I install the GCC and
>> LLVM plugins there, it will depend on which one ends up being first,
>> and only that plugin will be used.
>>
>> We need multiple plugin support, as suggested by the directory name ;)
>>
>> Also it seems that currently the plugin is not used by ar/nm/ranlib if
>> the file is ELF (as mentioned by Markus), and GNU ld seems to choke on
>> LLVM object files even if it has a plugin.
>>
>> This probably needs to be sanitized.
>
> CCing Hal Finkel. He got this to work some time ago. Not sure if he
> ever ported the patches to bfd trunk.

I have a patch for binutils 2.24 (attached -- I think this works; I hand-isolated it from my BG/Q patchset). I would not consider it to be of upstream quality, but I'd obviously appreciate any assistance in making everything clean and proper ;)

-Hal

>>> For OS X the situation is a bit different. There, instead of a
>>> plugin, the linker loads a library: libLTO.dylib. When doing LTO with
>>> a newer llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed
>>> setting that from clang some time ago, but I don't remember the
>>> outcome.
>>>
>>> In theory GCC could implement a libLTO.dylib and set
>>> DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an
>>> API mapping the other way, so the job would be inverting it. The LTO
>>> model of ld64 is a bit stricter about knowing all symbol definitions
>>> and uses (including inline asm), so there would be work to be done to
>>> cover that, but the simple cases shouldn't be too hard.
>>
>> I would not care that much about symbols in asm definitions to start
>> with. Even if we force users to non-LTO those object files, it would
>> be an improvement over what we have now.
>>
>> One problem is that we need a volunteer to implement the reverse glue
>> (libLTO->plugin API), since I do not have an OS X box (well, I have an
>> old G5, but even that is quite far from me right now).
>>
>> Why are complete symbol tables required? Can't ld64 be changed to
>> ignore unresolved symbols in the first stage, just like gold/gnu-ld
>> does?
>
> I am not sure about this. My *guess* is that it does the dead-stripping
> computation before asking libLTO for the object file. I noticed the
> issue while trying to LTO Firefox some time ago.
>
> Cheers,
> Rafael

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

diff --git a/bfd/elflink.c b/bfd/elflink.c
index 99b7ca1..c2bf9c3 100644
--- a/bfd/elflink.c
+++ b/bfd/elflink.c
@@ -5054,7 +5054,9 @@ elf_link_add_archive_symbols (bfd *abfd, struct bfd_link_info *info)
 	    goto error_return;

 	  if (! bfd_check_format (element, bfd_object))
-	    goto error_return;
+	    /* goto error_return; */
+	    /* this might be an object understood only by an LTO plugin */
+	    bfd_elf_make_object (element);

 	  /* Doublecheck that we have not included this object
 	     already--it should be impossible, but there may be
diff --git a/ld/ldfile.c b/ld/ldfile.c
index 16baef8..159a60c 100644
--- a/ld/ldfile.c
+++ b/ld/ldfile.c
@@ -38,6 +38,7 @@
 #ifdef ENABLE_PLUGINS
 #include "plugin-api.h"
 #include "plugin.h"
+#include "elf-bfd.h"
 #endif /* ENABLE_PLUGINS */

 bfd_boolean ldfile_assumed_script = FALSE;
@@ -124,6 +125,7 @@ bfd_boolean
 ldfile_try_open_bfd (const char *attempt,
 		     lang_input_statement_type *entry)
 {
+  int is_obj = 0;
   entry->the_bfd = bfd_openr (attempt, entry->target);

   if (verbose)
@@ -168,6 +170,34 @@ ldfile_try_open_bfd (const char *attempt,
     {
       if (! bfd_check_format (check, bfd_object))
 	{
+#ifdef ENABLE_PLUGINS
+	  if (check == entry->the_bfd
+	      && bfd_get_error () == bfd_error_file_not_recognized
+	      && ! ldemul_unrecognized_file (entry))
+	    {
+	      if (plugin_active_plugins_p ()
+		  && !no_more_claiming)
+		{
+		  int fd = open (attempt, O_RDONLY | O_BINARY);
+		  if (fd >= 0)
+		    {
+		      struct ld_plugin_input_file file;
+
+		      bfd_elf_make_object (entry->the_bfd);
+
+		      file.name = attempt;
+		      file.offset = 0;
+		      file.filesize = lseek (fd, 0, SEEK_END);
+		      file.fd = fd;
+		      plugin_maybe_claim (&file, entry);
+
+		      if (entry->flags.claimed)
+			return
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, 2014-02-09 at 19:51 -0800, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 01:06:48AM +0100, Torvald Riegel wrote:
> > On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote:
> > > > On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> > > > > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > > > > > > On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > > > > > > > There are also so many ways to blow your head off it's
> > > > > > > > untrue. For example, cmpxchg takes a separate memory model
> > > > > > > > parameter for failure and success, but then there are
> > > > > > > > restrictions on the sets you can use for each. It's not
> > > > > > > > hard to find well-known memory-ordering experts shouting
> > > > > > > > "Just use memory_model_seq_cst for everything, it's too
> > > > > > > > hard otherwise". Then there's the fun of load-consume vs
> > > > > > > > load-acquire (arm64 GCC completely ignores consume atm and
> > > > > > > > optimises all of the data dependencies away) as well as
> > > > > > > > the definition of "data races", which seems to be used as
> > > > > > > > an excuse to miscompile a program at the earliest
> > > > > > > > opportunity.
> > > > > > >
> > > > > > > Trust me, rcu_dereference() is not going to be defined in
> > > > > > > terms of memory_order_consume until the compilers implement
> > > > > > > it both correctly and efficiently. They are not there yet,
> > > > > > > and there is currently no shortage of compiler writers who
> > > > > > > would prefer to ignore memory_order_consume.
> > > > > >
> > > > > > Do you have any input on
> > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In
> > > > > > particular, the language standard's definition of
> > > > > > dependencies?
> > > > >
> > > > > Let's see... 1.10p9 says that a dependency must be carried
> > > > > unless:
> > > > >
> > > > > — B is an invocation of any specialization of
> > > > >   std::kill_dependency (29.3), or
> > > > > — A is the left operand of a built-in logical AND (&&, see
> > > > >   5.14) or logical OR (||, see 5.15) operator, or
> > > > > — A is the left operand of a conditional (?:, see 5.16)
> > > > >   operator, or
> > > > > — A is the left operand of the built-in comma (,) operator
> > > > >   (5.18);
> > > > >
> > > > > So the use of "flag" before the "?" is ignored. But the
> > > > > "flag - flag" after the "?" will carry a dependency, so the
> > > > > code fragment in 59448 needs to do the ordering rather than
> > > > > just optimizing "flag - flag" out of existence. One way to do
> > > > > that on both ARM and Power is to actually emit code for
> > > > > "flag - flag", but there are a number of other ways to make
> > > > > that work.
> > > >
> > > > And that's what would concern me, considering that these
> > > > requirements seem to be able to creep out easily. Also, whereas
> > > > the other atomics just constrain compilers wrt. reordering across
> > > > atomic accesses or changes to the atomic accesses themselves, the
> > > > dependencies are new requirements on pieces of otherwise
> > > > non-synchronizing code. The latter seems far more involved to me.
> > >
> > > Well, the wording of 1.10p9 is pretty explicit on this point.
> > > There are only a few exceptions to the rule that dependencies from
> > > memory_order_consume loads must be tracked. And to your point about
> > > requirements being placed on pieces of otherwise non-synchronizing
> > > code, we already have that with plain old load acquire and store
> > > release -- both of these put ordering constraints that affect the
> > > surrounding non-synchronizing code.
> >
> > I think there's a significant difference. With acquire/release or
> > more general memory orders, it's true that we can't order _across_
> > the atomic access. However, we can reorder and optimize without
> > additional constraints if we do not reorder. This is not the case
> > with consume memory order, as the (p + flag - flag) example shows.
>
> Agreed, memory_order_consume does introduce additional restrictions.
>
> > > This issue got a lot of discussion, and the compromise is that
> > > dependencies cannot leak into or out of functions unless the
> > > relevant parameters or return values are annotated with
> > > [[carries_dependency]]. This means that the compiler can see all
> > > the places where dependencies must be tracked. This is described
> > > in 7.6.4.
> >
> > I wasn't aware of 7.6.4 (but it isn't referred to as an additional
> > constraint--which it is--in 1.10, so I guess at least that should be
> > fixed).
> > Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a
> > semantic difference, at least if one is assuming that normal
> > optimization of sequential c
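For reference, here is a small sketch in the spirit of the fragment
discussed in GCC PR 59448 (the exact code in the PR may differ; the
names below are invented for illustration):

#include <atomic>

extern int table[2];
std::atomic<int*> gp;

int
reader (void)
{
  int *p = gp.load (std::memory_order_consume);
  int flag = *p;  /* carries a dependency from the consume load */
  /* Per 1.10p9, "flag" used as the condition of ?: carries no
     dependency, but the "flag - flag" in the selected arm does.  Its
     value is always 0, yet folding it away would also drop the
     dependency ordering, so on ARM/Power the implementation must still
     order this access after the consume load, e.g. by actually
     emitting the flag - flag computation.  */
  return table[flag ? 0 : flag - flag];
}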
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-10 at 11:09 -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
> >
> > Intuitively, this is wrong because this lets the program take a step
> > the abstract machine wouldn't do. This is different from the
> > sequential code that Peter posted because it uses atomics, and thus
> > one can't easily assume that the difference is not observable.
>
> Btw, what is the definition of "observable" for the atomics?
>
> Because I'm hoping that it's not the same as for volatiles, where
> "observable" is about the virtual machine itself, and as such volatile
> accesses cannot be combined or optimized at all.

No, atomics aren't an observable behavior of the abstract machine
(unless they are volatile). See 1.8p8 (citing the C++ standard).

> Now, I claim that atomic accesses cannot be done speculatively for
> writes, and not re-done for reads (because the value could change),

Agreed, unless the compiler can prove that this doesn't make a
difference in the program at hand and it's not volatile atomics. In
general, that will be hard and thus won't happen often, I suppose, but
if correctly proved it would fall under the as-if rule, I think.

> but *combining* them would be possible and good.

Agreed.

> For example, we often have multiple independent atomic accesses that
> could certainly be combined: testing the individual bits of an atomic
> value with helper functions, causing things like "load atomic, test
> bit, load same atomic, test another bit". The two atomic loads could
> be done as a single load without possibly changing semantics on a
> real machine, but if "visibility" is defined in the same way it is
> for "volatile", that wouldn't be a valid transformation. Right now we
> use "volatile" semantics for these kinds of things, and they really
> can hurt.

Agreed. In your example, the compiler would have to prove that the
abstract machine would always be able to run the two loads atomically
(i.e., as one load) without running into impossible/disallowed behavior
of the program. But if there's no loop or branch or such in between,
this should be straightforward, because any hardware oddity or similar
could merge those loads and it wouldn't be disallowed by the standard
(considering that we're talking about a finite number of loads), so the
compiler would be allowed to do it as well.

> Same goes for multiple writes (possibly due to setting bits):
> combining multiple accesses into a single one is generally fine, it's
> *adding* write accesses speculatively that is broken by design..

Agreed. As Paul points out, this being correct assumes that there are
no other ordering guarantees or memory accesses "interfering", but if
the stores are to the same memory location and adjacent to each other
in the program, then I don't see a reason why they wouldn't be
combinable.

> At the same time, you can't combine atomic loads or stores infinitely
> - "visibility" on a real machine definitely is about timeliness.
> Removing all but the last write when there are multiple consecutive
> writes is generally fine, even if you unroll a loop to generate those
> writes. But if what remains is a loop, it might be a busy-loop
> basically waiting for something, so it would be wrong ("untimely") to
> hoist a store in a loop entirely past the end of the loop, or hoist a
> load in a loop to before the loop.

Agreed. That's what 1.10p24 and 1.10p25 are meant to specify for loads,
although those might not be bullet-proof, as Paul points out. Forward
progress is rather vaguely specified in the standard, but at least
parts of the committee (and people in ISO C++ SG1, in particular) are
working on trying to improve this.

> Does the standard allow for that kind of behavior?

I think the standard requires (or intends to require) the behavior that
you (and I) seem to prefer in these examples.
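As a concrete illustration of the load-merging case above (a sketch
only; the names are invented):

#include <atomic>

std::atomic<unsigned> flags;

static bool
bit_set (unsigned mask)
{
  return (flags.load (std::memory_order_relaxed) & mask) != 0;
}

bool
both_bits_set (void)
{
  /* Two back-to-back loads of the same atomic.  After inlining, a
     conforming compiler may read "flags" once and test both bits,
     since some execution of the abstract machine could observe the
     same value for both loads.  Volatile semantics would forbid this
     merge.  */
  return bit_set (0x1) && bit_set (0x2);
}

void
spin_until_ready (void)
{
  /* Infinite merging, by contrast, must not happen: hoisting this
     load out of the loop would turn the wait into a potential
     infinite loop (cf. 1.10p24/p25).  */
  while (!(flags.load (std::memory_order_relaxed) & 0x4))
    ;
}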
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-11 at 07:59 -0800, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> > On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
> > >
> > > Intuitively, this is wrong because this lets the program take a
> > > step the abstract machine wouldn't do. This is different from the
> > > sequential code that Peter posted because it uses atomics, and
> > > thus one can't easily assume that the difference is not
> > > observable.
> >
> > Btw, what is the definition of "observable" for the atomics?
> >
> > Because I'm hoping that it's not the same as for volatiles, where
> > "observable" is about the virtual machine itself, and as such
> > volatile accesses cannot be combined or optimized at all.
> >
> > Now, I claim that atomic accesses cannot be done speculatively for
> > writes, and not re-done for reads (because the value could change),
> > but *combining* them would be possible and good.
> >
> > For example, we often have multiple independent atomic accesses
> > that could certainly be combined: testing the individual bits of an
> > atomic value with helper functions, causing things like "load
> > atomic, test bit, load same atomic, test another bit". The two
> > atomic loads could be done as a single load without possibly
> > changing semantics on a real machine, but if "visibility" is
> > defined in the same way it is for "volatile", that wouldn't be a
> > valid transformation. Right now we use "volatile" semantics for
> > these kinds of things, and they really can hurt.
> >
> > Same goes for multiple writes (possibly due to setting bits):
> > combining multiple accesses into a single one is generally fine,
> > it's *adding* write accesses speculatively that is broken by
> > design..
> >
> > At the same time, you can't combine atomic loads or stores
> > infinitely - "visibility" on a real machine definitely is about
> > timeliness. Removing all but the last write when there are multiple
> > consecutive writes is generally fine, even if you unroll a loop to
> > generate those writes. But if what remains is a loop, it might be a
> > busy-loop basically waiting for something, so it would be wrong
> > ("untimely") to hoist a store in a loop entirely past the end of
> > the loop, or hoist a load in a loop to before the loop.
> >
> > Does the standard allow for that kind of behavior?
>
> You asked! ;-)
>
> So the current standard allows merging of both loads and stores,
> unless of course ordering constraints prevent the merging. Volatile
> semantics may be used to prevent this merging, if desired, for
> example, for real-time code.

Agreed.

> Infinite merging is intended to be prohibited, but I am not certain
> that the current wording is bullet-proof (1.10p24 and 1.10p25).

Yeah, maybe not. But it at least seems to rather clearly indicate the
intent ;)

> The only prohibition against speculative stores that I can see is in
> a non-normative note, and it can be argued to apply only to things
> that are not atomics (1.10p22).

I think this one is specifically about speculative stores that would
affect memory locations that the abstract machine would not write to,
and that might be observable or create data races. While a compiler
could potentially prove that such stores don't lead to a difference in
the behavior of the program (e.g., by proving that there are no
observers anywhere and this isn't overlapping with any volatile
locations), I think that this is hard in general and most compilers
will just not do such things. In GCC, bugs in that category were fixed
after researchers doing fuzz-testing found them (IIRC, speculative
stores by loops).

> I don't see any prohibition against reordering a store to precede a
> load preceding a conditional branch -- which would not be speculative
> if the branch was known to be taken and the load hit in the store
> buffer. In a system where stores could be reordered, some other CPU
> might perceive the store as happening before the load that controlled
> the conditional branch. This needs to be addressed.

I don't know the specifics of your example, but from how I understand
it, I don't see a problem if the compiler can prove that the store will
always happen.

To be more specific, if the compiler can prove that the store will
happen anyway, and the region of code can be assumed to always run
atomically (e.g., there's no loop or such in there), then it is known
that we have one atomic region of code that will always perform the
store, so we might as well do the stuff in the region in some order.

Now, if any of the memory accesses are atomic, then the whole region of
code containing those accesses is often not atomic because other
threads might observe intermediate results in a data-race-free way.

(I know that this isn't a very precise formulation, but I hope it
brings my line of reasoning across.)

> Why this hole? At the time, the current formalizations of popular
> CPU architectures did not exist, and it was n
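A sketch of the store-vs-load-branch case under discussion (invented
names; whether this transformation is actually permitted for
non-seq_cst atomics is exactly the open question here):

#include <atomic>

std::atomic<int> x, y;

void
writer (void)
{
  /* The store to x happens on both paths, so a compiler -- or a CPU
     with a store buffer -- might perform it before the load of y that
     controls the branch.  Another thread could then observe the store
     to x as happening before this thread's load of y.  */
  if (y.load (std::memory_order_relaxed))
    x.store (1, std::memory_order_relaxed);
  else
    x.store (1, std::memory_order_relaxed);
}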
sparse overlapping structs for vectorization
I had a problem that got solved in an ugly way. I think gcc ought to
provide a few ways to make a nicer solution possible.

There was an array of structs, roughly like so:

  struct {
    int w;
    float x;
    char y[4];
    short z[2];
  } foo[512][4];

The types within the struct are 4 bytes each; I don't actually remember
anything else, and it doesn't matter except that they are distinct. I
think it was bitfields actually, neatly grouped into groups of 32 bits.
In other words, like 4 4-byte values, but with more-or-less incompatible
types. Note that 4 of the structs neatly fill a 64-byte cache line; an
alignment attribute was used to ensure 64-byte alignment.

The most common operation needed on this array is to compare the first
struct member of 4 of the structs against a given value, looking to see
if there is a match. SSE would be good for this. The lookup is then
followed by using the matching entry if there is one, else picking one
of the 4 to recycle and thus use.

First bad solution: one could load up 4 SSE registers and shuffle things
around... NO.

Second bad solution: one could simply have 4 distinct arrays. This is
bad because w, x, y, and z for a given entry then live in different
cache lines.

Third bad solution: the array can be viewed as "int foo[512][4][4]"
instead, with the struct field forming the middle array index. Since the
last two array indexes are both 4, you can effectively swap them around.
This groups the 4 fields of each type together, allowing SSE. The
problem here is loss of type safety; one must use array indexes instead
of struct field names, like so:

  foo[idx][WHERE_W_IS][i]

Fourth bad solution: we lay things out as in the third solution, but we
cast pointers to effectively lay sparse structs over each other like
shingles:

  struct {
    int w;
    int pad_wx[3];
    float x;
    int pad_xy[3];
    char y[4];
    int pad_yz[3];
    short z[2];
  }

Performance is hurt by the need for __may_alias__, and of course the
result is painful to look at. We went with this anyway, using SSE
intrinsics, and performance was great. Maintainability... not so much.

BTW, an array of 512 structs containing 4-entry arrays was not used
because we wanted a simple, normal pointer to indicate the item being
operated on; we didn't want to need a (pointer, index) pair.

Can something be done to help out here? The first thing that pops into
mind is the ability to tell gcc that the struct-to-struct byte offset
used for array indexing is a user-specified value instead of simply the
struct size. It's possible we could have safely ignored the warning
about aliasing; I don't know. Perhaps that would give even better
performance, but the casting would still be very ugly. Solutions that
can be defined away for non-gcc compilers are better.
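For what it's worth, a sketch of the probe described above using the
third layout (SSE2 assumed; match_w and the enum values are invented
for illustration, following the post's WHERE_W_IS naming):

#include <emmintrin.h>  /* SSE2 */

enum { WHERE_W_IS, WHERE_X_IS, WHERE_Y_IS, WHERE_Z_IS };

alignas (64) int foo[512][4][4];

/* Returns a 4-bit mask with bit i set iff foo[idx][WHERE_W_IS][i]
   equals value.  foo[idx][WHERE_W_IS] is 16 contiguous, 16-byte
   aligned bytes, so one aligned SSE load covers the four w fields of
   the cache line.  */
int
match_w (int idx, int value)
{
  __m128i w4  = _mm_load_si128 ((const __m128i *) foo[idx][WHERE_W_IS]);
  __m128i key = _mm_set1_epi32 (value);
  __m128i eq  = _mm_cmpeq_epi32 (w4, key);
  return _mm_movemask_ps (_mm_castsi128_ps (eq));  /* one bit per lane */
}

/* Usage: int m = match_w (idx, v); if (m) operate on lane
   __builtin_ctz (m), else recycle one of the four entries.  */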