Re: [MIPS] Avoiding FP operations/register usage
Matthew Fortune writes:
> I'm still interested in how successfully the MIPS backend is managing to
> avoid floating point but I am also convinced there are bugs in ld.so
> entry points for MIPS.

It uses the standard mechanism to avoid it, which is marking uses of FP registers for integer moves, loads and stores with "*". This tells the register allocator to ignore those alternatives. AFAIK it is effective, and I think any cases where it doesn't work would be fair bug reports.

It becomes a lot more difficult to define with things like the Loongson extensions though, since some of those are also useful as scalar integer operations. And of course the same goes for MSA.

Thanks,
Richard
Re: Fwd: LLVM collaboration?
Hi Jan,

I think this is a very good example where we could all collaborate (including binutils).

I'll leave your reply intact, so that Chandler (CC'd) can get a bit more context. I'm copying him because he (and I believe Diego) had more contact with LTO than I had.

If I got it right, LTO today:

- needs the drivers to explicitly declare the plugin
- needs the library available somewhere
- may have to change the library loading semantics (via LD_PRELOAD)

Since both toolchains do the magic, binutils has no incentive to create any automatic detection of objects.

The part that I didn't get is what you said about backward compatibility. Would LTO work on a newer binutils with the liblto but on an older compiler that knew nothing about LTO?

Your proposal is, then, to get binutils:

- recognizing LTO logic in the objects
- automatically loading liblto if recognized
- warning if not

I'm assuming the extra symbols would be discarded if no library is found, together with the warning, right? Maybe an error if -Wall or whatever.

Can we get someone from the binutils community to opine on that?

cheers,
--renato

On 11 February 2014 02:29, Jan Hubicka wrote:
> One practical experience I have with LLVM developers is sharing
> experiences about getting Firefox to work with LTO with Rafael Espindola,
> and I think it was useful for both of us. I am definitely open to more
> discussion.
>
> Let's try a specific topic that has been on my TODO list for some time.
>
> I would like to make it possible for multiple compilers to be used to LTO
> a single binary. As we are all making LTO more useful, I think it is a
> matter of time until people start shipping LTO object files by default
> and users end up feeding them into different compilers or incompatible
> versions of the same compiler. We probably want to make this work, even
> though the cross-module optimization will not happen in this case.
>
> The plugin interface in binutils seems to do its job well both for GCC
> and LLVM, and I hope that open64 and ICC will eventually join, too.
>
> The trouble however is that one needs to pass an explicit --plugin
> argument specifying the particular plugin to load, and so GCC ships with
> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
> LLVM does a similar thing.
>
> It may be smoother if binutils was able to load multiple plugins at once
> and grab plugins from system and user installed compilers without an
> explicit --plugin argument.
>
> Binutils probably should also have a way to detect LTO object files and
> produce more useful diagnostics than they do now, when there is no plugin
> claiming them.
>
> There are some PRs filed on the topic
> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
> but not much progress on them.
>
> I wonder if we can get this designed and implemented.
>
> On the other hand, GCC currently maintains a non-plugin path for LTO that
> is now only used by the darwin port due to lack of a plugin-enabled LD
> there. It seems that the liblto used by darwin is loosely compatible with
> the plugin API, but it makes it harder to have different compilers share
> it (one has to LD_PRELOAD liblto to a different one prior to executing
> the linker?)
>
> I wonder, is there a chance to implement linker plugin API to libLTO glue
> or add plugin support to native Darwin tools?
>
> Honza
RE: [MIPS] Avoiding FP operations/register usage
> Matthew Fortune writes:
> > I'm still interested in how successfully the MIPS backend is managing
> > to avoid floating point but I am also convinced there are bugs in
> > ld.so entry points for MIPS.
>
> It uses the standard mechanism to avoid it, which is marking uses of FP
> registers for integer moves, loads and stores with "*". This tells the
> register allocator to ignore those alternatives. AFAIK it is effective
> and I think any cases where it doesn't work would be fair bug reports.

I understand that '*' has no effect on whether reload/LRA will use the alternative, though, so I take that to mean they could still allocate FP regs as part of an integer move?

> It becomes a lot more difficult to define with things like the Loongson
> extensions though, since some of those are also useful as scalar integer
> operations. And of course the same goes for MSA.

Indeed. Avoiding FP registers 99.9% of the time is fine for performance; it's the potential 0.1% I'm concerned about for correctness. I'm tending towards accounting for potential FPU usage even from integer-only source, just to be safe. I don't want to ever be the one debugging something like ld.so in the face of this kind of bug.

I'll move the discussion to glibc regarding ld.so.

Regards,
Matthew
Re: Fwd: LLVM collaboration?
On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
> Hi Jan,
>
> I think this is a very good example where we could all collaborate
> (including binutils).
>
> I'll leave your reply intact, so that Chandler (CC'd) can get a bit more
> context. I'm copying him because he (and I believe Diego) had more
> contact with LTO than I had.
>
> If I got it right, LTO today:
>
> - needs the drivers to explicitly declare the plugin
> - needs the library available somewhere
> - may have to change the library loading semantics (via LD_PRELOAD)

There is another need that I have felt in LTO for quite some time. Currently, it has a non-partitioned mode or a partitioned mode, but this decision is taken before the compilation begins. It would be nice to have a mode that allows dynamic loading of function bodies, so that a flow- and context-sensitive IPA could load function bodies on demand and unload them when they are not needed.

Uday.

> Since both toolchains do the magic, binutils has no incentive to create
> any automatic detection of objects.
>
> The part that I didn't get is what you said about backward compatibility.
> Would LTO work on a newer binutils with the liblto but on an older
> compiler that knew nothing about LTO?
>
> Your proposal is, then, to get binutils:
>
> - recognizing LTO logic in the objects
> - automatically loading liblto if recognized
> - warning if not
>
> I'm assuming the extra symbols would be discarded if no library is found,
> together with the warning, right? Maybe an error if -Wall or whatever.
>
> Can we get someone from the binutils community to opine on that?
>
> cheers,
> --renato
>
> On 11 February 2014 02:29, Jan Hubicka wrote:
>> One practical experience I have with LLVM developers is sharing
>> experiences about getting Firefox to work with LTO with Rafael
>> Espindola, and I think it was useful for both of us. I am definitely
>> open to more discussion.
>>
>> Let's try a specific topic that has been on my TODO list for some time.
>>
>> I would like to make it possible for multiple compilers to be used to
>> LTO a single binary. As we are all making LTO more useful, I think it is
>> a matter of time until people start shipping LTO object files by default
>> and users end up feeding them into different compilers or incompatible
>> versions of the same compiler. We probably want to make this work, even
>> though the cross-module optimization will not happen in this case.
>>
>> The plugin interface in binutils seems to do its job well both for GCC
>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships with
>> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
>> LLVM does a similar thing.
>>
>> It may be smoother if binutils was able to load multiple plugins at once
>> and grab plugins from system and user installed compilers without an
>> explicit --plugin argument.
>>
>> Binutils probably should also have a way to detect LTO object files and
>> produce more useful diagnostics than they do now, when there is no
>> plugin claiming them.
>>
>> There are some PRs filed on the topic
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>> but not much progress on them.
>>
>> I wonder if we can get this designed and implemented.
>>
>> On the other hand, GCC currently maintains a non-plugin path for LTO
>> that is now only used by the darwin port due to lack of a plugin-enabled
>> LD there. It seems that the liblto used by darwin is loosely compatible
>> with the plugin API, but it makes it harder to have different compilers
>> share it (one has to LD_PRELOAD liblto to a different one prior to
>> executing the linker?)
>>
>> I wonder, is there a chance to implement linker plugin API to libLTO
>> glue or add plugin support to native Darwin tools?
>>
>> Honza
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
>>
>> Intuitively, this is wrong because this lets the program take a step
>> the abstract machine wouldn't do. This is different to the sequential
>> code that Peter posted because it uses atomics, and thus one can't
>> easily assume that the difference is not observable.
>
> Btw, what is the definition of "observable" for the atomics?
>
> Because I'm hoping that it's not the same as for volatiles, where
> "observable" is about the virtual machine itself, and as such volatile
> accesses cannot be combined or optimized at all.
>
> Now, I claim that atomic accesses cannot be done speculatively for
> writes, and not re-done for reads (because the value could change),
> but *combining* them would be possible and good.
>
> For example, we often have multiple independent atomic accesses that
> could certainly be combined: testing the individual bits of an atomic
> value with helper functions, causing things like "load atomic, test
> bit, load same atomic, test another bit". The two atomic loads could
> be done as a single load without possibly changing semantics on a real
> machine, but if "visibility" is defined in the same way it is for
> "volatile", that wouldn't be a valid transformation. Right now we use
> "volatile" semantics for these kinds of things, and they really can
> hurt.
>
> Same goes for multiple writes (possibly due to setting bits):
> combining multiple accesses into a single one is generally fine, it's
> *adding* write accesses speculatively that is broken by design.
>
> At the same time, you can't combine atomic loads or stores infinitely
> - "visibility" on a real machine definitely is about timeliness.
> Removing all but the last write when there are multiple consecutive
> writes is generally fine, even if you unroll a loop to generate those
> writes. But if what remains is a loop, it might be a busy-loop
> basically waiting for something, so it would be wrong ("untimely") to
> hoist a store in a loop entirely past the end of the loop, or hoist a
> load in a loop to before the loop.
>
> Does the standard allow for that kind of behavior?

You asked! ;-)

So the current standard allows merging of both loads and stores, unless of course ordering constraints prevent the merging. Volatile semantics may be used to prevent this merging, if desired, for example, for real-time code. Infinite merging is intended to be prohibited, but I am not certain that the current wording is bullet-proof (1.10p24 and 1.10p25).

The only prohibition against speculative stores that I can see is in a non-normative note, and it can be argued to apply only to things that are not atomics (1.10p22). I don't see any prohibition against reordering a store to precede a load preceding a conditional branch -- which would not be speculative if the branch was known to be taken and the load hit in the store buffer. In a system where stores could be reordered, some other CPU might perceive the store as happening before the load that controlled the conditional branch. This needs to be addressed.

Why this hole? At the time, the current formalizations of popular CPU architectures did not exist, and it was not clear that all popular hardware avoided speculative stores.

There is also fun with "out of thin air" values, which everyone agrees should be prohibited, but where there is no agreement on how to prohibit them in a mathematically constructive manner. The current draft contains a clause simply stating that out-of-thin-air values are prohibited, which doesn't help someone constructing tools to analyze C++ code. One proposal requires that subsequent atomic stores never be reordered before prior atomic loads, which requires useless ordering code to be emitted on ARM and PowerPC (you may have seen Will Deacon's and Peter Zijlstra's reaction to this proposal a few days ago). Note that Itanium already pays this price in order to provide full single-variable cache coherence. This out-of-thin-air discussion is also happening in the Java community in preparation for a new rev of the Java memory model.

There will also be some discussions on memory_order_consume, which is intended to (eventually) implement rcu_dereference(). The compiler writers don't like tracking dependencies, but there may be some ways of constraining optimizations to preserve the common dependencies, while providing some syntax to force preservation of dependencies that would normally be optimized out. One example of this is where you have an RCU-protected array that might sometimes contain only a single element. In the single-element case, the compiler knows a priori which element will be used, and will therefore optimize the dependency away, so that the reader might see pre-initialization state. But this is rare, so if syntax needs to be added in this case, I believe we should be OK with it. (If syntax is
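To make the merging Linus describes concrete, here is a minimal C11 sketch (illustrative only, not code from the thread): two relaxed loads of the same atomic that test different bits, which a compiler may legally fold into a single load without changing what any real machine can observe -- exactly the transformation volatile semantics would forbid.

  #include <stdatomic.h>
  #include <stdbool.h>

  static atomic_uint flags;

  /* "load atomic, test bit, load same atomic, test another bit":
     no ordering constraint separates the two loads below, so a
     compiler could merge them into one.  Under volatile semantics
     both loads would have to be emitted.  */
  bool ready_and_valid (void)
  {
    bool ready = atomic_load_explicit (&flags, memory_order_relaxed) & 1u;
    bool valid = atomic_load_explicit (&flags, memory_order_relaxed) & 2u;
    return ready && valid;
  }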
Re: Fwd: LLVM collaboration?
> On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
>> Hi Jan,
>>
>> I think this is a very good example where we could all collaborate
>> (including binutils).
>>
>> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
>> more context. I'm copying him because he (and I believe Diego) had
>> more contact with LTO than I had.
>>
>> If I got it right, LTO today:
>>
>> - needs the drivers to explicitly declare the plugin

Yes.

>> - needs the library available somewhere
>> - may have to change the library loading semantics (via LD_PRELOAD)

Not in the binutils implementation (I believe it is the case for darwin's libLTO). With binutils you only need to pass an explicit --plugin argument into all tools that care (ld/ar/nm/ranlib).

> There is another need that I have felt in LTO for quite some time.
> Currently, it has a non-partitioned mode or a partitioned mode but
> this decision is taken before the compilation begins. It would be
> nice to have a mode that allows dynamic loading of function bodies
> so that a flow and context sensitive IPA could load function bodies
> on demand, and unload them when they are not needed.

I implemented on-demand loading of function bodies into GCC-4.8, if I recall correctly. Currently I think only Martin Liska's code unification pass uses it, to verify that two function bodies it thinks are equivalent are actually equivalent. Hopefully it will be merged into 4.10.

> Uday.
>
>> Since both toolchains do the magic, binutils has no incentive to
>> create any automatic detection of objects.
>>
>> The part that I didn't get is what you said about backward
>> compatibility. Would LTO work on a newer binutils with the liblto but
>> on an older compiler that knew nothing about LTO?
>>
>> Your proposal is, then, to get binutils:
>>
>> - recognizing LTO logic in the objects
>> - automatically loading liblto if recognized
>> - warning if not

I basically think that binutils should have a way for an installed compiler to register a plugin, and load all plugins by default (or perhaps, for performance, upon detecting a compatible LTO object file in some way, perhaps also by information given in the config file) and let them claim the LTO objects they understand.

With the backward compatibility I mean that if we release a new version of the compiler that can no longer read the LTO objects of an older compiler, one can just install both versions and have their plugins claim only the LTO objects they understand. Just as if they were two different compilers.

Finally, I think we can make binutils recognize GCC/LLVM LTO objects as a special case and produce a friendly message when users try to handle them without a plugin, as opposed to today's strange errors about file formats or missing symbols.

Honza

>> I'm assuming the extra symbols would be discarded if no library is
>> found, together with the warning, right? Maybe an error if -Wall or
>> whatever.
>>
>> Can we get someone from the binutils community to opine on that?
>>
>> cheers,
>> --renato
>>
>> On 11 February 2014 02:29, Jan Hubicka wrote:
>>> One practical experience I have with LLVM developers is sharing
>>> experiences about getting Firefox to work with LTO with Rafael
>>> Espindola, and I think it was useful for both of us. I am definitely
>>> open to more discussion.
>>>
>>> Let's try a specific topic that has been on my TODO list for some time.
>>>
>>> I would like to make it possible for multiple compilers to be used to
>>> LTO a single binary. As we are all making LTO more useful, I think it
>>> is a matter of time until people start shipping LTO object files by
>>> default and users end up feeding them into different compilers or
>>> incompatible versions of the same compiler. We probably want to make
>>> this work, even though the cross-module optimization will not happen
>>> in this case.
>>>
>>> The plugin interface in binutils seems to do its job well both for GCC
>>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>>
>>> The trouble however is that one needs to pass an explicit --plugin
>>> argument specifying the particular plugin to load, and so GCC ships
>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>>> while LLVM does a similar thing.
>>>
>>> It may be smoother if binutils was able to load multiple plugins at
>>> once and grab plugins from system and user installed compilers without
>>> an explicit --plugin argument.
>>>
>>> Binutils probably should also have a way to detect LTO object files
>>> and produce more useful diagnostics than they do now, when there is no
>>> plugin claiming them.
>>>
>>> There are some PRs filed on the topic
>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>>> but not much progress on them.
>>>
>>> I wonder if we can get this designed and implemented.
>>>
>>> On the other hand, GCC currently maintains a non-plugin path for LT
Re: Fwd: LLVM collaboration?
On Tuesday 11 February 2014 09:30 PM, Jan Hubicka wrote:
>> On Tuesday 11 February 2014 03:25 PM, Renato Golin wrote:
>>> Hi Jan,
>>>
>>> I think this is a very good example where we could all collaborate
>>> (including binutils).
>>>
>>> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
>>> more context. I'm copying him because he (and I believe Diego) had
>>> more contact with LTO than I had.
>>>
>>> If I got it right, LTO today:
>>>
>>> - needs the drivers to explicitly declare the plugin
>
> Yes.
>
>>> - needs the library available somewhere
>>> - may have to change the library loading semantics (via LD_PRELOAD)
>
> Not in the binutils implementation (I believe it is the case for
> darwin's libLTO). With binutils you only need to pass an explicit
> --plugin argument into all tools that care (ld/ar/nm/ranlib).
>
>> There is another need that I have felt in LTO for quite some time.
>> Currently, it has a non-partitioned mode or a partitioned mode but
>> this decision is taken before the compilation begins. It would be
>> nice to have a mode that allows dynamic loading of function bodies
>> so that a flow and context sensitive IPA could load function bodies
>> on demand, and unload them when they are not needed.
>
> I implemented on-demand loading of function bodies into GCC-4.8, if I
> recall correctly. Currently I think only Martin Liska's code
> unification pass uses it, to verify that two function bodies it thinks
> are equivalent are actually equivalent. Hopefully it will be merged
> into 4.10.

Great. We will experiment with it.

Uday.

>> Uday.
>>
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>>>
>>> The part that I didn't get is what you said about backward
>>> compatibility. Would LTO work on a newer binutils with the liblto but
>>> on an older compiler that knew nothing about LTO?
>>>
>>> Your proposal is, then, to get binutils:
>>>
>>> - recognizing LTO logic in the objects
>>> - automatically loading liblto if recognized
>>> - warning if not
>
> I basically think that binutils should have a way for an installed
> compiler to register a plugin, and load all plugins by default (or
> perhaps, for performance, upon detecting a compatible LTO object file
> in some way, perhaps also by information given in the config file) and
> let them claim the LTO objects they understand.
>
> With the backward compatibility I mean that if we release a new version
> of the compiler that can no longer read the LTO objects of an older
> compiler, one can just install both versions and have their plugins
> claim only the LTO objects they understand. Just as if they were two
> different compilers.
>
> Finally, I think we can make binutils recognize GCC/LLVM LTO objects as
> a special case and produce a friendly message when users try to handle
> them without a plugin, as opposed to today's strange errors about file
> formats or missing symbols.
>
> Honza
>
>>> I'm assuming the extra symbols would be discarded if no library is
>>> found, together with the warning, right? Maybe an error if -Wall or
>>> whatever.
>>>
>>> Can we get someone from the binutils community to opine on that?
>>>
>>> cheers,
>>> --renato
>>>
>>> On 11 February 2014 02:29, Jan Hubicka wrote:
>>>> One practical experience I have with LLVM developers is sharing
>>>> experiences about getting Firefox to work with LTO with Rafael
>>>> Espindola, and I think it was useful for both of us. I am definitely
>>>> open to more discussion.
>>>>
>>>> Let's try a specific topic that has been on my TODO list for some
>>>> time.
>>>>
>>>> I would like to make it possible for multiple compilers to be used
>>>> to LTO a single binary. As we are all making LTO more useful, I
>>>> think it is a matter of time until people start shipping LTO object
>>>> files by default and users end up feeding them into different
>>>> compilers or incompatible versions of the same compiler. We probably
>>>> want to make this work, even though the cross-module optimization
>>>> will not happen in this case.
>>>>
>>>> The plugin interface in binutils seems to do its job well both for
>>>> GCC and LLVM, and I hope that open64 and ICC will eventually join,
>>>> too.
>>>>
>>>> The trouble however is that one needs to pass an explicit --plugin
>>>> argument specifying the particular plugin to load, and so GCC ships
>>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver
>>>> itself) while LLVM does a similar thing.
>>>>
>>>> It may be smoother if binutils was able to load multiple plugins at
>>>> once and grab plugins from system and user installed compilers
>>>> without an explicit --plugin argument.
>>>>
>>>> Binutils probably should also have a way to detect LTO object files
>>>> and produce more useful diagnostics than they do now, when there is
>>>> no plugin claiming them.
>>>>
>>>> There are some PRs filed on the topic
>>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>>>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>>>> but not much progress on them.
>>>>
>>>> I wonder if we can get this designed and implemented.
>>>>
>>>> On the other hand, GCC currently maintains a non-plugin path for LTO
>>>> that is now only used by the darwin port due to lack of a
>>>> plugin-enabled LD there. It seems that the liblto used by darwin is
>>>> loosely compatible with the plugin API, but it makes it harder to
>>>> have different
Re: Fwd: LLVM collaboration?
On 11 February 2014 16:00, Jan Hubicka wrote:
> I basically think that binutils should have a way for an installed
> compiler to register a plugin, and load all plugins by default (or
> perhaps, for performance, upon detecting a compatible LTO object file in
> some way, perhaps also by information given in the config file) and let
> them claim the LTO objects they understand.

Right, so this would not necessarily be related to LTO, but to the binutils plugin system. In my very limited experience with LTO and binutils, I can't see how that would be different from just adding a --plugin option on the compiler, unless it's something that the linker would detect automatically without the interference of any compiler.

> With the backward compatibility I mean that if we release a new version
> of the compiler that can no longer read the LTO objects of an older
> compiler, one can just install both versions and have their plugins
> claim only the LTO objects they understand. Just as if they were two
> different compilers.

Yes, this makes total sense.

> Finally, I think we can make binutils recognize GCC/LLVM LTO objects
> as a special case and produce a friendly message when users try to
> handle them without a plugin, as opposed to today's strange errors
> about file formats or missing symbols.

Yes, that as well seems pretty obvious, and mostly orthogonal to the other two proposals.

cheers,
--renato

PS: Removing Chandler, as he was not the right person to look at this. I'll check with others on the LLVM list to chime in on this thread.
Re: i370 port
Hello all. I have previously succeeded in getting configure to work for gcc 3.4.6. Unfortunately gcc 3.4.6 is too buggy to use and needs to wait for Dave Pitts or someone to fix it. gcc 3.2.3 has no known bugs for the i370 target, but it has never been built using "configure". I am now trying to get gcc 3.2.3 to build via configure using the same technique I used for gcc 3.4.6.

Some differences I have found so far: I needed to define the sizes of short etc., which I didn't need to do with 3.4.6:

export ac_cv_func_strncmp_works=yes
export ac_cv_c_bigendian=yes
export ac_cv_c_compile_endian=big-endian
export ac_cv_sizeof_short=2
export ac_cv_sizeof_int=4
export ac_cv_sizeof_long=4
export ac_cv_c_float_format='IBM 370 hex'

And "make", after this configure:

./configure --build=x86_64-unknown-linux-gnu --host=i370-mvspdp --target=i370-mvspdp --prefix=/devel/mvshost --enable-languages=c --disable-nls

is failing here:

make[2]: Leaving directory `/home/users/k/ke/kerravon86/devel/gcc/x86_64-unknown-linux-gnu/libiberty'
rm -f *~ Makefile config.status xhost-mkfrag TAGS multilib.out
rm -f config.log
rmdir testsuite 2>/dev/null
make[1]: [distclean] Error 1 (ignored)
make[1]: Leaving directory `/home/users/k/ke/kerravon86/devel/gcc/x86_64-unknown-linux-gnu/libiberty'
loading cache ../config.cache
configure: error: can not find install-sh or install.sh in ./.. ././..
make: *** [configure-build-libiberty] Error 1

The file in question seems to exist:

~/devel/gcc>find . -name install-sh
./boehm-gc/install-sh
./install-sh
./fastjar/install-sh
~/devel/gcc>find . -name install.sh
~/devel/gcc>

and is executable. Any suggestions? Thanks. Paul.
Re: Fwd: LLVM collaboration?
Now copying Rafael, who can give us some more insight on the LLVM LTO side.

cheers,
--renato

On 11 February 2014 09:55, Renato Golin wrote:
> Hi Jan,
>
> I think this is a very good example where we could all collaborate
> (including binutils).
>
> I'll leave your reply intact, so that Chandler (CC'd) can get a bit
> more context. I'm copying him because he (and I believe Diego) had
> more contact with LTO than I had.
>
> If I got it right, LTO today:
>
> - needs the drivers to explicitly declare the plugin
> - needs the library available somewhere
> - may have to change the library loading semantics (via LD_PRELOAD)
>
> Since both toolchains do the magic, binutils has no incentive to
> create any automatic detection of objects.
>
> The part that I didn't get is what you said about backward
> compatibility. Would LTO work on a newer binutils with the liblto but
> on an older compiler that knew nothing about LTO?
>
> Your proposal is, then, to get binutils:
>
> - recognizing LTO logic in the objects
> - automatically loading liblto if recognized
> - warning if not
>
> I'm assuming the extra symbols would be discarded if no library is
> found, together with the warning, right? Maybe an error if -Wall or
> whatever.
>
> Can we get someone from the binutils community to opine on that?
>
> cheers,
> --renato
>
> On 11 February 2014 02:29, Jan Hubicka wrote:
>> One practical experience I have with LLVM developers is sharing
>> experiences about getting Firefox to work with LTO with Rafael
>> Espindola, and I think it was useful for both of us. I am definitely
>> open to more discussion.
>>
>> Let's try a specific topic that has been on my TODO list for some time.
>>
>> I would like to make it possible for multiple compilers to be used to
>> LTO a single binary. As we are all making LTO more useful, I think it
>> is a matter of time until people start shipping LTO object files by
>> default and users end up feeding them into different compilers or
>> incompatible versions of the same compiler. We probably want to make
>> this work, even though the cross-module optimization will not happen
>> in this case.
>>
>> The plugin interface in binutils seems to do its job well both for GCC
>> and LLVM, and I hope that open64 and ICC will eventually join, too.
>>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>>
>> It may be smoother if binutils was able to load multiple plugins at
>> once and grab plugins from system and user installed compilers without
>> an explicit --plugin argument.
>>
>> Binutils probably should also have a way to detect LTO object files
>> and produce more useful diagnostics than they do now, when there is no
>> plugin claiming them.
>>
>> There are some PRs filed on the topic
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=15300
>> http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=13227
>> but not much progress on them.
>>
>> I wonder if we can get this designed and implemented.
>>
>> On the other hand, GCC currently maintains a non-plugin path for LTO
>> that is now only used by the darwin port due to lack of a
>> plugin-enabled LD there. It seems that the liblto used by darwin is
>> loosely compatible with the plugin API, but it makes it harder to have
>> different compilers share it (one has to LD_PRELOAD liblto to a
>> different one prior to executing the linker?)
>>
>> I wonder, is there a chance to implement linker plugin API to libLTO
>> glue or add plugin support to native Darwin tools?
>>
>> Honza
Re: Fwd: LLVM collaboration?
On 11 February 2014 12:28, Renato Golin wrote:
> Now copying Rafael, who can give us some more insight on the LLVM LTO
> side.

Thanks.

> On 11 February 2014 09:55, Renato Golin wrote:
>> Hi Jan,
>>
>> I think this is a very good example where we could all collaborate
>> (including binutils).

It is. Both LTO models (LLVM and GCC) were considered from the start of the API design, and I think we got a better plugin model as a result.

>> If I got it right, LTO today:
>>
>> - needs the drivers to explicitly declare the plugin
>> - needs the library available somewhere

True.

>> - may have to change the library loading semantics (via LD_PRELOAD)

That depends on the library being loaded. RPATH works just fine too.

>> Since both toolchains do the magic, binutils has no incentive to
>> create any automatic detection of objects.

It is mostly a historical decision. At the time, the design was for the plugin to be matched to the compiler, so the compiler could pass that information down to the linker.

> The trouble however is that one needs to pass an explicit --plugin
> argument specifying the particular plugin to load, and so GCC ships with
> its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself) while
> LLVM does a similar thing.

These wrappers should not be necessary. While the linker currently requires a command line option, bfd has support for searching for a plugin. It will search /lib/bfd-plugin. See for example the instructions at http://llvm.org/docs/GoldPlugin.html. This was done because ar and nm are not normally bound to any compiler. Had we realized this issue earlier, we would probably have supported searching for plugins in the linker too.

So it seems that what you want could be done by:

* having bfd-ld and gold search bfd-plugins (maybe rename the directory?)
* supporting loading multiple plugins, and asking each to see if it supports a given file. That way we could LTO a build that is part GCC and part LLVM.
* maybe being smart about versions and loading new ones first (libLLVM-3.4 before libLLVM-3.3, for example). Probably the first one should always be the one given on the command line.

For OS X the situation is a bit different. There, instead of a plugin, the linker loads a library: libLTO.dylib. When doing LTO with a newer llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting that from clang some time ago, but I don't remember the outcome.

In theory GCC could implement a libLTO.dylib and set DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an API mapping the other way, so the job would be inverting it. The LTO model of ld64 is a bit stricter about knowing all symbol definitions and uses (including inline asm), so there would be work to be done to cover that, but the simple cases shouldn't be too hard.

Cheers,
Rafael
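For reference, the linker plugin entry point being discussed looks roughly like the sketch below, written against the API declared in binutils' include/plugin-api.h. This is a minimal illustration: error handling is elided, and how a plugin recognizes "its" LTO objects (the body of the claim handler) is up to each compiler.

  #include <plugin-api.h>   /* from the binutils source tree */
  #include <stddef.h>

  static ld_plugin_register_claim_file register_claim_file;

  /* Called by the linker for every input file.  Setting *claimed = 0
     leaves the file for the linker (or, in the multi-plugin world
     discussed in this thread, for the next plugin) to handle.  */
  static enum ld_plugin_status
  claim_file_handler (const struct ld_plugin_input_file *file, int *claimed)
  {
    *claimed = 0;   /* ...unless our own LTO marker is found in *file */
    return LDPS_OK;
  }

  /* The linker dlopens the plugin and calls onload() with a transfer
     vector; the plugin walks it and registers its hooks.  */
  enum ld_plugin_status
  onload (struct ld_plugin_tv *tv)
  {
    for (; tv->tv_tag != LDPT_NULL; tv++)
      if (tv->tv_tag == LDPT_REGISTER_CLAIM_FILE_HOOK)
        register_claim_file = tv->tv_u.tv_register_claim_file;

    if (register_claim_file)
      register_claim_file (claim_file_handler);
    return LDPS_OK;
  }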
Re: Fwd: LLVM collaboration?
On 2014.02.11 at 13:02 -0500, Rafael Espíndola wrote:
> On 11 February 2014 12:28, Renato Golin wrote:
>> Now copying Rafael, who can give us some more insight on the LLVM LTO
>> side.
>
> Thanks.
>
>> On 11 February 2014 09:55, Renato Golin wrote:
>>> Hi Jan,
>>>
>>> I think this is a very good example where we could all collaborate
>>> (including binutils).
>
> It is. Both LTO models (LLVM and GCC) were considered from the start of
> the API design, and I think we got a better plugin model as a result.
>
>>> If I got it right, LTO today:
>>>
>>> - needs the drivers to explicitly declare the plugin
>>> - needs the library available somewhere
>
> True.
>
>>> - may have to change the library loading semantics (via LD_PRELOAD)
>
> That depends on the library being loaded. RPATH works just fine too.
>
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>
> It is mostly a historical decision. At the time, the design was for the
> plugin to be matched to the compiler, so the compiler could pass that
> information down to the linker.
>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>
> These wrappers should not be necessary. While the linker currently
> requires a command line option, bfd has support for searching for a
> plugin. It will search /lib/bfd-plugin. See for example the
> instructions at http://llvm.org/docs/GoldPlugin.html.

Please note that this automatic loading of the plugin only happens for non-ELF files. So the LLVM GoldPlugin gets loaded fine, but automatic loading of gcc's liblto_plugin.so doesn't work at the moment.

A basic implementation to support both plugins seamlessly should be pretty straightforward, because LLVM's bitstream file format (non-ELF) is easily distinguishable from gcc's output (standard ELF with special sections).

--
Markus
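To illustrate why the distinction is easy (a sketch, not actual binutils code): raw LLVM bitcode begins with the magic bytes 'B', 'C', 0xC0, 0xDE, while a GCC LTO object is plain ELF (0x7F 'E' 'L' 'F') whose .gnu.lto_* sections still have to be checked before claiming it for GCC.

  #include <stdint.h>
  #include <string.h>

  enum lto_kind { LTO_UNKNOWN, LTO_LLVM_BITCODE, LTO_MAYBE_GCC_ELF };

  /* Classify a file from its first four bytes.  For the ELF case a
     real implementation would still scan the section table for
     .gnu.lto_* sections before claiming the file for GCC.  */
  static enum lto_kind
  classify_header (const uint8_t h[4])
  {
    static const uint8_t bc_magic[4]  = { 'B', 'C', 0xC0, 0xDE };
    static const uint8_t elf_magic[4] = { 0x7F, 'E', 'L', 'F' };

    if (memcmp (h, bc_magic, 4) == 0)
      return LTO_LLVM_BITCODE;
    if (memcmp (h, elf_magic, 4) == 0)
      return LTO_MAYBE_GCC_ELF;
    return LTO_UNKNOWN;
  }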
Zero-cost toolchain "standardization" process
Hi Folks,

First of all, I'd like to thank everyone for their great responses and heart-warming encouragement for such an enterprise. This will be my last email about this subject on these lists, so I'd like to just let everyone know what (and where) I'll be heading next with this topic. Feel free to reply to me personally; I don't want to spawn an ugly two-list thread.

As many of you noted, not everyone is actively interested in this, and for good reasons. The last thing we want is yet another standard getting in the way of actually implementing features and innovating, which both LLVM and GCC are good at. Following the comments on the GCC list, Slashdot and the Phoronix forums, I think the only sensible thing is to do what everyone said we should: talk.

Also, just this week, we got GCC developers having patches accepted in LLVM (sanitizers) and LLVM developers discussing LTO strategies on the GCC list. Both interactions have already shown the need for improvements on both sides. This is a *really* good start!

The proposal, then, is to have a zero-cost process, where only the interested parties need to take action: a reactive system where standards are agreed *after* implementation.

1. A new feature request / implementation on one of the toolchains outlines what's being done in a shared place. Basically, copy and paste from the active thread's summary into a shared area.

2. Interested parties (pre-registered) get a reminder that new content is available. From here, two things can happen:

2.1. The other toolchain has the feature, in which case developers should:
2.1.1. Agree that this is, indeed, the way they have implemented it, and check the box: "standard agreed".
2.1.2. Disagree on the implementation and describe what they've done instead.

2.2. The other toolchain doesn't have it:
2.2.1. Agree with the implementation and mark it as "standard agreed" and "future work".
2.2.2. Disagree on the implementation and mark it as "to discuss".

In both disagreement cases, pre-registered developers of both toolchains would receive emails outlining the conflict, and they can discuss as much as they want until common ground is decided, or not. It's perfectly fine to "agree to disagree" when no "common standard" is reached.

Some important notes:

* No toolchain should be forced to accommodate the standard, but it would be good to *at least* describe what they do instead, so that users don't get surprised.
* No toolchain should be forced to keep to the agreed standard, and discussions about migrating to a better implementation would naturally happen on a cross-toolchain forum.
* No toolchain should be forced to implement a feature just because the other toolchain did. It's perfectly fine to never implement it, if the need never arises.
* No developer should be forced to follow the emails or even care about the process. Other developers in their own communities should, if necessary, enforce their own standards, at their own pace, which could, or not, agree with the shared one.

How is that different from doing nothing?

First, and most important, it will log our cross-toolchain actions. Today, we only have two good examples of cross-interactions, and neither is visible from the other side. When (if) we start having more, it'd be good to be able to search through them, or contribute to them on an ad-hoc basis, if a new feature is proposed. We'll also have a documented record of the non-standard things that we're doing, before they go into other standards.

Second, it'll expose what we already have as "standard", and enable a common channel for like-minded people to solve problems on both toolchains. It'll also off-load both lists from having to follow any development, but will still give those interested a way to discuss and agree on a common standard.

Finally, once this "database" of implementation details is big enough, it will be easy to spot the conflicts, and it'll serve as a good TODO list for commoning up implementation details, or even for future compilers to choose one or the other. Entire projects or theses could be written based on them, fostering more innovation in the process.

What now? Well, people really interested in building such a system should (for now) email me directly. If I get enough feedback, we can start discussing in private (or on another list) how we're going to progress.

During the brainstorm phase, or if not enough people are interested, I still think we shouldn't stop talking. The interaction is already happening and it's really good; I think we should just continue and see where this takes us. Maybe by the GNU Cauldron enough people will want to contribute, maybe later, maybe never. Whatever works! To be honest, I'm already really happy with the outcome, so for me, my target was achieved!

I will report what happens during the next few months at the GCC+LLVM BoF, so if you're at least mildly interested, please do attend. For those looking for a few more answers to all the
Re: [buildrobot] spu / avr: Fallout from r207335
Hi Marek,

On Sun, 2014-02-02 23:59:16 +0100, Jan-Benedict Glaw wrote:
> Hi Marek,
>
> it seems your patch produced some fallout for
>
> avr: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=111296
> spu-elf: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=111360

Current build logs:

avr: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=132848
spu: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=135159

> Just some missed calls:
>
> g++ -c -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC
> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti
> -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual
> -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros
> -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c: In function ‘tree_node*
> avr_resolve_overloaded_builtin(unsigned int, tree_node*, void*)’:
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:118: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:184: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> /home/jbglaw/repos/gcc/gcc/config/avr/avr-c.c:241: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> make[1]: *** [avr-c.o] Error 1
> make[1]: Leaving directory `/home/jbglaw/build/avr/build-gcc/gcc'
> make: *** [all-gcc] Error 2
>
> g++ -c -DIN_GCC_FRONTEND -DIN_GCC_FRONTEND -g -O2 -DIN_GCC
> -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti
> -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual
> -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros
> -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace -I. -I.
> -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/.
> -I/home/jbglaw/repos/gcc/gcc/../include
> -I/home/jbglaw/repos/gcc/gcc/../libcpp/include
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber
> -I/home/jbglaw/repos/gcc/gcc/../libbacktrace \
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c: In function ‘tree_node*
> spu_resolve_overloaded_builtin(location_t, tree_node*, void*)’:
> /home/jbglaw/repos/gcc/gcc/config/spu/spu-c.c:184: error: conversion from
> ‘tree_node*’ to non-scalar type ‘vec’ requested
> make[1]: *** [spu-c.o] Error 1
> make[1]: Leaving directory `/home/jbglaw/build/spu-elf/build-gcc/gcc'
> make: *** [all-gcc] Error 2

This still isn't fixed. Do you intend to work on the fallout?

MfG, JBG

--
Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481
Signature of: http://perl.plover.com/Questions.html the second
Re: Fwd: LLVM collaboration?
> On 2014.02.11 at 13:02 -0500, Rafael Espíndola wrote:
>> On 11 February 2014 12:28, Renato Golin wrote:
>>> Now copying Rafael, who can give us some more insight on the LLVM LTO
>>> side.
>>
>> Thanks.
>>
>>> On 11 February 2014 09:55, Renato Golin wrote:
>>>> Hi Jan,
>>>>
>>>> I think this is a very good example where we could all collaborate
>>>> (including binutils).
>>
>> It is. Both LTO models (LLVM and GCC) were considered from the start
>> of the API design, and I think we got a better plugin model as a
>> result.
>>
>>>> If I got it right, LTO today:
>>>>
>>>> - needs the drivers to explicitly declare the plugin
>>>> - needs the library available somewhere
>>
>> True.
>>
>>>> - may have to change the library loading semantics (via LD_PRELOAD)
>>
>> That depends on the library being loaded. RPATH works just fine too.
>>
>>>> Since both toolchains do the magic, binutils has no incentive to
>>>> create any automatic detection of objects.
>>
>> It is mostly a historical decision. At the time, the design was for
>> the plugin to be matched to the compiler, so the compiler could pass
>> that information down to the linker.
>>
>>> The trouble however is that one needs to pass an explicit --plugin
>>> argument specifying the particular plugin to load, and so GCC ships
>>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver
>>> itself) while LLVM does a similar thing.
>>
>> These wrappers should not be necessary. While the linker currently
>> requires a command line option, bfd has support for searching for a
>> plugin. It will search /lib/bfd-plugin. See for example the
>> instructions at http://llvm.org/docs/GoldPlugin.html.
>
> Please note that this automatic loading of the plugin only happens for
> non-ELF files. So the LLVM GoldPlugin gets loaded fine, but automatic
> loading of gcc's liblto_plugin.so doesn't work at the moment.

Hmm, something that ought to be fixed. Binutils could probably use GCC's LTO symbols as a distinguisher. Is there a PR about this?

> A basic implementation to support both plugins seamlessly should be
> pretty straightforward, because LLVM's bitstream file format (non-ELF)
> is easily distinguishable from gcc's output (standard ELF with special
> sections).

I think it is easy even with two plugins for the same file format - all ld needs to do is load the plugins and then do the file claiming for each of them. The GCC plugin then should not claim files from LLVM or an incompatible GCC version, and vice versa.

Honza

> --
> Markus
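The claiming behavior Honza describes could look roughly like the loop below. This is purely illustrative: struct plugin, plugin_list and the rewind step are hypothetical names, not actual bfd/ld internals, and the point is only the control flow (every loaded plugin gets a chance to claim, instead of stopping at the first plugin whose onload() succeeds as bfd/plugin.c does today).

  #include <plugin-api.h>   /* for enum ld_plugin_status et al. */
  #include <stddef.h>

  /* Hypothetical registry of loaded plugins.  */
  struct plugin
  {
    struct plugin *next;
    enum ld_plugin_status (*claim_file) (const struct ld_plugin_input_file *,
                                         int *);
  };
  static struct plugin *plugin_list;

  static int
  try_claim_with_all_plugins (const struct ld_plugin_input_file *file)
  {
    int claimed = 0;
    struct plugin *p;

    for (p = plugin_list; p != NULL; p = p->next)
      {
        if (p->claim_file (file, &claimed) != LDPS_OK)
          continue;          /* this plugin errored out; ask the next one */
        if (claimed)
          return 1;          /* exactly one plugin owns the file */
        /* rewind the file offset here before offering it to the next
           plugin */
      }
    return 0;  /* unclaimed: a friendly "no plugin recognizes this LTO
                  object" diagnostic would go here */
  }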
Re: Conditional execution over emit_move_insn
Hello,

> I'd like to hardcode conditional execution of emit_move_insn based on
> the predicate checking that the address in the destination argument is
> non-NULL. The platform supports conditional execution, but doesn't have
> explicitly defined conditional moves (target=tic6x). I have already
> tried to find any look-alike pieces in the gcc code tree, but without
> success - I am new here. As for the background - I am trying to work
> around the bug I submitted (id=60123) before there's an official patch
> for it available.

I have figured this out on my own. Please disregard.

With best,
Wojciech
Re: Fwd: LLVM collaboration?
>>> Since both toolchains do the magic, binutils has no incentive to
>>> create any automatic detection of objects.
>
> It is mostly a historical decision. At the time, the design was for the
> plugin to be matched to the compiler, so the compiler could pass that
> information down to the linker.
>
>> The trouble however is that one needs to pass an explicit --plugin
>> argument specifying the particular plugin to load, and so GCC ships
>> with its own wrappers (gcc-nm/gcc-ld/gcc-ar and the gcc driver itself)
>> while LLVM does a similar thing.
>
> These wrappers should not be necessary. While the linker currently
> requires a command line option, bfd has support for searching for a
> plugin. It will search /lib/bfd-plugin. See for example the
> instructions at http://llvm.org/docs/GoldPlugin.html.

My reading of bfd/plugin.c is that it basically walks the directory and looks for the first plugin that returns OK for onload (which is always the case for GCC/LLVM plugins). So if I install the GCC and LLVM plugins there, it will depend on which one ends up being first, and only that plugin will be used.

We need multiple plugin support, as suggested by the directory name ;)

Also it seems that currently the plugin is not used by ar/nm/ranlib if the file is ELF (as mentioned by Markus), and GNU ld seems to choke on LLVM object files even if it has a plugin.

This probably needs to be sanitized.

> This was done because ar and nm are not normally bound to any
> compiler. Had we realized this issue earlier we would probably have
> supported searching for plugins in the linker too.
>
> So it seems that what you want could be done by
>
> * having bfd-ld and gold search bfd-plugins (maybe rename the
> directory?)
> * support loading multiple plugins, and asking each to see if it
> supports a given file. That way we could LTO a build that is part GCC
> and part LLVM.

Yes, that is what I have in mind. Plus perhaps an additional configuration file to avoid loading everything. Say a user installs 3 versions of LLVM, open64 and ICC. If all of them load as shared libraries, like LLVM's does, it will probably slow down the tools measurably.

> * maybe be smart about versions and load new ones first? (libLLVM-3.4
> before libLLVM-3.3, for example). Probably the first one should always
> be the one given on the command line.

Yes, I think we may want to prioritize the list, so a user can have his own version of GCC prevail over the system one, for example.

> For OS X the situation is a bit different. There, instead of a plugin,
> the linker loads a library: libLTO.dylib. When doing LTO with a newer
> llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting
> that from clang some time ago, but I don't remember the outcome.
>
> In theory GCC could implement a libLTO.dylib and set DYLD_LIBRARY_PATH.
> The gold/bfd plugin that LLVM uses is basically an API mapping the
> other way, so the job would be inverting it. The LTO model of ld64 is a
> bit stricter about knowing all symbol definitions and uses (including
> inline asm), so there would be work to be done to cover that, but the
> simple cases shouldn't be too hard.

I would not care that much about symbols in asm definitions to start with. Even if we force users to non-LTO those object files, it would be an improvement over what we have now.

One problem is that we need a volunteer to implement the reverse glue (libLTO->plugin API), since I do not have an OS X box (well, I have an old G5, but even that is quite far from me right now).

Why are complete symbol tables required? Can't ld64 be changed to ignore unresolved symbols in the first stage, just like gold/gnu-ld does?

Honza

> Cheers,
> Rafael
Re: Fwd: LLVM collaboration?
> My reading of bfd/plugin.c is that it basically walks the directory and
> looks for the first plugin that returns OK for onload (which is always
> the case for GCC/LLVM plugins). So if I install the GCC and LLVM
> plugins there, it will depend on which one ends up being first, and
> only that plugin will be used.
>
> We need multiple plugin support, as suggested by the directory name ;)
>
> Also it seems that currently the plugin is not used by ar/nm/ranlib if
> the file is ELF (as mentioned by Markus), and GNU ld seems to choke on
> LLVM object files even if it has a plugin.
>
> This probably needs to be sanitized.

CCing Hal Finkel. He got this to work some time ago. Not sure if he ever ported the patches to bfd trunk.

>> For OS X the situation is a bit different. There, instead of a plugin,
>> the linker loads a library: libLTO.dylib. When doing LTO with a newer
>> llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed setting
>> that from clang some time ago, but I don't remember the outcome.
>>
>> In theory GCC could implement a libLTO.dylib and set
>> DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an
>> API mapping the other way, so the job would be inverting it. The LTO
>> model of ld64 is a bit stricter about knowing all symbol definitions
>> and uses (including inline asm), so there would be work to be done to
>> cover that, but the simple cases shouldn't be too hard.
>
> I would not care that much about symbols in asm definitions to start
> with. Even if we force users to non-LTO those object files, it would be
> an improvement over what we have now.
>
> One problem is that we need a volunteer to implement the reverse glue
> (libLTO->plugin API), since I do not have an OS X box (well, I have an
> old G5, but even that is quite far from me right now).
>
> Why are complete symbol tables required? Can't ld64 be changed to
> ignore unresolved symbols in the first stage, just like gold/gnu-ld
> does?

I am not sure about this. My *guess* is that it does the dead-stripping computation before asking libLTO for the object file. I noticed the issue while trying to LTO Firefox some time ago.

Cheers,
Rafael
Re: LLVM collaboration?
- Original Message -
> From: "Rafael Espíndola"
> To: "Jan Hubicka"
> Cc: "Renato Golin" , "gcc" , "Hal Finkel"
> Sent: Tuesday, February 11, 2014 3:38:40 PM
> Subject: Re: Fwd: LLVM collaboration?
>
>> My reading of bfd/plugin.c is that it basically walks the directory
>> and looks for the first plugin that returns OK for onload (which is
>> always the case for GCC/LLVM plugins). So if I install the GCC and
>> LLVM plugins there, it will depend on which one ends up being first,
>> and only that plugin will be used.
>>
>> We need multiple plugin support, as suggested by the directory name ;)
>>
>> Also it seems that currently the plugin is not used by ar/nm/ranlib if
>> the file is ELF (as mentioned by Markus), and GNU ld seems to choke on
>> LLVM object files even if it has a plugin.
>>
>> This probably needs to be sanitized.
>
> CCing Hal Finkel. He got this to work some time ago. Not sure if he
> ever ported the patches to bfd trunk.

I have a patch for binutils 2.24 (attached -- I think this works; I hand-isolated it from my BG/Q patchset). I would not consider it to be of upstream quality, but I'd obviously appreciate any assistance in making everything clean and proper ;)

-Hal

>>> For OS X the situation is a bit different. There, instead of a
>>> plugin, the linker loads a library: libLTO.dylib. When doing LTO with
>>> a newer llvm, one needs to set DYLD_LIBRARY_PATH. I think I proposed
>>> setting that from clang some time ago, but I don't remember the
>>> outcome.
>>>
>>> In theory GCC could implement a libLTO.dylib and set
>>> DYLD_LIBRARY_PATH. The gold/bfd plugin that LLVM uses is basically an
>>> API mapping the other way, so the job would be inverting it. The LTO
>>> model of ld64 is a bit stricter about knowing all symbol definitions
>>> and uses (including inline asm), so there would be work to be done to
>>> cover that, but the simple cases shouldn't be too hard.
>>
>> I would not care that much about symbols in asm definitions to start
>> with. Even if we force users to non-LTO those object files, it would
>> be an improvement over what we have now.
>>
>> One problem is that we need a volunteer to implement the reverse glue
>> (libLTO->plugin API), since I do not have an OS X box (well, I have an
>> old G5, but even that is quite far from me right now).
>>
>> Why are complete symbol tables required? Can't ld64 be changed to
>> ignore unresolved symbols in the first stage, just like gold/gnu-ld
>> does?
>
> I am not sure about this. My *guess* is that it does the dead-stripping
> computation before asking libLTO for the object file. I noticed the
> issue while trying to LTO Firefox some time ago.
>
> Cheers,
> Rafael

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

diff --git a/bfd/elflink.c b/bfd/elflink.c
index 99b7ca1..c2bf9c3 100644
--- a/bfd/elflink.c
+++ b/bfd/elflink.c
@@ -5054,7 +5054,9 @@ elf_link_add_archive_symbols (bfd *abfd, struct bfd_link_info *info)
 	    goto error_return;

 	  if (! bfd_check_format (element, bfd_object))
-	    goto error_return;
+	    /* goto error_return; */
+	    /* this might be an object understood only by an LTO plugin */
+	    bfd_elf_make_object (element);

 	  /* Doublecheck that we have not included this object
 	     already--it should be impossible, but there may be
diff --git a/ld/ldfile.c b/ld/ldfile.c
index 16baef8..159a60c 100644
--- a/ld/ldfile.c
+++ b/ld/ldfile.c
@@ -38,6 +38,7 @@
 #ifdef ENABLE_PLUGINS
 #include "plugin-api.h"
 #include "plugin.h"
+#include "elf-bfd.h"
 #endif /* ENABLE_PLUGINS */

 bfd_boolean ldfile_assumed_script = FALSE;
@@ -124,6 +125,7 @@ bfd_boolean
 ldfile_try_open_bfd (const char *attempt,
 		     lang_input_statement_type *entry)
 {
+  int is_obj = 0;
   entry->the_bfd = bfd_openr (attempt, entry->target);

   if (verbose)
@@ -168,6 +170,34 @@ ldfile_try_open_bfd (const char *attempt,
     {
       if (! bfd_check_format (check, bfd_object))
 	{
+#ifdef ENABLE_PLUGINS
+	  if (check == entry->the_bfd
+	      && bfd_get_error () == bfd_error_file_not_recognized
+	      && ! ldemul_unrecognized_file (entry))
+	    {
+	      if (plugin_active_plugins_p ()
+		  && !no_more_claiming)
+		{
+		  int fd = open (attempt, O_RDONLY | O_BINARY);
+		  if (fd >= 0)
+		    {
+		      struct ld_plugin_input_file file;
+
+		      bfd_elf_make_object (entry->the_bfd);
+
+		      file.name = attempt;
+		      file.offset = 0;
+		      file.filesize = lseek (fd, 0, SEEK_END);
+		      file.fd = fd;
+		      plugin_maybe_claim (&file, entry);
+
+		      if (entry->flags.claimed)
+			return
Re: [RFC][PATCH 0/5] arch: atomic rework
On Sun, 2014-02-09 at 19:51 -0800, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 01:06:48AM +0100, Torvald Riegel wrote:
> > On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote:
> > > > On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> > > > > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > > > > > > On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > > > > > > > There are also so many ways to blow your head off it's
> > > > > > > > untrue. For example, cmpxchg takes a separate memory model
> > > > > > > > parameter for failure and success, but then there are
> > > > > > > > restrictions on the sets you can use for each. It's not
> > > > > > > > hard to find well-known memory-ordering experts shouting
> > > > > > > > "Just use memory_model_seq_cst for everything, it's too
> > > > > > > > hard otherwise". Then there's the fun of load-consume vs
> > > > > > > > load-acquire (arm64 GCC completely ignores consume atm and
> > > > > > > > optimises all of the data dependencies away) as well as
> > > > > > > > the definition of "data races", which seems to be used as
> > > > > > > > an excuse to miscompile a program at the earliest
> > > > > > > > opportunity.
> > > > > > >
> > > > > > > Trust me, rcu_dereference() is not going to be defined in
> > > > > > > terms of memory_order_consume until the compilers implement
> > > > > > > it both correctly and efficiently. They are not there yet,
> > > > > > > and there is currently no shortage of compiler writers who
> > > > > > > would prefer to ignore memory_order_consume.
> > > > > >
> > > > > > Do you have any input on
> > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In
> > > > > > particular, the language standard's definition of
> > > > > > dependencies?
> > > > >
> > > > > Let's see... 1.10p9 says that a dependency must be carried
> > > > > unless:
> > > > >
> > > > > — B is an invocation of any specialization of
> > > > >   std::kill_dependency (29.3), or
> > > > > — A is the left operand of a built-in logical AND (&&, see
> > > > >   5.14) or logical OR (||, see 5.15) operator, or
> > > > > — A is the left operand of a conditional (?:, see 5.16)
> > > > >   operator, or
> > > > > — A is the left operand of the built-in comma (,) operator
> > > > >   (5.18);
> > > > >
> > > > > So the use of "flag" before the "?" is ignored. But the
> > > > > "flag - flag" after the "?" will carry a dependency, so the
> > > > > code fragment in 59448 needs to do the ordering rather than
> > > > > just optimizing "flag - flag" out of existence. One way to do
> > > > > that on both ARM and Power is to actually emit code for
> > > > > "flag - flag", but there are a number of other ways to make
> > > > > that work.
> > > >
> > > > And that's what would concern me, considering that these
> > > > requirements seem to be able to creep out easily. Also, whereas
> > > > the other atomics just constrain compilers wrt. reordering across
> > > > atomic accesses or changes to the atomic accesses themselves, the
> > > > dependencies are new requirements on pieces of otherwise
> > > > non-synchronizing code. The latter seems far more involved to me.
> > >
> > > Well, the wording of 1.10p9 is pretty explicit on this point.
> > > There are only a few exceptions to the rule that dependencies from
> > > memory_order_consume loads must be tracked. And to your point about
> > > requirements being placed on pieces of otherwise non-synchronizing
> > > code, we already have that with plain old load acquire and store
> > > release -- both of these put ordering constraints that affect the
> > > surrounding non-synchronizing code.
> >
> > I think there's a significant difference. With acquire/release or
> > more general memory orders, it's true that we can't order _across_
> > the atomic access. However, we can reorder and optimize without
> > additional constraints if we do not reorder. This is not the case
> > with consume memory order, as the (p + flag - flag) example shows.
>
> Agreed, memory_order_consume does introduce additional restrictions.
>
> > > This issue got a lot of discussion, and the compromise is that
> > > dependencies cannot leak into or out of functions unless the
> > > relevant parameters or return values are annotated with
> > > [[carries_dependency]]. This means that the compiler can see all
> > > the places where dependencies must be tracked. This is described
> > > in 7.6.4.
> >
> > I wasn't aware of 7.6.4 (but it isn't referred to as an additional
> > constraint--which it is--in 1.10, so I guess at least that should be
> > fixed).
> > Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a
> > semantic difference, at least if one is assuming that normal
> > optimization of sequential c
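For reference, here is a small sketch in the spirit of the fragment
discussed in GCC PR 59448 (the exact code in the PR may differ; the
names below are invented for illustration):

#include <atomic>

extern int table[2];
std::atomic<int*> gp;

int
reader (void)
{
  int *p = gp.load (std::memory_order_consume);
  int flag = *p;  /* carries a dependency from the consume load */
  /* Per 1.10p9, "flag" used as the condition of ?: carries no
     dependency, but the "flag - flag" in the selected arm does.  Its
     value is always 0, yet folding it away would also drop the
     dependency ordering, so on ARM/Power the implementation must still
     order this access after the consume load, e.g. by actually
     emitting the flag - flag computation.  */
  return table[flag ? 0 : flag - flag];
}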
Re: [RFC][PATCH 0/5] arch: atomic rework
On Mon, 2014-02-10 at 11:09 -0800, Linus Torvalds wrote:
> On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
> >
> > Intuitively, this is wrong because this lets the program take a step
> > the abstract machine wouldn't do. This is different from the
> > sequential code that Peter posted because it uses atomics, and thus
> > one can't easily assume that the difference is not observable.
>
> Btw, what is the definition of "observable" for the atomics?
>
> Because I'm hoping that it's not the same as for volatiles, where
> "observable" is about the virtual machine itself, and as such volatile
> accesses cannot be combined or optimized at all.

No, atomics aren't an observable behavior of the abstract machine
(unless they are volatile). See 1.8p8 (citing the C++ standard).

> Now, I claim that atomic accesses cannot be done speculatively for
> writes, and not re-done for reads (because the value could change),

Agreed, unless the compiler can prove that this doesn't make a
difference in the program at hand and it's not volatile atomics. In
general, that will be hard and thus won't happen often, I suppose, but
if correctly proved it would fall under the as-if rule, I think.

> but *combining* them would be possible and good.

Agreed.

> For example, we often have multiple independent atomic accesses that
> could certainly be combined: testing the individual bits of an atomic
> value with helper functions, causing things like "load atomic, test
> bit, load same atomic, test another bit". The two atomic loads could
> be done as a single load without possibly changing semantics on a
> real machine, but if "visibility" is defined in the same way it is
> for "volatile", that wouldn't be a valid transformation. Right now we
> use "volatile" semantics for these kinds of things, and they really
> can hurt.

Agreed. In your example, the compiler would have to prove that the
abstract machine would always be able to run the two loads atomically
(i.e., as one load) without running into impossible/disallowed behavior
of the program. But if there's no loop or branch or such in between,
this should be straightforward, because any hardware oddity or similar
could merge those loads and it wouldn't be disallowed by the standard
(considering that we're talking about a finite number of loads), so the
compiler would be allowed to do it as well.

> Same goes for multiple writes (possibly due to setting bits):
> combining multiple accesses into a single one is generally fine, it's
> *adding* write accesses speculatively that is broken by design..

Agreed. As Paul points out, this being correct assumes that there are
no other ordering guarantees or memory accesses "interfering", but if
the stores are to the same memory location and adjacent to each other
in the program, then I don't see a reason why they wouldn't be
combinable.

> At the same time, you can't combine atomic loads or stores infinitely
> - "visibility" on a real machine definitely is about timeliness.
> Removing all but the last write when there are multiple consecutive
> writes is generally fine, even if you unroll a loop to generate those
> writes. But if what remains is a loop, it might be a busy-loop
> basically waiting for something, so it would be wrong ("untimely") to
> hoist a store in a loop entirely past the end of the loop, or hoist a
> load in a loop to before the loop.

Agreed. That's what 1.10p24 and 1.10p25 are meant to specify for loads,
although those might not be bullet-proof, as Paul points out. Forward
progress is rather vaguely specified in the standard, but at least
parts of the committee (and people in ISO C++ SG1, in particular) are
working on trying to improve this.

> Does the standard allow for that kind of behavior?

I think the standard requires (or intends to require) the behavior that
you (and I) seem to prefer in these examples.
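As a concrete illustration of the load-merging case above (a sketch
only; the names are invented):

#include <atomic>

std::atomic<unsigned> flags;

static bool
bit_set (unsigned mask)
{
  return (flags.load (std::memory_order_relaxed) & mask) != 0;
}

bool
both_bits_set (void)
{
  /* Two back-to-back loads of the same atomic.  After inlining, a
     conforming compiler may read "flags" once and test both bits,
     since some execution of the abstract machine could observe the
     same value for both loads.  Volatile semantics would forbid this
     merge.  */
  return bit_set (0x1) && bit_set (0x2);
}

void
spin_until_ready (void)
{
  /* Infinite merging, by contrast, must not happen: hoisting this
     load out of the loop would turn the wait into a potential
     infinite loop (cf. 1.10p24/p25).  */
  while (!(flags.load (std::memory_order_relaxed) & 0x4))
    ;
}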
Re: [RFC][PATCH 0/5] arch: atomic rework
On Tue, 2014-02-11 at 07:59 -0800, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 11:09:24AM -0800, Linus Torvalds wrote:
> > On Sun, Feb 9, 2014 at 4:27 PM, Torvald Riegel wrote:
> > >
> > > Intuitively, this is wrong because this lets the program take a
> > > step the abstract machine wouldn't do. This is different from the
> > > sequential code that Peter posted because it uses atomics, and
> > > thus one can't easily assume that the difference is not
> > > observable.
> >
> > Btw, what is the definition of "observable" for the atomics?
> >
> > Because I'm hoping that it's not the same as for volatiles, where
> > "observable" is about the virtual machine itself, and as such
> > volatile accesses cannot be combined or optimized at all.
> >
> > Now, I claim that atomic accesses cannot be done speculatively for
> > writes, and not re-done for reads (because the value could change),
> > but *combining* them would be possible and good.
> >
> > For example, we often have multiple independent atomic accesses
> > that could certainly be combined: testing the individual bits of an
> > atomic value with helper functions, causing things like "load
> > atomic, test bit, load same atomic, test another bit". The two
> > atomic loads could be done as a single load without possibly
> > changing semantics on a real machine, but if "visibility" is
> > defined in the same way it is for "volatile", that wouldn't be a
> > valid transformation. Right now we use "volatile" semantics for
> > these kinds of things, and they really can hurt.
> >
> > Same goes for multiple writes (possibly due to setting bits):
> > combining multiple accesses into a single one is generally fine,
> > it's *adding* write accesses speculatively that is broken by
> > design..
> >
> > At the same time, you can't combine atomic loads or stores
> > infinitely - "visibility" on a real machine definitely is about
> > timeliness. Removing all but the last write when there are multiple
> > consecutive writes is generally fine, even if you unroll a loop to
> > generate those writes. But if what remains is a loop, it might be a
> > busy-loop basically waiting for something, so it would be wrong
> > ("untimely") to hoist a store in a loop entirely past the end of
> > the loop, or hoist a load in a loop to before the loop.
> >
> > Does the standard allow for that kind of behavior?
>
> You asked! ;-)
>
> So the current standard allows merging of both loads and stores,
> unless of course ordering constraints prevent the merging. Volatile
> semantics may be used to prevent this merging, if desired, for
> example, for real-time code.

Agreed.

> Infinite merging is intended to be prohibited, but I am not certain
> that the current wording is bullet-proof (1.10p24 and 1.10p25).

Yeah, maybe not. But it at least seems to rather clearly indicate the
intent ;)

> The only prohibition against speculative stores that I can see is in
> a non-normative note, and it can be argued to apply only to things
> that are not atomics (1.10p22).

I think this one is specifically about speculative stores that would
affect memory locations that the abstract machine would not write to,
and that might be observable or create data races. While a compiler
could potentially prove that such stores don't lead to a difference in
the behavior of the program (e.g., by proving that there are no
observers anywhere and this isn't overlapping with any volatile
locations), I think that this is hard in general and most compilers
will just not do such things. In GCC, bugs in that category were fixed
after researchers doing fuzz-testing found them (IIRC, speculative
stores by loops).

> I don't see any prohibition against reordering a store to precede a
> load preceding a conditional branch -- which would not be speculative
> if the branch was known to be taken and the load hit in the store
> buffer. In a system where stores could be reordered, some other CPU
> might perceive the store as happening before the load that controlled
> the conditional branch. This needs to be addressed.

I don't know the specifics of your example, but from how I understand
it, I don't see a problem if the compiler can prove that the store will
always happen.

To be more specific, if the compiler can prove that the store will
happen anyway, and the region of code can be assumed to always run
atomically (e.g., there's no loop or such in there), then it is known
that we have one atomic region of code that will always perform the
store, so we might as well do the stuff in the region in some order.

Now, if any of the memory accesses are atomic, then the whole region of
code containing those accesses is often not atomic because other
threads might observe intermediate results in a data-race-free way.

(I know that this isn't a very precise formulation, but I hope it
brings my line of reasoning across.)

> Why this hole? At the time, the current formalizations of popular
> CPU architectures did not exist, and it was n
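A sketch of the store-vs-load-branch case under discussion (invented
names; whether this transformation is actually permitted for
non-seq_cst atomics is exactly the open question here):

#include <atomic>

std::atomic<int> x, y;

void
writer (void)
{
  /* The store to x happens on both paths, so a compiler -- or a CPU
     with a store buffer -- might perform it before the load of y that
     controls the branch.  Another thread could then observe the store
     to x as happening before this thread's load of y.  */
  if (y.load (std::memory_order_relaxed))
    x.store (1, std::memory_order_relaxed);
  else
    x.store (1, std::memory_order_relaxed);
}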
sparse overlapping structs for vectorization
I had a problem that got solved in an ugly way. I think gcc ought to
provide a few ways to make a nicer solution possible.

There was an array of structs, roughly like so:

  struct {
    int w;
    float x;
    char y[4];
    short z[2];
  } foo[512][4];

The types within the struct are 4 bytes each; I don't actually remember
anything else, and it doesn't matter except that they are distinct. I
think it was bitfields actually, neatly grouped into groups of 32 bits.
In other words, like 4 4-byte values, but with more-or-less incompatible
types. Note that 4 of the structs neatly fill a 64-byte cache line; an
alignment attribute was used to ensure 64-byte alignment.

The most common operation needed on this array is to compare the first
struct member of 4 of the structs against a given value, looking to see
if there is a match. SSE would be good for this. The lookup is then
followed by using the matching entry if there is one, else picking one
of the 4 to recycle and thus use.

First bad solution: one could load up 4 SSE registers and shuffle things
around... NO.

Second bad solution: one could simply have 4 distinct arrays. This is
bad because w, x, y, and z for a given entry then live in different
cache lines.

Third bad solution: the array can be viewed as "int foo[512][4][4]"
instead, with the struct field forming the middle array index. Since the
last two array indexes are both 4, you can effectively swap them around.
This groups the 4 fields of each type together, allowing SSE. The
problem here is loss of type safety; one must use array indexes instead
of struct field names, like so:

  foo[idx][WHERE_W_IS][i]

Fourth bad solution: we lay things out as in the third solution, but we
cast pointers to effectively lay sparse structs over each other like
shingles:

  struct {
    int w;
    int pad_wx[3];
    float x;
    int pad_xy[3];
    char y[4];
    int pad_yz[3];
    short z[2];
  }

Performance is hurt by the need for __may_alias__, and of course the
result is painful to look at. We went with this anyway, using SSE
intrinsics, and performance was great. Maintainability... not so much.

BTW, an array of 512 structs containing 4-entry arrays was not used
because we wanted a simple, normal pointer to indicate the item being
operated on; we didn't want to need a (pointer, index) pair.

Can something be done to help out here? The first thing that pops into
mind is the ability to tell gcc that the struct-to-struct byte offset
used for array indexing is a user-specified value instead of simply the
struct size. It's possible we could have safely ignored the warning
about aliasing; I don't know. Perhaps that would give even better
performance, but the casting would still be very ugly. Solutions that
can be defined away for non-gcc compilers are better.
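For what it's worth, a sketch of the probe described above using the
third layout (SSE2 assumed; match_w and the enum values are invented
for illustration, following the post's WHERE_W_IS naming):

#include <emmintrin.h>  /* SSE2 */

enum { WHERE_W_IS, WHERE_X_IS, WHERE_Y_IS, WHERE_Z_IS };

alignas (64) int foo[512][4][4];

/* Returns a 4-bit mask with bit i set iff foo[idx][WHERE_W_IS][i]
   equals value.  foo[idx][WHERE_W_IS] is 16 contiguous, 16-byte
   aligned bytes, so one aligned SSE load covers the four w fields of
   the cache line.  */
int
match_w (int idx, int value)
{
  __m128i w4  = _mm_load_si128 ((const __m128i *) foo[idx][WHERE_W_IS]);
  __m128i key = _mm_set1_epi32 (value);
  __m128i eq  = _mm_cmpeq_epi32 (w4, key);
  return _mm_movemask_ps (_mm_castsi128_ps (eq));  /* one bit per lane */
}

/* Usage: int m = match_w (idx, v); if (m) operate on lane
   __builtin_ctz (m), else recycle one of the four entries.  */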