
Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Rafael Espindola
>> In ELF you have to think about symbol overriding.  Let's say you link
>> a.o b.o c.o.  a.o has a reference to symbol S.  b.o has a strong
>> definition.  c.o has a weak definition.  a.o and c.o have LTO
>> information, b.o does not.  ELF requires that a.o call the symbol from
>> b.o, not the symbol from c.o.  I don't see how to make that work with
>> the LLVM interface.
>
> This does work.  There are two parts to it.  First the linker's master
> symbol table sees the strong definition of S in b.o and the weak in c.o
> and decides to use the strong one from b.o.  Second (because of that) the
> linker calls lto_codegen_add_must_preserve_symbol("S").  The LTO engine
> then sees it has a weak global function S and it cannot inline those.
> Put together, the LTO engine does generate a copy of S, but the linker
> throws it away and uses the one from b.o.

Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
of the opposite of what I had understood. What do you do in this case:

a.o: IL file that contains a reference to "f"
b.o: IL file that has a weak def of "f"

There is no strong definition. Can you inline f into the use in a.o?
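
The a.o/b.o case above can be sketched in a single C translation unit. This is a minimal sketch, assuming GCC or Clang on an ELF target; the names call_f and check_weak_binding are illustrative only and do not come from the thread.

```c
#include <assert.h>

/* Weak definition of f, standing in for the one in b.o.  A strong
   definition of f supplied by another object would replace it at
   link time.  */
__attribute__((weak)) int f(void)
{
    return 1;
}

/* Reference to f, standing in for the use in a.o.  Whether f may be
   inlined here is the question being asked.  */
int call_f(void)
{
    return f();
}

/* With no strong definition linked in, the weak one is what runs.  */
void check_weak_binding(void)
{
    assert(call_f() == 1);
}
```

If a second object defined a non-weak f, the linker would bind the call to that definition instead, which is why inlining the weak body too early can be wrong.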

> -Nick
>
>

Cheers,
-- 
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047


Re: Is this a typo in setup_incoming_varargs_64?

2008-06-05 Thread Jan Hubicka
> Hi,
> 
> setup_incoming_varargs_64 in i386.c has
> 
>   /* Compute address to jump to :
>  label - 5*eax + nnamed_sse_arguments*5  */
> 
> The comments don't match the code. Should the comments be
> 
>  /* Compute address to jump to :
>  label - 4*eax + nnamed_sse_arguments*4  */

Yes, this is most likely a typo caused by originally using a different
register than eax, which resulted in a different encoding length.
Thanks for noticing it!
Honza
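
The address computation in the comment can be written out as plain arithmetic. This is only a sketch of the formula as quoted, not the code in i386.c; the function name and the insn_len parameter are hypothetical, and insn_len stands for the encoded length of one register-save instruction (4 in the corrected comment, 5 in the stale one).

```c
#include <assert.h>
#include <stdint.h>

/* Jump target inside a run of equal-length save instructions that ends
   at `label`:  label - insn_len*eax + nnamed*insn_len.
   `eax` is the register count passed by the caller and `nnamed` is the
   number of named SSE arguments, as in the quoted comment.  */
uintptr_t save_jump_target(uintptr_t label, unsigned insn_len,
                           unsigned eax, unsigned nnamed)
{
    assert(insn_len > 0);
    return label - (uintptr_t)insn_len * eax + (uintptr_t)insn_len * nnamed;
}
```

With label = 1000, eax = 3 and nnamed = 1, a 4-byte encoding gives 992 while a 5-byte encoding gives 990, so a comment that disagrees with the code about the multiplier describes a different landing point.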
> 
> Thanks.
> 
> -- 
> H.J.


Re: [lto] Streaming out language-specific DECL/TYPEs

2008-06-05 Thread Jan Hubicka
> Jan Hubicka wrote:
> 
> >Sure if it works, we should be lowering the types during gimplification
> >so we don't need to store all this in memory...
> >But C++ FE still use its local data later in stuff like thunks, but we
> >will need to cgraphize them anyway.
> 
> I agree.  The only use of language-specific DECLs and TYPEs after 
> gimplification should be for generating debug information.  And if 
> that's already been done, then you shouldn't need it at all.

For LTO with debug info we will probably need some frontend-neutral
debug info representation in the longer run, since optimizations that
modify the data types and such will need to compensate.

We can translate stuff to in-memory DWARF and update it, but that would
probably limit the number of debug info formats we can support.

Honza
> 
> -- 
> Mark Mitchell
> CodeSourcery
> [EMAIL PROTECTED]
> (650) 331-3385 x713


How to build on AMD64/Debian under x86 32bits chroot?

2008-06-05 Thread Basile STARYNKEVITCH

Hello All

As (I imagine) many developers I have a 64 bits machine - running Debian 
(Sid) Linux AMD64.


I want to test my MELT branch on x86 (32 bits). So I set up (using 
debootstrap) a x86 32 bits Debian/Lenny chroot-ed system (in /debian32) 
which has most of the *-dev packages installed.


In this chroot-ed environment I am able to compile several software 
packages without issues. For example, I just compiled the PPL there.


The point is that even after schroot the uname system call (& the uname 
command) still return x86_64 as the machine. I suppose there is no easy 
trick to circumvent this.



I thought that
   ../configure  --build=x86-linux --target=x86-linux --host=x86-linux
(with other MELT specific options) should be enough, but apparently not; 
make fails with


checking for struct tms... yes
checking for clock_t... yes
checking for .preinit_array/.init_array/.fini_array support... yes
checking if mkdir takes one argument... no
*** Configuration x86-unknown-linux-gnu not supported
make[1]: *** [configure-gcc] Error 1
make[1]: Leaving directory `/usr/src/Lang/_MeltObj32'

and gcc/config.log does indeed show

hostname = glinka
uname -m = x86_64
uname -r = 2.6.24-1-amd64
uname -s = Linux
uname -v = #1 SMP Fri Apr 18 23:08:22 UTC 2008

/usr/bin/uname -p = unknown
/bin/uname -X = unknown

/bin/arch  = unknown
/usr/bin/arch -k   = unknown
/usr/convex/getsysinfo = unknown
/usr/bin/hostinfo  = unknown
/bin/machine   = unknown
/usr/bin/oslevel   = unknown
/bin/universe  = unknown

Any hints are welcome. If possible, I would like to avoid having to 
install a virtual machine...



Regards
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***


Re: How to build on AMD64/Debian under x86 32bits chroot?

2008-06-05 Thread Andrew Haley
Basile STARYNKEVITCH wrote:
> Hello All
> 
> As (I imagine) many developers I have a 64 bits machine - running Debian
> (Sid) Linux AMD64.
> 
> I want to test my MELT branch on x86 (32 bits). So I set up (using
> debootstrap) a x86 32 bits Debian/Lenny chroot-ed system (in /debian32)
> which has most of the *-dev packages installed.
> 
> In this chroot-ed environment I am able to compile several software
> without issues. For example, I just compiled there the PPL.
> 
> The point is that even after schroot the uname system call (& the uname
> command) still return x86_64 as the machine. I suppose there is no easy
> trick to circumvent this.
> 
> 
> I thought that
>   ../configure  --build=x86-linux --target=x86-linux --host=x86-linux
> (with other MELT specific options) should be enough, but apparently not;
> make fails with

--target=i386-linux

Andrew.


Re: How to build on AMD64/Debian under x86 32bits chroot?

2008-06-05 Thread Richard Guenther
On Thu, Jun 5, 2008 at 3:14 PM, Basile STARYNKEVITCH
<[EMAIL PROTECTED]> wrote:
> Hello All
>
> As (I imagine) many developers I have a 64 bits machine - running Debian
> (Sid) Linux AMD64.
>
> I want to test my MELT branch on x86 (32 bits). So I set up (using
> debootstrap) a x86 32 bits Debian/Lenny chroot-ed system (in /debian32)
> which has most of the *-dev packages installed.
>
> In this chroot-ed environment I am able to compile several software without
> issues. For example, I just compiled there the PPL.
>
> The point is that even after schroot the uname system call (& the uname
> command) still return x86_64 as the machine. I suppose there is no easy
> trick to circumvent this.

Usually there is a command called 'linux32' which fixes this.

>
> I thought that
>   ../configure  --build=x86-linux --target=x86-linux --host=x86-linux
> (with other MELT specific options) should be enough, but apparently not;
> make fails with

and it is i686-pc-linux-gnu, x86-linux is not a valid target triplet.

Richard.
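
As far as I know, linux32 works by switching the process personality before running the given command, so that uname reports a 32-bit machine. The sketch below shows that mechanism in isolation; it is Linux-only, the function name is hypothetical, and the exact machine string reported (typically i686 on an x86_64 kernel) is not guaranteed here.

```c
#include <assert.h>
#include <string.h>
#include <sys/personality.h>
#include <sys/utsname.h>

/* Switch the calling process to the 32-bit Linux personality and copy
   the machine name that uname() then reports into `buf`.
   Returns 0 on success, -1 if either system call fails.  */
int machine_as_linux32(char *buf, size_t len)
{
    struct utsname u;

    assert(buf != NULL && len > 0);
    if (personality(PER_LINUX32) == -1)
        return -1;
    if (uname(&u) != 0)
        return -1;
    strncpy(buf, u.machine, len - 1);
    buf[len - 1] = '\0';
    return 0;
}
```

Running configure under linux32, with i686-pc-linux-gnu as the triplet, applies the same personality to the whole build.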


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Ian Lance Taylor
"Rafael Espindola" <[EMAIL PROTECTED]> writes:

> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
> of the opposite of what I had understood. What do you do in this case:
>
> a.o: IL file that contains a reference to "f"
> b.o: IL file that has a weak def of "f"
>
> There is no strong definition. Can you inline f into the use in a.o?

I don't know what LLVM does, but in principle, in ELF, you can do this
inlining when linking an executable, but not when linking a shared
library.  Actually, when linking a shared library, what matters is not
whether the definition of "f" is weak or not, but what the visibility
of "f" is (default, hidden, protected, or internal).  And, of course,
the visibility of "f" can be set by link-time options (e.g.,
-Bsymbolic).

Ian
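
One reading of the rule above, written as a small decision function. This is a sketch, not what any linker or LTO plugin implements; all names are hypothetical, and it assumes the definition in question is the one the linker has already chosen.

```c
#include <assert.h>
#include <stdbool.h>

enum elf_visibility { VIS_DEFAULT, VIS_HIDDEN, VIS_PROTECTED, VIS_INTERNAL };
enum link_output { LINK_EXECUTABLE, LINK_SHARED_LIBRARY };

/* May the chosen definition of a symbol be inlined into its callers?
   Weakness is deliberately not an input: for a shared library the
   deciding factor is visibility, not weak versus strong.  */
bool may_inline(enum link_output out, enum elf_visibility vis, bool bsymbolic)
{
    if (out == LINK_EXECUTABLE)
        return true;            /* nothing can preempt the executable */
    if (bsymbolic)
        return true;            /* -Bsymbolic binds references locally */
    return vis != VIS_DEFAULT;  /* default visibility can be preempted */
}
```

For example, may_inline(LINK_SHARED_LIBRARY, VIS_DEFAULT, false) is false, while the same symbol with hidden visibility, or linked with -Bsymbolic, can be inlined.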


Re: [whopr] plugin interface design

2008-06-05 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

> I don't know how closely your plans follow this model.  If you think
> this approach is reasonable, you really do need to reflect things like
> symbol versions in your IR somehow.  This compiler must know about
> versions, and when it does, it is easy to avoid optimizations that are
> invalid for them.

Sure.  But here's the thing: the gcc LTO approach involves having a
regular object with a regular symbol table, and the IR is embedded in
the object.  In other words, we do know the symbol version
information: it's in the symbol table of the object.  And so what I'm
discussing is a way for the linker to communicate the relevant part of
that information to the compiler plugin.  The relevant part is: "this
undefined symbol reference in a.o is bound to this symbol definition
in b.o."  There is nothing else that the compiler needs to know.
(Actually, when we move on to applying LTO across shared library
boundaries we may also want to say something about the strength of the
binding.)

I appreciate the cleanliness and simplicity of your description.  I'm
trying to fill in an ugly edge.  The reality is that symbol versions
are expressed via assembly language pseudo-ops, both in C/C++ files
and in assembly code, and also via version scripts passed to the
linker.  To the limited extent that the compiler needs to be aware of
them, the linker needs to convey that information.  If we decree that
the information must be expressed directly in the compiler IR, then I
think we're looking at a considerably larger degree of ugliness.

Ian
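
The "relevant part" described above, that an undefined reference in one object is bound to a definition in another, could be carried in a record like the following. This is purely illustrative: the struct and function names are hypothetical and no such interface is specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One linker decision: the reference to `symbol` in `ref_object` is
   bound to the definition in `def_object`.  */
struct symbol_binding {
    const char *symbol;
    const char *ref_object;
    const char *def_object;
};

/* Look up which object provides `symbol` for a reference made from
   `ref_object`.  Returns NULL if the linker reported no binding.  */
const char *binding_for(const struct symbol_binding *table, size_t n,
                        const char *symbol, const char *ref_object)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].symbol, symbol) == 0
            && strcmp(table[i].ref_object, ref_object) == 0)
            return table[i].def_object;
    return NULL;
}
```

The compiler side never needs to know why the linker picked b.o (symbol versions, version scripts, command-line options); it only consumes the result.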


RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-05 Thread H.J. Lu
Hi,

x86-64 psABI defines

typedef struct
{
  unsigned int gp_offset;
  unsigned int fp_offset;
  void *overflow_arg_area;
  void *reg_save_area;
} va_list[1];

for variable argument list. "va_list" is used to access variable argument
list:

#include <stdarg.h>
#include <stdlib.h>

void
bar (const char *format, va_list ap)
{
  if (va_arg (ap, int) != 0)
    abort ();
}

void
foo (char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);
  bar (fmt, ap);
  va_end (ap);
}

foo and bar may be compiled with different compilers. We have to keep
the current layout for va_list so that we can mix va_list codes compiled
with AVX and non-AVX compilers. We need to extend the variable argument
handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
on the variable argument list. We propose 2 ways to extend the register
save area to add 256bit AVX registers support:

1. Extend the register save area to put upper 128bit at the end.
  Pros:
    Aligned access.
    Save stack space if 256bit registers are used.
  Cons:
    Split access. Require more split access beyond 256bit.

2. Extend the register save area to put full 256bit YMMs at the end.
The first DWORD after the register save area has the offset of
the extended array for YMM registers. The next DWORD has the
element size of the extended array. Unaligned access will be used.
  Pros:
    No split access.
    Easily extendable beyond 256bit.
    Limited unaligned access penalty if stack is aligned at 32byte.
  Cons:
    May require store both the lower 128bit and full 256bit register
    content. We may avoid saving the lower 128bit if correct type
    is required when accessing variable argument list, similar to int
    vs. double.
    Waste 272 byte on stack when 256bit registers are used.
    Unaligned load and store.

We should agree on one approach to ensure compatibility between
different compilers.

Personally, I prefer #2 for its simplicity. Does anyone else have a
preference?

Thanks.

-- 
H.J.
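
Option 2 above can be sketched as a lookup that reads the two DWORDs following the existing register save area. This is a sketch under stated assumptions, not part of the proposal: the 176-byte size of the existing area (6 general-purpose registers of 8 bytes plus 8 XMM registers of 16 bytes) and the choice to measure the stored offset from the start of the save area are both assumptions, and ymm_slot is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: size of the existing register save area.  */
#define LEGACY_SAVE_AREA_SIZE (6 * 8 + 8 * 16)

/* Address of the slot for vector register `index` in the extended
   array.  ASSUMPTION: the stored offset is relative to the start of
   the register save area.  */
const unsigned char *ymm_slot(const unsigned char *reg_save_area,
                              unsigned index)
{
    uint32_t offset, elem_size;

    assert(reg_save_area != NULL);
    /* First DWORD after the existing area: offset of the extended array.  */
    memcpy(&offset, reg_save_area + LEGACY_SAVE_AREA_SIZE, sizeof offset);
    /* Next DWORD: size of one element (32 for 256bit registers).  */
    memcpy(&elem_size, reg_save_area + LEGACY_SAVE_AREA_SIZE + 4,
           sizeof elem_size);

    return reg_save_area + offset + (size_t)index * elem_size;
}
```

Because the element size is stored rather than fixed, the same lookup would keep working if the registers grew beyond 256bit, which is the extensibility argument made for #2.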


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-05 Thread Richard Guenther
On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> Hi,
>
> x86-64 psABI defines
>
> typedef struct
> {
>  unsigned int gp_offset;
>  unsigned int fp_offset;
>  void *overflow_arg_area;
>  void *reg_save_area;
> } va_list[1];
>
> for variable argument list. "va_list" is used to access variable argument
> list:
>
> void
> bar (const char *format, va_list ap)
> {
>  if (va_arg (ap, int) != 0)
>abort ();
> }
>
> void
> foo(char *fmt, ...)
> {
>  va_list ap;
>  va_start (ap, fmt);
>  bar (fmt, ap);
>  va_end (ap);
> }
>
> foo and bar may be compiled with different compilers. We have to keep
> the current layout for va_list so that we can mix va_list codes compiled
> with AVX and non-AVX compilers. We need to extend the variable argument
> handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
> on the variable argument list. We propose 2 ways to extend the register
> save area to add 256bit AVX registers support:
>
> 1. Extend the register save area to put upper 128bit at the end.
>  Pros:
>Aligned access.
>Save stack space if 256bit registers are used.
>  Cons
>Split access. Require more split access beyond 256bit.
>
> 2. Extend the register save area to put full 256bit YMMs at the end.
> The first DWORD after the register save area has the offset of
> the extended array for YMM registers. The next DWORD has the
> element size of the extended array. Unaligned access will be used.
>  Pros:
>No split access.
>Easily extendable beyond 256bit.
>Limited unaligned access penalty if stack is aligned at 32byte.
>  Cons:
>May require store both the lower 128bit and full 256bit register
>content. We may avoid saving the lower 128bit if correct type
>is required when accessing variable argument list, similar to int
>vs. double.
>Waste 272 byte on stack when 256bit registers are used.
>Unaligned load and store.
>
> We should agree on one approach to ensure compatibility between
> different compilers.
>
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?

If you want to mix AVX and non-AVX code then you need a way to
detect if AVX information was saved at runtime.  What is it in
each of those cases?

If you don't want to mix AVX and non-AVX code then basically you
can declare the ABIs incompatible anyway?

There is also a third option of passing AVX values by reference.

For simplicity I would also prefer 2) - after all we don't need to fill
in the XMM area / the AVX area if the value is unused.

Richard.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Jan Hubicka
Hi,
I am jumping in somewhat late, as yesterday I was in meetings without
internet access (and I will probably be offline again tomorrow).

I think that in basic terms we all mostly agree (we want to implement an
optimization scheme that does not get everything into memory, and we want
to parallelize the post-IPA compilation).  The linker interface seems very
fine too.
> 
> WHOPR simply adds another alternative, if you are willing to only run
> summary-based transformations, we can split the analysis and
> transformation phases in two such that you can parallelize the work
> over a cluster or a large SMP.  That's it.  Nothing more.

I think one problem is that both repackaging and cherry picking as
described is very centric about application on inlining.  It is probably
quite clear now that the list of optimizations we want to perform on
LTO scale is going to grow from the basic inlining + aliasing combo quite
soon, especially now that data structure changes are starting to kick in.
We would also need to sanely support partial offlining, cloning, etc.

This IMO should be somehow considered.  It is quite possible to
implement all this based on summaries, but we need to think of the
flexibility of the whole scheme and not overly limit it, at least in the
current stages of implementation.  If, for example, we ended up with
difficulties doing a struct-reorg style transformation that moves fields
within a structure, we would run into problems very soon.

I personally always leaned towards a kind of repackaging scheme.  I had
hoped that with a sanely designed LTO dumping scheme, this would be
relatively straightforward to implement: you simply re-use the same
serialized functions as they are in the original .o files and replace
function summaries by transformation summaries, so we might pretty much
re-use the same infrastructure.  With a sane caching mechanism for keeping
unmodified function bodies in memory in cooperation with GGC, the
repackaging stage should be possible to implement as a simple pass through
the callgraph writing the selected functions to the output file.

One advantage also is that local but non-trivial changes to the program
can be done at LTO decision time, which would simplify the inter-IPA-pass
interaction that seems the most scary issue here.

Honza


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-05 Thread Jan Hubicka
> 
> 1. Extend the register save area to put upper 128bit at the end.
>   Pros:
> Aligned access.
> Save stack space if 256bit registers are used.
>   Cons
> Split access. Require more split access beyond 256bit.
> 
> 2. Extend the register save area to put full 256bit YMMs at the end.
> The first DWORD after the register save area has the offset of
> the extended array for YMM registers. The next DWORD has the
> element size of the extended array. Unaligned access will be used.
>   Pros:
> No split access.
> Easily extendable beyond 256bit.
> Limited unaligned access penalty if stack is aligned at 32byte.
>   Cons:
> May require store both the lower 128bit and full 256bit register
> content. We may avoid saving the lower 128bit if correct type
> is required when accessing variable argument list, similar to int
> vs. double.
> Waste 272 byte on stack when 256bit registers are used.
> Unaligned load and store.
> 
> We should agree on one approach to ensure compatibility between
> different compilers.

This is something that definitely should be handled by an ABI update.

We probably need to also somehow update the way to specify what to save
to varargs prologue.  Otherwise if you would have YMM aware printf
running on non-AVX hardware, we would end up with invalid instructions.

At the moment, eax is required to specify number of XMM registers, we
probably can extend it to have number of XMM registers in AL and YMM in
AH.
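
The AL/AH idea amounts to packing two 8-bit counts into the low 16 bits of eax. A minimal sketch of that encoding follows; the function names are hypothetical and nothing here is part of the current psABI, which only defines the count in AL.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the XMM count into bits 0-7 (AL) and the YMM count into
   bits 8-15 (AH) of the value passed in eax.  */
uint32_t pack_vector_counts(uint8_t nxmm, uint8_t nymm)
{
    /* At most 8 vector registers carry arguments on x86-64.  */
    assert(nxmm <= 8 && nymm <= 8);
    return ((uint32_t)nymm << 8) | nxmm;
}

/* What a varargs prologue would read back.  */
uint8_t xmm_count(uint32_t eax) { return (uint8_t)(eax & 0xff); }
uint8_t ymm_count(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xff); }
```

A callee that only looks at AL would still see a valid XMM count, provided callers that pass no YMM values leave AH at zero.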

I personally don't have much of a preference between 1. and 2.  1. seems
relatively easy to implement too, or is packaging two 128bit values into a
single 256bit one difficult in va_arg expansion?

Honza
> 
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?
> 
> Thanks.
> 
> -- 
> H.J.


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Diego Novillo
On Thu, Jun 5, 2008 at 11:09, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> I think one problem is that both repackaging and cherry picking as
> described is very centric about application on inlining.

No, that's simply the main application for the initial implementation.
 Any other summary-based transformation can be supported the same way.
 Optimizations that are not summary-based can be done the way they're
done today.  All that happens is that they won't be able to take
advantage of the partitioning and distribution since WPA and LTRANS
will be executed together.

And of course, even summary-based transformations can be done the same
way they are done today.  The scaling aspects of WHOPR should only
kick in via a special option, or even via heuristics.

> I personally always leaned to kind of repackaging scheme.  I've hoped
> that with sanely designed LTO dumping scheme, this will be relatively
> straightforward to implement: simply you re-use same serialized functions
> as they are in the original .o files and replace function summaries by
> transformation summaries, so we might pretty much re-use same
> infrastructure.   With sane caching mechanism to keeping unmodified
> function bodies in memory in cooperation in GGC, the repackaging stage
> should be possible to implement as simple pass through the callgraph
> writing the selected functions to the output file.

Sure.  All this is possible and we shouldn't break it.


Diego.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-05 Thread H.J. Lu
On Thu, Jun 5, 2008 at 7:49 AM, Richard Guenther
<[EMAIL PROTECTED]> wrote:
> On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> x86-64 psABI defines
>>
>> typedef struct
>> {
>>  unsigned int gp_offset;
>>  unsigned int fp_offset;
>>  void *overflow_arg_area;
>>  void *reg_save_area;
>> } va_list[1];
>>
>> for variable argument list. "va_list" is used to access variable argument
>> list:
>>
>> void
>> bar (const char *format, va_list ap)
>> {
>>  if (va_arg (ap, int) != 0)
>>abort ();
>> }
>>
>> void
>> foo(char *fmt, ...)
>> {
>>  va_list ap;
>>  va_start (ap, fmt);
>>  bar (fmt, ap);
>>  va_end (ap);
>> }
>>
>> foo and bar may be compiled with different compilers. We have to keep
>> the current layout for va_list so that we can mix va_list codes compiled
>> with AVX and non-AVX compilers. We need to extend the variable argument
>> handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
>> on the variable argument list. We propose 2 ways to extend the register
>> save area to add 256bit AVX registers support:
>>
>> 1. Extend the register save area to put upper 128bit at the end.
>>  Pros:
>>Aligned access.
>>Save stack space if 256bit registers are used.
>>  Cons
>>Split access. Require more split access beyond 256bit.
>>
>> 2. Extend the register save area to put full 256bit YMMs at the end.
>> The first DWORD after the register save area has the offset of
>> the extended array for YMM registers. The next DWORD has the
>> element size of the extended array. Unaligned access will be used.
>>  Pros:
>>No split access.
>>Easily extendable beyond 256bit.
>>Limited unaligned access penalty if stack is aligned at 32byte.
>>  Cons:
>>May require store both the lower 128bit and full 256bit register
>>content. We may avoid saving the lower 128bit if correct type
>>is required when accessing variable argument list, similar to int
>>vs. double.
>>Waste 272 byte on stack when 256bit registers are used.
>>Unaligned load and store.
>>
>> We should agree on one approach to ensure compatibility between
>> different compilers.
>>
>> Personally, I prefer #2 for its simplicity. Does anyone else have a
>> preference?
>
> If you want to mix AVX and non-AVX code then you need a way to
> detect if AVX information was saved at runtime.  What is it in those
> both cases?
>
> If you don't want to mix AVX and non-AVX code then basically you
> can declare the ABIs incompatible anyway?

We want to extend the psABI in such a way that we can link
AVX enabled code to call vfprintf in glibc which is compiled
with the older compiler and doesn't use YMM registers.
That is, if bar, in the example above, doesn't use YMM
registers, it can be compiled by any compiler. bar doesn't
need to know if YMM registers are used in the caller at all.
All necessary information for YMM registers is specified
in the psABI. If a compiler doesn't use YMM registers,
it doesn't have to do anything.

>
> There is also a third option of passing AVX values by reference.
>
> For simplicity I would also prefer 2) - after all we don't need to fill
> in the XMM area / the AVX area if the value is unused.
>

That is what I believe.

Thanks.


-- 
H.J.


Re: RFC: Extend x86-64 psABI for 256bit AVX register

2008-06-05 Thread H.J. Lu
On Thu, Jun 5, 2008 at 8:15 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
>>
>> 1. Extend the register save area to put upper 128bit at the end.
>>   Pros:
>> Aligned access.
>> Save stack space if 256bit registers are used.
>>   Cons
>> Split access. Require more split access beyond 256bit.
>>
>> 2. Extend the register save area to put full 256bit YMMs at the end.
>> The first DWORD after the register save area has the offset of
>> the extended array for YMM registers. The next DWORD has the
>> element size of the extended array. Unaligned access will be used.
>>   Pros:
>> No split access.
>> Easily extendable beyond 256bit.
>> Limited unaligned access penalty if stack is aligned at 32byte.
>>   Cons:
>> May require store both the lower 128bit and full 256bit register
>> content. We may avoid saving the lower 128bit if correct type
>> is required when accessing variable argument list, similar to int
>> vs. double.
>> Waste 272 byte on stack when 256bit registers are used.
>> Unaligned load and store.
>>
>> We should agree on one approach to ensure compatibility between
>> different compilers.
>
> This is something that definitely should be handled by ABI update.
>
> We probably need to also somehow update the way to specify what to save
> to varargs prologue.  Otherwise if you would have YMM aware printf

Yes, but I believe that is compiler specific. Different compilers may
have different approaches for varargs prologue, as long as they follow
the psABI.

> running on non-AVX hardware, we would end up with invalid instructions.

That is nothing new. The same applies to SSE on ia32. Basically, you
shouldn't call YMM aware printf on non-AVX hardware.  You can have
/lib64/avx/libc.so.6 if necessary.

>
> At the moment, eax is required to specify number of XMM registers, we
> probably can extend it to have number of XMM registers in AL and YMM in
> AH.

ymm0 and xmm0 are the same register. xmm0 is the lower 128bit
of ymm0. I am not sure if we need separate XMM registers from
YMM registers.

>
> I personally don't have much preferences over 1. or 2.. 1. seems
> relatively easy to implement too, or is packaging two 128bit values to
> single 256bit difficult in va_arg expansion?
>

Access to 256bit register as lower and upper 128bits needs 2
instructions. For store

vmovaps   %xmm7, -143(%rax)
vextractf128 $1, %ymm7, -15(%rax)

For load

vmovaps  -143(%rax),%xmm7
vinsertf128 $1, -15(%rax),%ymm7,%ymm7

If we go beyond 256bit, we need more instructions to access
the full register. For 512bit, it will be split into lower 128bit,
middle 128bit and upper 256bit. 1024bit will have 4 parts.

For #2, only one instruction will be needed for 256bit and
beyond.
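
The difference in access count can be modeled in plain C, with memcpy standing in for the vector moves. This is only a model of the two layouts, assuming a 32-byte value split into two 16-byte halves; the type and function names are hypothetical and no AVX instructions are used.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the contents of one 256bit register.  */
typedef struct { unsigned char bytes[32]; } vec256;

/* #1: the lower half goes to the existing XMM slot and the upper half
   to the extension area, so every access takes two copies.  */
void store_split(const vec256 *v, unsigned char *xmm_slot,
                 unsigned char *upper_slot)
{
    assert(v != NULL);
    memcpy(xmm_slot, v->bytes, 16);
    memcpy(upper_slot, v->bytes + 16, 16);
}

vec256 load_split(const unsigned char *xmm_slot,
                  const unsigned char *upper_slot)
{
    vec256 v;
    memcpy(v.bytes, xmm_slot, 16);
    memcpy(v.bytes + 16, upper_slot, 16);
    return v;
}

/* #2: the full value lives in one slot, so one copy each way.  */
void store_whole(const vec256 *v, unsigned char *ymm_slot)
{
    assert(v != NULL);
    memcpy(ymm_slot, v->bytes, 32);
}

vec256 load_whole(const unsigned char *ymm_slot)
{
    vec256 v;
    memcpy(v.bytes, ymm_slot, 32);
    return v;
}
```

Widening the value adds one more piece per access in the split scheme, while the whole-slot scheme only changes the copy size.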

Thanks.


-- 
H.J.


Re: [whopr] plugin interface design

2008-06-05 Thread Chris Lattner


On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:

> Chris Lattner <[EMAIL PROTECTED]> writes:
>
>> I don't know how closely your plans follow this model.  If you think
>> this approach is reasonable, you really do need to reflect things like
>> symbol versions in your IR somehow.  This compiler must know about
>> versions, and when it does, it is easy to avoid optimizations that are
>> invalid for them.
>
> Sure.  But here's the thing: the gcc LTO approach involves having a
> regular object with a regular symbol table, and the IR is embedded in
> the object.  In other words, we do know the symbol version
> information: it's in the symbol table of the object.

Wow, that seems incredibly limiting.  This means that your LTO either
has to:

1) treat the object header as part of the IR, or
2) avoid making any changes that would affect exported symbols

Is that right?  Why doesn't the "LTO reader" just read the symbol info
from the ELF header and reflect it into the trees somehow?


-Chris


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Chris Lattner


On Jun 5, 2008, at 6:59 AM, Ian Lance Taylor wrote:

> "Rafael Espindola" <[EMAIL PROTECTED]> writes:
>
>> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
>> of the opposite of what I had understood. What do you do in this case:
>>
>> a.o: IL file that contains a reference to "f"
>> b.o: IL file that has a weak def of "f"
>>
>> There is no strong definition. Can you inline f into the use in a.o?
>
> I don't know what LLVM does, but in principle, in ELF, you can do this
> inlining when linking an executable, but not when linking a shared
> library.  Actually, when linking a shared library, what matters is not
> whether the definition of "f" is weak or not, but what the visibility
> of "f" is (default, hidden, protected, or internal).  And, of course,
> the visibility of "f" can be set by link-time options (e.g.,
> -Bsymbolic).

In LLVM LTO, the model is that the linker is the one that knows about
visibility.  The problem is that 'hidden' is not sufficient to capture
visibility info when mixing LTO modules with native ones.  If you
have: [a-c].c and compile [ab].c with LTO and c.c without, any hidden
symbols should be visible outside the [ab].o LTO region.

LLVM LTO handles this by marking symbols "internal" (aka static, aka
not TREE_PUBLIC, whatever) when the symbol is not visible outside the
LTO scope.  This allows the optimizers to go crazy and hack away at
the symbols, but only when safe.

'Weakness' only matters when a symbol is exported from the LTO scope,
so 'weak' and 'visibility' are orthogonal.


-Chris
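
The internalization step described above can be sketched as a pass over a symbol list. This is not LLVM's implementation; the struct and function names are hypothetical, and the preserve list stands in for whatever the linker reports as needed outside the LTO scope (for example through lto_codegen_add_must_preserve_symbol).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A symbol defined inside the LTO scope.  */
struct lto_symbol {
    const char *name;
    bool internal;   /* true: optimizers may treat it like a static */
};

/* Mark as internal every symbol that is not named in `preserve`.  */
void internalize(struct lto_symbol *syms, size_t nsyms,
                 const char *const *preserve, size_t npreserve)
{
    assert(syms != NULL || nsyms == 0);
    for (size_t i = 0; i < nsyms; i++) {
        bool keep = false;
        for (size_t j = 0; j < npreserve; j++)
            if (strcmp(syms[i].name, preserve[j]) == 0)
                keep = true;
        syms[i].internal = !keep;
    }
}
```

A symbol left non-internal is where weakness still matters, which is the sense in which 'weak' and 'visibility' are orthogonal in this model.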

 


Re: Question regarding C++ frontend

2008-06-05 Thread Peter Collingbourne
On Sat, May 03, 2008 at 08:29:25AM -0400, Doug Gregor wrote:
> INNERMOST_TEMPLATE_ARGS can be used to get at the "innermost" TREE_VEC
> of template arguments for a class template specialization such as
> foo::bar. CLASSTYPE_USE_TEMPLATE != 0 tells you whether a
> RECORD_TYPE is actually a template

Doug,

Thank you for your response and sorry for the delay.  Unfortunately
CLASSTYPE_USE_TEMPLATE does not seem to have this property when the
non-template is an inner class of a template.  For example, the
record_type t pertaining to a class outer::inner_noargs :

(gdb) pt
 
no-binfo use_template=1 interface-unknown
chain >
(gdb) print t->type.lang_specific->u.c.use_template
$4 = 1

Thanks,
-- 
Peter



Re: [whopr] plugin interface design

2008-06-05 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

> On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:
>
>> Chris Lattner <[EMAIL PROTECTED]> writes:
>>
>>> I don't know how closely your plans follow this model.  If you think
>>> this approach is reasonable, you really do need to reflect things
>>> like
>>> symbol versions in your IR somehow.  This compiler must know about
>>> versions, and when it does, it is easy to avoid optimizations that
>>> are
>>> invalid for them.
>>
>> Sure.  But here's the thing: the gcc LTO approach involves having a
>> regular object with a regular symbol table, and the IR is embedded in
>> the object.  In other words, we do know the symbol version
>> information: it's in the symbol table of the object.
>
> Wow, that seems incredibly limiting.  This means that your LTO either
> has to:
>
> 1) treat the object header as part of the IR, or
> 2) avoid making any changes that would affect exported symbols
>
> Is that right?  Why doesn't the "LTO reader" just read the symbol info  
> from the ELF header and reflect it into the trees somehow?

That would be fine.  It would require teaching the compiler about
symbol versioning and resolution rules which the linker already knows.
I sort of think that is unnecessary.  But I'm not opposed to it.

Of course there is the issue that some of this information also comes
from linker command line options.  That also has to be fed into the IR.

For example, earlier Nick suggested that LLVM will not inline a weak
symbol.  With ELF it is actually OK to inline a weak symbol when
generating an executable.  It is not OK when generating a shared
library, unless -Bsymbolic was used on the linker command line.  We
could represent these sorts of details directly in the compiler IR.
But I don't see a big advantage to doing so.

I'm proposing, instead, that the linker inform the compiler plugin
about this information based on link-time information.  That is a way
of representing it in the IR, of course.  But it seems to me to be
somewhat more pragmatic.

Incidentally, your choice 2 above doesn't follow.  The LTO compiler is
going to pass a new object file(s) back to the linker.  It doesn't
have to have the same set of exported symbols, except in cases where
the linker has directed that some symbol must be available.

Ian


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

> LLVM LTO handles this by marking symbols "internal" (aka static, aka
> not TREE_PUBLIC, whatever) when the symbol is not visible outside the
> LTO scope.  This allows the optimizers to go crazy and hack away at
> the symbols, but only when safe.

How does the linker do this?  Are you saying that when generating a
shared library, the linker calls lto_codegen_add_must_preserve_symbol
for every externally visible symbol?

How does the linker tell LTO that a symbol may be inlined, but must
also be externally visible?

Ian


Re: Development process for i386 machine descriptions

2008-06-05 Thread Uros Bizjak

Hello!

> 1.) The processor_costs structure seems very limited, but seem very
> easily to "fill in" but are these costs supposed to be best or worst
> case? For instance, many instructions with different sized operands
> vary in latency.


Instruction costs are further refined in config/i386.c, ix86_rtx_costs 
and the cost for various operand types is determined in several *_cost 
functions, scattered around i386.c file.


> 2.) I don't understand the meaning of the stringop_algs, scalar,
> vector, and branching costs at the end of the processor_cost
> structure. Could someone give me an accurate description?


stringop_algs is a structure that defines various algorithms for string 
processing functions (memcpy, memset, ...). This structure also defines 
size thresholds for various algorithms.
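
To make the idea of size thresholds concrete, here is a simplified sketch loosely modeled on the structure described above. All names ending in _sketch, the three algorithms, and the example table are assumptions for illustration; the real definitions in i386.h have more algorithms and details.

```c
/* Simplified stand-ins for illustration only. */
enum stringop_alg_sketch { ALG_LIBCALL, ALG_LOOP, ALG_REP_PREFIX };

struct stringop_strategy_sketch
{
  int max;                       /* largest size handled; -1 means any */
  enum stringop_alg_sketch alg;  /* algorithm used up to that size */
};

#define N_STRATEGIES 3

struct stringop_algs_sketch
{
  enum stringop_alg_sketch unknown_size;  /* used when size is unknown */
  struct stringop_strategy_sketch size[N_STRATEGIES];
};

/* Pick the first entry whose threshold covers the known size.
   A negative size stands for "not known at compile time". */
static enum stringop_alg_sketch
choose_alg (const struct stringop_algs_sketch *algs, long size)
{
  int i;

  if (size < 0)
    return algs->unknown_size;
  for (i = 0; i < N_STRATEGIES; i++)
    if (algs->size[i].max == -1 || size <= algs->size[i].max)
      return algs->size[i].alg;
  return ALG_LIBCALL;
}

/* Made-up table: small blocks use a loop, medium blocks a rep-prefixed
   instruction, everything larger a library call. */
static const struct stringop_algs_sketch example_memcpy = {
  ALG_LOOP,
  { { 16, ALG_LOOP }, { 256, ALG_REP_PREFIX }, { -1, ALG_LIBCALL } }
};
```

With this table, a 100-byte copy would select ALG_REP_PREFIX, and a copy of unknown size falls back to the unknown_size entry.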


The costs at the end of a cost structure are used in autovectorization 
decisions, when -fvect-cost-model is in effect (please look at the end 
of i386.h where these values are used).


> 3.) The processor I am currently attempting to model is
> single-issue/in-order with a simple pipeline. Stalls can occasionally
> occur in the fetch/decode/translate, but the core is the latency of
> instructions in the functional units in the execute stage. What
> recommendations can anyone make to me for designing the DFA? Should it
> just directly model the functional units latencies for certain insn types?

Hm, perhaps you should look into {athlon, geode, k6, pentium, ppro}.md
files first. All these files define scheduling for various processors.
I'm sure that quite some ideas can be harvested there.


Uros.



extend gthr-posix.h with rwlock

2008-06-05 Thread Luke Dalessandro
We have code that fails to scale due to the object_mutex lock in 
unwind-dw2-fde.c. This mutex protects two lists local to the file. The primary 
list is used in "read-mostly" mode, with the secondary list used rarely when 
writing needs to happen.


I am trying to change this locking scheme to use a reader/writer lock (I'd 
prefer something even more scalable, like an RCU style algorithm, or seqlock + 
partially visible reader count, but I don't have time at the moment to do 
anything like that).


I've set up forwarding to pthread_rwlock_t and the corresponding functions in 
gthr-posix.h, just following the template of how pthread_mutex_t is linked in.


My problem is that unwind-dw2-fde.c seems to be compiled multiple times during 
a gcc build, and sometimes my additions are found but other times they are 
not. I am rebuilding again (AIX 5.1), and I'll post more information for 
anyone that needs it.


In the meantime, is there a how-to anywhere that describes adding or modifying 
gthr.h models in gcc?


Thanks,
Luke


Re: extend gthr-posix.h with rwlock

2008-06-05 Thread David Edelsohn
> Luke Dalessandro writes:

Luke> My problem is that unwind-dw2-fde.c seems to be compiled multiple times during
Luke> a gcc build, and sometimes my additions are found but other times they are
Luke> not. I am rebuilding again (AIX 5.1), and I'll post more information for
Luke> anyone that needs it.

Luke> In the meantime, is there a how-to anywhere that describes adding or modifying
Luke> gthr.h models in gcc?

AIX multilibs pthread support.  Unlike Linux, AIX does not provide
weak versions of the pthread symbols when operating in single-threaded
mode.  AIX uses gthr-aix.h, which includes gthr-posix.h or gthr-single.h
depending on the -pthread option.

David



Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Nick Kledzik


On Jun 5, 2008, at 10:43 AM, Ian Lance Taylor wrote:
> Chris Lattner <[EMAIL PROTECTED]> writes:
>
>> LLVM LTO handles this by marking symbols "internal" (aka static, aka
>> not TREE_PUBLIC, whatever) when the symbol is not visible outside the
>> LTO scope.  This allows the optimizers to go crazy and hack away at
>> the symbols, but only when safe.
>
> How does the linker do this?  Are you saying that when generating a
> shared library, the linker calls lto_codegen_add_must_preserve_symbol
> for every externally visible symbol?

Yes.

> How does the linker tell LTO that a symbol may be inlined, but must
> also be externally visible?

The linker just tells LTO which symbols must remain.  The LTO engine
is free to inline anything that would improve codegen, with the
exception that any weak definition that must remain (preserved) cannot
be inlined.
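
The rule stated above can be sketched as a small C predicate. The struct and function names are hypothetical and are not LLVM data structures; the only name taken from the thread is lto_codegen_add_must_preserve_symbol, which appears solely in a comment.

```c
#include <stdbool.h>

/* Hypothetical per-symbol facts as the LTO engine might see them. */
struct sym_info
{
  bool is_weak;        /* the definition is weak */
  bool must_preserve;  /* the linker asked for the symbol to remain,
                          e.g. via lto_codegen_add_must_preserve_symbol */
};

/* Inlining is permitted for every symbol except a weak definition
   that the linker has marked as preserved. */
static bool
lto_may_inline (const struct sym_info *s)
{
  return !(s->is_weak && s->must_preserve);
}
```

Note that this predicate only says whether inlining is allowed; whether it is profitable is a separate decision for the optimizer.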

-Nick



Question about modifying gcc

2008-06-05 Thread dreese

Could you please direct me to someone who would be willing and able to answer a 
few questions about some of the internal workings of the gcc compiler. 

I am attempting to modify the compiler to instrument function calls and 
returns. The end result that i am trying to achieve is to send the address of 
every called function to a memory mapped file prior to the call and after the 
call send an immediate value to that same file. The target architecture is x86.

Here is an example in pseudo assembly of what i want to accomplish.


regular                 modified
instruction             instruction
instruction             instruction
                        mov $function-name, (eax)
call function-name      call function-name
                        mov $0x1000, (eax)
instruction             instruction
instruction             instruction

where eax is the address of the memory mapped file.


The purpose of this is to collect information about calls and returns in order 
to build call graphs and operating tendencies of software systems. 

So far i have had little success.

I have been trying to change the machine description as well as the target 
description macros and functions in order to get the desired functionality. I 
have been able to insert instructions into the compiled code, via 
output_asm_insn(), but not in the correct place. 

Is there someone who would be able to help me with my problem?

Thank you
Dale Reese


Re: Question about modifying gcc

2008-06-05 Thread Joe Buck
On Thu, Jun 05, 2008 at 12:55:17PM -0700, [EMAIL PROTECTED] wrote:

> I am attempting to modify the compiler to instrument function calls and
> returns. The end result that i am trying to achieve is to send the
> address of every called function to a memory mapped file prior to the
> call and after the call send an immediate value to that same file. The
> target architecture is x86.

You should be able to achieve what you want without modifying the
compiler.  Check the manual for the -finstrument-functions option.

There's also the existing coverage support: compile with -ftest-coverage
-fprofile-arcs, then run gcov.
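
As a rough sketch of the first suggestion: when a program is compiled with -finstrument-functions, GCC emits calls to two user-supplied hooks, __cyg_profile_func_enter and __cyg_profile_func_exit, on entry to and exit from each instrumented function. The counters and the choice to log to stderr are assumptions for illustration; a real tool could write to a memory mapped file instead.

```c
#include <stdio.h>

/* Counters updated by the hooks; hypothetical names for illustration. */
static unsigned long calls_entered;
static unsigned long calls_exited;

/* The attribute keeps the hooks themselves from being instrumented,
   which would otherwise recurse. */
void __cyg_profile_func_enter (void *this_fn, void *call_site)
  __attribute__ ((no_instrument_function));
void __cyg_profile_func_exit (void *this_fn, void *call_site)
  __attribute__ ((no_instrument_function));

/* this_fn is the address of the function being entered; call_site is
   the address it was called from. */
void
__cyg_profile_func_enter (void *this_fn, void *call_site)
{
  (void) call_site;
  calls_entered++;
  fprintf (stderr, "enter %p\n", this_fn);
}

void
__cyg_profile_func_exit (void *this_fn, void *call_site)
{
  (void) call_site;
  calls_exited++;
  fprintf (stderr, "exit  %p\n", this_fn);
}
```

Something like "gcc -finstrument-functions prog.c hooks.c" would link the hooks in. The hooks run inside the called function rather than around the call instruction, so the output differs slightly from the layout in the original question, but it carries the same call and return information.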


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Ian Lance Taylor
Nick Kledzik <[EMAIL PROTECTED]> writes:

>> How does the linker tell LTO that a symbol may be inlined, but must
>> also be externally visible?
> The linker just tells LTO which symbols must remain.  The LTO engine
> is free to inline anything that would improve codegen, with the
> exception
> that any weak definition that must remain (preserved) cannot be inlined.

I'll just note that that isn't optimal for ELF when producing an
executable.

Ian


Re: How to build on AMD64/Debian under x86 32bits chroot?

2008-06-05 Thread Matthias Klose
Basile STARYNKEVITCH writes:
> Hello All
> 
> As (I imagine) many developers I have a 64 bits machine - running Debian 
> (Sid) Linux AMD64.
> 
> I want to test my MELT branch on x86 (32 bits). So I set up (using 
> debootstrap) a x86 32 bits Debian/Lenny chroot-ed system (in /debian32) 
> which has most of the *-dev packages installed.
> 
> In this chroot-ed environment I am able to compile several software 
> without issues. For example, I just compiled there the PPL.
> 
> The point is that even after schroot the uname system call (& the uname 
> command) still return x86_64 as the machine. I suppose there is no easy 
> trick to circumvent this.

make sure that 'personality=linux32' is set for this chroot in
/etc/schroot/schroot.conf (or as suggested prefix the schroot command
with 'linux32' every time you enter the chroot).

  Matthias


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Chris Lattner


On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote:

> Nick Kledzik <[EMAIL PROTECTED]> writes:
>
>>> How does the linker tell LTO that a symbol may be inlined, but must
>>> also be externally visible?
>> The linker just tells LTO which symbols must remain.  The LTO engine
>> is free to inline anything that would improve codegen, with the
>> exception
>> that any weak definition that must remain (preserved) cannot be
>> inlined.
>
> I'll just note that that isn't optimal for ELF when producing an
> executable.

Why? Because you have to touch (worst case) every symbol?  The cost of
doing LTO *dramatically* dwarfs the cost of touching symbols
once.  :)  You're right this could be improved, and we're actively
working on it... but it seems like a strange thing to worry about vs
correctness in all cases.


-Chris


Re: extend gthr-posix.h with rwlock

2008-06-05 Thread Luke Dalessandro

David Edelsohn wrote:
> Luke Dalessandro writes:
>
> Luke> My problem is that unwind-dw2-fde.c seems to be compiled multiple times during
> Luke> a gcc build, and sometimes my additions are found but other times they are
> Luke> not. I am rebuilding again (AIX 5.1), and I'll post more information for
> Luke> anyone that needs it.
>
> Luke> In the meantime, is there a how-to anywhere that describes adding or modifying
> Luke> gthr.h models in gcc?
>
> AIX multilibs pthread support.  Unlike Linux, AIX does not provide
> weak versions of the pthread symbols when operating in single-threaded
> mode.  AIX uses gthr-aix.h, which includes gthr-posix.h or gthr-single.h
> depending on the -pthread option.


Thank you, this was indeed the problem. I added the needed stubs in 
gthr-single.h and it now compiles fine. Unfortunately there seems to be 
something wrong with my installation of ld as linking fails with a large 
number of errors of the form:


ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o:
Field x_offset contains 4. Valid values are between 4 and -1.
The object name is being substituted.

Unfortunately I have almost no experience with AIX. I'll look for a prebuilt 
ld that seems newer than mine to see if this helps the problem.
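
For readers wondering what the gthr-single.h stubs mentioned above might look like, here is a minimal sketch. The type and function names are chosen for this example, patterned on the existing __gthread_mutex_* naming, and are an assumption rather than the actual patch.

```c
/* Hypothetical single-threaded stand-ins for reader/writer lock entry
   points.  With only one thread there is nothing to exclude, so each
   operation succeeds immediately and returns 0. */
typedef int __gthread_rwlock_t;

static inline int
__gthread_rwlock_rdlock (__gthread_rwlock_t *rwlock)
{
  (void) rwlock;
  return 0;
}

static inline int
__gthread_rwlock_wrlock (__gthread_rwlock_t *rwlock)
{
  (void) rwlock;
  return 0;
}

static inline int
__gthread_rwlock_unlock (__gthread_rwlock_t *rwlock)
{
  (void) rwlock;
  return 0;
}
```

Keeping the same signatures in both the threaded and single-threaded headers lets a file such as unwind-dw2-fde.c compile unchanged in either multilib.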


Thank you for your help.

Luke


gcc-4.3-20080605 is now available

2008-06-05 Thread gccadmin
Snapshot gcc-4.3-20080605 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20080605/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.3 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_3-branch 
revision 136415

You'll find:

gcc-4.3-20080605.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.3-20080605.tar.bz2       C front end and core compiler

gcc-ada-4.3-20080605.tar.bz2        Ada front end and runtime

gcc-fortran-4.3-20080605.tar.bz2    Fortran front end and runtime

gcc-g++-4.3-20080605.tar.bz2        C++ front end and runtime

gcc-java-4.3-20080605.tar.bz2       Java front end and runtime

gcc-objc-4.3-20080605.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.3-20080605.tar.bz2  The GCC testsuite

Diffs from 4.3-20080529 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.3
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: extend gthr-posix.h with rwlock

2008-06-05 Thread David Edelsohn
> Luke Dalessandro writes:

Luke> Thank you, this was indeed the problem. I added the needed stubs in
Luke> gthr-single.h and it now compiles fine. Unfortunately there seems to be
Luke> something wrong with my installation of ld as linking fails with a large
Luke> number of errors of the form:

Luke> ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o:
Luke> Field x_offset contains 4. Valid values are between 4 and -1.
Luke> The object name is being substituted.

Luke> Unfortunately I have almost no experience with AIX. I'll look for a prebuilt
Luke> ld that seems newer than mine to see if this helps the problem.

Pre-built ld?  AIX ships with ld.  Are you using GNU Binutils
(gas, GNU ld, etc.) on AIX?  Please use the native AIX tools (as, ld, nm,
etc.) with AIX as mentioned in the platform-specific installation notes:

http://gcc.gnu.org/install/specific.html#x-ibm-aix

David



Re: [lto] Streaming out language-specific DECL/TYPEs

2008-06-05 Thread Daniel Berlin
On Thu, Jun 5, 2008 at 5:57 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
>> Jan Hubicka wrote:
>>
>> >Sure if it works, we should be lowering the types during gimplification
>> >so we don't need to store all this in memory...
>> >But C++ FE still use its local data later in stuff like thunks, but we
>> >will need to cgraphize them anyway.
>>
>> I agree.  The only use of language-specific DECLs and TYPEs after
>> gimplification should be for generating debug information.  And if
>> that's already been done, then you shouldn't need it at all.
>
> For LTO with debug info we will probably need some frontend neutral
> debug info representation in longer run, since optimization modifying
> the data types and such will need to compensate.
>
> We can translate stuff to in-memory dwarf and update it but that would
> limit amount of debug info format we will want to support probably.

DWARF is not exactly memory or space efficient, sadly.
Then again, what most other compilers have done is bite the bullet
and define their own "debug info" data, then transform that to dwarf2
at the very end.
I'm not sure we want to do that either :(


Re: extend gthr-posix.h with rwlock

2008-06-05 Thread Luke Dalessandro

David Edelsohn wrote:
> Luke Dalessandro writes:
>
> Luke> Thank you, this was indeed the problem. I added the needed stubs in
> Luke> gthr-single.h and it now compiles fine. Unfortunately there seems to be
> Luke> something wrong with my installation of ld as linking fails with a large
> Luke> number of errors of the form:
>
> Luke> ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o:
> Luke> Field x_offset contains 4. Valid values are between 4 and -1.
> Luke> The object name is being substituted.
>
> Luke> Unfortunately I have almost no experience with AIX. I'll look for a prebuilt
> Luke> ld that seems newer than mine to see if this helps the problem.
>
> Pre-built ld?  AIX ships with ld.  Are you using GNU Binutils
> (gas, GNU ld, etc.) on AIX?  Please use the native AIX tools (as, ld, nm,
> etc.) with AIX as mentioned in the platform-specific installation notes:


No, I'm sorry I wasn't clear. I am using all of the AIX tools, not Binutils. I 
just assumed that there was something out-of-date with the ld that came with 
our AIX 5.1 installation.



> http://gcc.gnu.org/install/specific.html#x-ibm-aix


I have seen this page before, and I'm not sure that it helps me. I'm running 
into the same behavior posted at 
http://gcc.gnu.org/ml/gcc-bugs/2005-04/msg03175.html, where the advice is also 
to look at this page, but there doesn't seem to be a reply from the original 
poster.


Thanks,
Luke


A request for md5 hashs to be published

2008-06-05 Thread Dennis Clarke
A small request.

Can the md5 sum hash for the various release files be published at the
main GCC release pages ?
If we look at http://gcc.gnu.org/gcc-4.2/ there is no md5 sum there
and while I can find that data at a mirror thus :

ftp://ftp.mirrorservice.org/sites/sources.redhat.com/pub/gcc/releases/gcc-4.2.4/md5.sum

.. there is no statement of the authenticity of that source file.

I can confirm that the md5sum from *that* specific mirror is correct
but that does not convince me that I have a valid tar file :

vesta:/mnt/lfs/sources/tarballs# md5sum gcc-4.2.4.tar.bz2
d79f553e7916ea21c556329eacfeaa16  gcc-4.2.4.tar.bz2

The truth is, I can uncompress that tar file and then recompress it
and get a different md5sum for the exact same input file. That would
also be a valid md5 hash but only for my personal internal mirror.
Really, there should be, in my opinion, a single master page with the
md5sum of the uncompressed tar ball and then the average user can
confirm that it is correct from the master signature page.

Dennis


Re: [whopr] Design/implementation alternatives for the driver and WPA

2008-06-05 Thread Ian Lance Taylor
Chris Lattner <[EMAIL PROTECTED]> writes:

> On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote:
>
>> Nick Kledzik <[EMAIL PROTECTED]> writes:
>>
>>>> How does the linker tell LTO that a symbol may be inlined, but must
>>>> also be externally visible?
>>> The linker just tells LTO which symbols must remain.  The LTO engine
>>> is free to inline anything that would improve codegen, with the
>>> exception
>>> that any weak definition that must remain (preserved) cannot be
>>> inlined.
>>
>> I'll just note that that isn't optimal for ELF when producing an
>> executable.
>
> Why? Because you have to touch (worst case) every symbol?  The cost of
> doing LTO *dramatically* dwarfs the cost of touching symbols  once.
> :)  You're right this could be improved, and we're actively  working
> on it... but it seems like a strange thing to worry about vs
> correctness in all cases.

Whoops, sorry, I meant the other thing.  Not inlining any weak
definition that must remain is not optimal.  When linking an
executable, it is perfectly OK to inline a weak function, even if the
weak symbol is required to remain in the final output file.  In
general if the symbol is known to be bound locally, then it is OK to
inline it.  This is separate from the question of whether the symbol
is visible externally.

Ian


Re: A request for md5 hashs to be published

2008-06-05 Thread Joe Buck
On Fri, Jun 06, 2008 at 01:03:19AM +, Dennis Clarke wrote:
> Can the md5 sum hash for the various release files be published at the
> main GCC release pages ?
> If we look at http://gcc.gnu.org/gcc-4.2/ there is no md5 sum there
> and while I can find that data at a mirror thus :
> 
> ftp://ftp.mirrorservice.org/sites/sources.redhat.com/pub/gcc/releases/gcc-4.2.4/md5.sum
> 
> .. there is no statement of the authenticity of that source file.

The versions on ftp.gnu.org are accompanied by digital signatures,
which should give stronger assurance than just an md5 sum.