Re: [whopr] Design/implementation alternatives for the driver and WPA
>> In ELF you have to think about symbol overriding. Let's say you link
>> a.o b.o c.o. a.o has a reference to symbol S. b.o has a strong
>> definition. c.o has a weak definition. a.o and c.o have LTO
>> information, b.o does not. ELF requires that a.o call the symbol from
>> b.o, not the symbol from c.o. I don't see how to make that work with
>> the LLVM interface.
>
> This does work. There are two parts to it. First the linker's master
> symbol table sees the strong definition of S in b.o and the weak in c.o
> and decides to use the strong one from b.o. Second (because of that) the
> linker calls lto_codegen_add_must_preserve_symbol("S"). The LTO engine
> then sees it has a weak global function S and it cannot inline those.
> Put together, the LTO engine does generate a copy of S, but the linker
> throws it away and uses the one from b.o.

Interesting. The use of lto_codegen_add_must_preserve_symbol is kind of
the opposite of what I had understood. What do you do in this case:

a.o: IL file that contains a reference to "f"
b.o: IL file that has a weak def of "f"

There is no strong definition. Can you inline f into the use in a.o?

> -Nick
>
>

Cheers,
--
Rafael Avila de Espindola

Google Ireland Ltd.
Gordon House
Barrow Street
Dublin 4
Ireland

Registered in Dublin, Ireland
Registration Number: 368047
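Nick's description of how the linker chooses between b.o's strong definition and c.o's weak one can be sketched as a small resolution function. This is a minimal model under stated assumptions: the `sym_def` struct and `prevailing_def` are hypothetical names, not the LLVM `lto_codegen_*` interface or any linker's code, and the "first weak definition wins" rule reflects typical ELF static-link behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy entry: one definition of a symbol as seen by the
   linker's master symbol table. */
struct sym_def {
    const char *object; /* object file providing the definition */
    bool weak;          /* weak (true) or strong (false) binding */
    bool has_lto_ir;    /* informational only: resolution ignores it */
};

/* Return the index of the prevailing definition, -1 if there is none,
   or -2 for duplicate strong definitions (a link error in ELF).
   A strong definition beats weak ones; among weak-only definitions,
   the first one in link order wins. */
int prevailing_def(const struct sym_def *defs, size_t n)
{
    int strong = -1;
    int first_weak = -1;

    for (size_t i = 0; i < n; i++) {
        if (!defs[i].weak) {
            if (strong >= 0)
                return -2;
            strong = (int)i;
        } else if (first_weak < 0) {
            first_weak = (int)i;
        }
    }
    return strong >= 0 ? strong : first_weak;
}
```

For Nick's example the definitions of S in link order are b.o (strong, no IR) and c.o (weak, with IR), so `prevailing_def` picks the b.o entry; the LTO copy built from c.o is still generated but then discarded by the linker.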
Re: Is this a typo in setup_incoming_varargs_64?
> Hi,
>
> setup_incoming_varargs_64 in i386.c has
>
> /* Compute address to jump to :
>    label - 5*eax + nnamed_sse_arguments*5 */
>
> The comments don't match the code. Should the comment be
>
> /* Compute address to jump to :
>    label - 4*eax + nnamed_sse_arguments*4 */

Yes, this is most likely a typo caused by originally using a different
register than eax, which resulted in a different encoding length.
Thanks for noticing it!

Honza

>
> Thanks.
>
> --
> H.J.
Re: [lto] Streaming out language-specific DECL/TYPEs
> Jan Hubicka wrote:
>
> > Sure, if it works, we should be lowering the types during gimplification
> > so we don't need to store all this in memory...
> > But the C++ FE still uses its local data later in stuff like thunks, but
> > we will need to cgraphize them anyway.
>
> I agree. The only use of language-specific DECLs and TYPEs after
> gimplification should be for generating debug information. And if
> that's already been done, then you shouldn't need it at all.

For LTO with debug info we will probably need some frontend-neutral
debug info representation in the longer run, since optimizations that
modify data types and such will need to compensate. We could translate
everything to in-memory DWARF and update it, but that would probably
limit the number of debug info formats we can support.

Honza

>
> --
> Mark Mitchell
> CodeSourcery
> [EMAIL PROTECTED]
> (650) 331-3385 x713
How to build on AMD64/Debian under x86 32bits chroot?
Hello All

As (I imagine) many developers, I have a 64-bit machine running Debian
(Sid) Linux AMD64.

I want to test my MELT branch on x86 (32 bits). So I set up (using
debootstrap) an x86 32-bit Debian/Lenny chroot-ed system (in /debian32)
which has most of the *-dev packages installed.

In this chroot-ed environment I am able to compile several software
packages without issues. For example, I just compiled the PPL there.

The point is that even after schroot, the uname system call (and the
uname command) still return x86_64 as the machine. I suppose there is
no easy trick to circumvent this.

I thought that

  ../configure --build=x86-linux --target=x86-linux --host=x86-linux

(with other MELT-specific options) should be enough, but apparently not;
make fails with

  checking for struct tms... yes
  checking for clock_t... yes
  checking for .preinit_array/.init_array/.fini_array support... yes
  checking if mkdir takes one argument... no
  *** Configuration x86-unknown-linux-gnu not supported
  make[1]: *** [configure-gcc] Error 1
  make[1]: Leaving directory `/usr/src/Lang/_MeltObj32'

and gcc/config.log does indeed show

  hostname = glinka
  uname -m = x86_64
  uname -r = 2.6.24-1-amd64
  uname -s = Linux
  uname -v = #1 SMP Fri Apr 18 23:08:22 UTC 2008

  /usr/bin/uname -p = unknown
  /bin/uname -X     = unknown
  /bin/arch              = unknown
  /usr/bin/arch -k       = unknown
  /usr/convex/getsysinfo = unknown
  /usr/bin/hostinfo      = unknown
  /bin/machine           = unknown
  /usr/bin/oslevel       = unknown
  /bin/universe          = unknown

Any hints are welcome. If possible, I would like to avoid having to
install a virtual machine...

Regards
--
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***
Re: How to build on AMD64/Debian under x86 32bits chroot?
Basile STARYNKEVITCH wrote:
> Hello All
>
> [...]
>
> I thought that
> ../configure --build=x86-linux --target=x86-linux --host=x86-linux
> (with other MELT specific options) should be enough, but apparently not;
> make fails with

--target=i386-linux

Andrew.
Re: How to build on AMD64/Debian under x86 32bits chroot?
On Thu, Jun 5, 2008 at 3:14 PM, Basile STARYNKEVITCH
<[EMAIL PROTECTED]> wrote:
> Hello All
>
> [...]
>
> The point is that even after schroot the uname system call (& the uname
> command) still return x86_64 as the machine. I suppose there is no easy
> trick to circumvent this.

Usually there is a command called 'linux32' which fixes this.

>
> I thought that
> ../configure --build=x86-linux --target=x86-linux --host=x86-linux
> (with other MELT specific options) should be enough, but apparently not;
> make fails with

And it is i686-pc-linux-gnu; x86-linux is not a valid target triplet.

Richard.
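What `linux32` does is essentially switch the process personality before running its command, so that `uname` (and therefore config.guess) reports a 32-bit machine. A minimal Linux-only sketch, assuming glibc's `<sys/personality.h>`; the helper names `enter_linux32` and `machine_name` are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/personality.h>
#include <sys/utsname.h>

/* Switch the calling process to the 32-bit Linux personality, roughly
   what linux32 (setarch) does before exec'ing its argument.
   Returns 0 on success, -1 if the kernel refused. */
int enter_linux32(void)
{
    return personality(PER_LINUX32) == -1 ? -1 : 0;
}

/* Copy the machine field reported by uname(2) into buf. */
int machine_name(char *buf, size_t len)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    snprintf(buf, len, "%s", u.machine);
    return 0;
}
```

The personality is inherited by child processes, so switching it and then running configure (or simply `linux32 ../configure ...`) should make config.guess see i686 instead of x86_64 on an x86-64 kernel.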
Re: [whopr] Design/implementation alternatives for the driver and WPA
"Rafael Espindola" <[EMAIL PROTECTED]> writes:

> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
> of the opposite of what I had understood. What do you do in this case:
>
> a.o: IL file that contains a reference to "f"
> b.o: IL file that has a weak def of "f"
>
> There is no strong definition. Can you inline f into the use in a.o?

I don't know what LLVM does, but in principle, in ELF, you can do this
inlining when linking an executable, but not when linking a shared
library.

Actually, when linking a shared library, what matters is not whether
the definition of "f" is weak or not, but what the visibility of "f" is
(default, hidden, protected, or internal). And, of course, the
visibility of "f" can be set by link-time options (e.g., -Bsymbolic).

Ian
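Ian's rule can be written as a small predicate. This sketch assumes the linker has already chosen the prevailing definition of `f`; the enum and function names are hypothetical, and only the numbering of the visibility values mirrors ELF's `STV_*` constants.

```c
#include <stdbool.h>

/* Same order and values as STV_DEFAULT, STV_INTERNAL, STV_HIDDEN,
   STV_PROTECTED. */
enum visibility { VIS_DEFAULT, VIS_INTERNAL, VIS_HIDDEN, VIS_PROTECTED };
enum output_kind { OUTPUT_EXECUTABLE, OUTPUT_SHARED_LIBRARY };

/* May calls to the prevailing definition of a symbol be inlined at link
   time?  In an executable the definition cannot be preempted, so yes,
   even for weak symbols.  In a shared library only default-visibility
   symbols can be preempted at run time, unless -Bsymbolic binds them
   locally.  Weakness does not enter into the decision. */
bool may_inline_def(enum output_kind out, enum visibility vis, bool bsymbolic)
{
    if (out == OUTPUT_EXECUTABLE)
        return true;
    if (vis != VIS_DEFAULT)
        return true;
    return bsymbolic;
}
```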
Re: [whopr] plugin interface design
Chris Lattner <[EMAIL PROTECTED]> writes:

> I don't know how closely your plans follow this model. If you think
> this approach is reasonable, you really do need to reflect things like
> symbol versions in your IR somehow. This compiler must know about
> versions, and when it does, it is easy to avoid optimizations that are
> invalid for them.

Sure. But here's the thing: the gcc LTO approach involves having a
regular object with a regular symbol table, and the IR is embedded in
the object. In other words, we do know the symbol version information:
it's in the symbol table of the object.

And so what I'm discussing is a way for the linker to communicate the
relevant part of that information to the compiler plugin. The relevant
part is: "this undefined symbol reference in a.o is bound to this
symbol definition in b.o." There is nothing else that the compiler
needs to know. (Actually, when we move on to applying LTO across shared
library boundaries we may also want to say something about the strength
of the binding.)

I appreciate the cleanliness and simplicity of your description. I'm
trying to fill in an ugly edge. The reality is that symbol versions are
expressed via assembly language pseudo-ops, both in C/C++ files and in
assembly code, and also via version scripts passed to the linker. To
the limited extent that the compiler needs to be aware of them, the
linker needs to convey that information. If we decree that the
information must be expressed directly in the compiler IR, then I think
we're looking at a considerably larger degree of ugliness.

Ian
RFC: Extend x86-64 psABI for 256bit AVX register
Hi,

x86-64 psABI defines

typedef struct
{
  unsigned int gp_offset;
  unsigned int fp_offset;
  void *overflow_arg_area;
  void *reg_save_area;
} va_list[1];

for the variable argument list. "va_list" is used to access the
variable argument list:

#include <stdarg.h>
#include <stdlib.h>

void
bar (const char *format, va_list ap)
{
  if (va_arg (ap, int) != 0)
    abort ();
}

void
foo (char *fmt, ...)
{
  va_list ap;
  va_start (ap, fmt);   /* va_start takes the va_list first */
  bar (fmt, ap);
  va_end (ap);
}

foo and bar may be compiled with different compilers. We have to keep
the current layout for va_list so that we can mix va_list code compiled
with AVX and non-AVX compilers. We need to extend the variable argument
handling in the x86-64 psABI to support passing __m256/__m256d/__m256i
on the variable argument list. We propose 2 ways to extend the register
save area to add 256bit AVX register support:

1. Extend the register save area to put the upper 128bit at the end.
   Pros:
     Aligned access.
     Saves stack space if 256bit registers are used.
   Cons:
     Split access. Requires more split access beyond 256bit.

2. Extend the register save area to put full 256bit YMMs at the end.
   The first DWORD after the register save area has the offset of
   the extended array for YMM registers. The next DWORD has the
   element size of the extended array. Unaligned access will be used.
   Pros:
     No split access.
     Easily extendable beyond 256bit.
     Limited unaligned access penalty if the stack is aligned at 32 bytes.
   Cons:
     May require storing both the lower 128bit and the full 256bit
     register content. We may avoid saving the lower 128bit if the
     correct type is required when accessing the variable argument
     list, similar to int vs. double.
     Wastes 272 bytes on the stack when 256bit registers are used.
     Unaligned load and store.

We should agree on one approach to ensure compatibility between
different compilers.

Personally, I prefer #2 for its simplicity. Does anyone else have a
preference?

Thanks.

--
H.J.
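To make the compatibility constraint concrete, here is a sketch of how the existing layout is consumed, following the psABI va_arg algorithm for a single INTEGER or SSE eightbyte. The `va_model` struct and helper names are hypothetical: it copies the field layout into a separate struct so the sketch does not touch the compiler's real `va_list`.

```c
#include <stdint.h>
#include <string.h>

/* Same fields, same order as the psABI va_list element. */
struct va_model {
    unsigned int gp_offset;  /* next GP register slot in reg_save_area */
    unsigned int fp_offset;  /* next XMM register slot in reg_save_area */
    void *overflow_arg_area; /* next argument passed on the stack */
    void *reg_save_area;     /* start of the register save area */
};

/* The save area holds 6 GP registers (8 bytes each, offsets 0..47)
   followed by 8 XMM registers (16 bytes each, offsets 48..175). */
enum { GP_AREA_END = 6 * 8, FP_AREA_END = 6 * 8 + 8 * 16 };

/* va_arg (ap, int64_t): one INTEGER-class eightbyte. */
int64_t va_model_i64(struct va_model *ap)
{
    int64_t v;
    if (ap->gp_offset <= GP_AREA_END - 8) {
        memcpy(&v, (char *)ap->reg_save_area + ap->gp_offset, sizeof v);
        ap->gp_offset += 8;
    } else {
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return v;
}

/* va_arg (ap, double): one SSE-class eightbyte in a 16-byte slot. */
double va_model_double(struct va_model *ap)
{
    double v;
    if (ap->fp_offset <= FP_AREA_END - 16) {
        memcpy(&v, (char *)ap->reg_save_area + ap->fp_offset, sizeof v);
        ap->fp_offset += 16;
    } else {
        memcpy(&v, ap->overflow_arg_area, sizeof v);
        ap->overflow_arg_area = (char *)ap->overflow_arg_area + 8;
    }
    return v;
}
```

Both proposals keep these fields and the 48/176 boundaries intact and append the YMM data after the existing area, which is what lets an unmodified `bar` keep working.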
Re: RFC: Extend x86-64 psABI for 256bit AVX register
On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
> Hi,
>
> x86-64 psABI defines
>
> [...]
>
> We should agree on one approach to ensure compatibility between
> different compilers.
>
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?

If you want to mix AVX and non-AVX code, then you need a way to detect
at runtime whether AVX information was saved. What is it in each of
those cases?

If you don't want to mix AVX and non-AVX code, then basically you can
declare the ABIs incompatible anyway?

There is also a third option of passing AVX values by reference.

For simplicity I would also prefer 2) - after all we don't need to fill
in the XMM area / the AVX area if the value is unused.

Richard.
Re: [whopr] Design/implementation alternatives for the driver and WPA
Hi,
I am jumping in somewhat late, as yesterday I was in meetings without
internet access (and I will probably be offline again tomorrow).

I think that in basic terms we all mostly agree (we want to implement
an optimization scheme that does not load everything into memory, and
we want to parallelize the post-IPA compilation). The linker interface
seems fine too.

>
> WHOPR simply adds another alternative, if you are willing to only run
> summary-based transformations, we can split the analysis and
> transformation phases in two such that you can parallelize the work
> over a cluster or a large SMP. That's it. Nothing more.

I think one problem is that both repackaging and cherry picking, as
described, are very much centered on inlining as the application. It
is probably quite clear now that the list of optimizations we want to
perform at LTO scale is going to grow beyond the basic inlining +
aliasing combo quite soon, especially as data structure changes start
to kick in. We would also need to sanely support partial offlining,
cloning, etc. This IMO should be considered somehow.

It is quite possible to implement all this based on summaries, but we
need to think about the flexibility of the whole scheme and not overly
limit it, at least in the current stages of implementation. If, for
example, we ended up with difficulties doing a struct-reorg style
transformation that moves fields within a structure, we would run into
problems very soon.

I personally have always leaned toward a kind of repackaging scheme.
I've hoped that with a sanely designed LTO dumping scheme, this will be
relatively straightforward to implement: you simply re-use the same
serialized functions as they are in the original .o files and replace
function summaries with transformation summaries, so we might pretty
much re-use the same infrastructure. With a sane caching mechanism for
keeping unmodified function bodies in memory in cooperation with GGC,
the repackaging stage should be possible to implement as a simple pass
through the callgraph, writing the selected functions to the output
file.

Another advantage is that local but non-trivial changes to the program
can be done at LTO decision time, which would simplify the
inter-IPA-pass interaction that seems to be the scariest issue here.

Honza
Re: RFC: Extend x86-64 psABI for 256bit AVX register
>
> 1. Extend the register save area to put upper 128bit at the end.
> [...]
>
> 2. Extend the register save area to put full 256bit YMMs at the end.
> [...]
>
> We should agree on one approach to ensure compatibility between
> different compilers.

This is something that definitely should be handled by an ABI update.

We probably also need to somehow update the way to specify what to save
in the varargs prologue. Otherwise, if you had a YMM-aware printf
running on non-AVX hardware, we would end up with invalid instructions.

At the moment, eax is required to specify the number of XMM registers;
we can probably extend it to have the number of XMM registers in AL and
YMM registers in AH.

I personally don't have much preference between 1. and 2. 1. seems
relatively easy to implement too, or is packing two 128bit values into
a single 256bit value difficult in va_arg expansion?

Honza

>
> Personally, I prefer #2 for its simplicity. Does anyone else have a
> preference?
>
> Thanks.
>
> --
> H.J.
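The AL/AH idea above can be sketched as a pair of encode/decode helpers. This is purely hypothetical: the current psABI only defines `%al` (an upper bound on the number of vector registers used), and the AH half is the suggestion under discussion here, not an agreed ABI. The helper names are illustrative.

```c
#include <stdint.h>

/* Hypothetical: AL = number of XMM registers, AH = number of YMM
   registers used for the variadic call. */
uint32_t encode_vec_counts(unsigned xmm, unsigned ymm)
{
    return (uint32_t)(xmm & 0xff) | ((uint32_t)(ymm & 0xff) << 8);
}

/* Low byte of eax (AL). */
unsigned decode_xmm_count(uint32_t eax)
{
    return eax & 0xff;
}

/* Second byte of eax (AH). */
unsigned decode_ymm_count(uint32_t eax)
{
    return (eax >> 8) & 0xff;
}
```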
Re: [whopr] Design/implementation alternatives for the driver and WPA
On Thu, Jun 5, 2008 at 11:09, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> I think one problem is that both repackaging and cherry picking as
> described is very centric about application on inlining.

No, that's simply the main application for the initial implementation.
Any other summary-based transformation can be supported the same way.

Optimizations that are not summary-based can be done the way they're
done today. All that happens is that they won't be able to take
advantage of the partitioning and distribution, since WPA and LTRANS
will be executed together.

And of course, even summary-based transformations can be done the same
way they are done today. The scaling aspects of WHOPR should only kick
in via a special option, or even via heuristics.

> I personally always leaned to kind of repackaging scheme. I've hoped
> that with sanely designed LTO dumping scheme, this will be relatively
> straightforward to implement: simply you re-use same serialized functions
> as they are in the original .o files and replace function summaries by
> transformation summaries, so we might pretty much re-use same
> infrastructure. With sane caching mechanism to keeping unmodified
> function bodies in memory in cooperation with GGC, the repackaging stage
> should be possible to implement as simple pass through the callgraph
> writing the selected functions to the output file.

Sure. All this is possible and we shouldn't break it.

Diego.
Re: RFC: Extend x86-64 psABI for 256bit AVX register
On Thu, Jun 5, 2008 at 7:49 AM, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On Thu, Jun 5, 2008 at 4:31 PM, H.J. Lu <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> x86-64 psABI defines
>>
>> [...]
>>
>> Personally, I prefer #2 for its simplicity. Does anyone else have a
>> preference?
>
> If you want to mix AVX and non-AVX code then you need a way to
> detect if AVX information was saved at runtime. What is it in those
> both cases?
>
> If you don't want to mix AVX and non-AVX code then basically you
> can declare the ABIs incompatible anyway?

We want to extend the psABI in such a way that we can link AVX-enabled
code to call vfprintf in glibc, which is compiled with the older
compiler and doesn't use YMM registers. That is, if bar in the example
above doesn't use YMM registers, it can be compiled by any compiler.
bar doesn't need to know whether YMM registers are used in the caller
at all. All necessary information for YMM registers is specified in
the psABI. If a compiler doesn't use YMM registers, it doesn't have to
do anything.

>
> There is also a third option of passing AVX values by reference.
>
> For simplicity I would also prefer 2) - after all we don't need to fill
> in the XMM area / the AVX area if the value is unused.
>

That is what I believe.

Thanks.

--
H.J.
Re: RFC: Extend x86-64 psABI for 256bit AVX register
On Thu, Jun 5, 2008 at 8:15 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote:
>>
>> 1. Extend the register save area to put upper 128bit at the end.
>> [...]
>>
>> 2. Extend the register save area to put full 256bit YMMs at the end.
>> [...]
>>
>> We should agree on one approach to ensure compatibility between
>> different compilers.
>
> This is something that definitely should be handled by ABI update.
>
> We probably need to also somehow update the way to specify what to save
> to varargs prologue. Otherwise if you would have YMM aware printf

Yes, but I believe that is compiler specific. Different compilers may
have different approaches for the varargs prologue, as long as they
follow the psABI.

> running on non-AVX hardware, we would end up with invalid instructions.

That is nothing new. The same applies to SSE on ia32. Basically, you
shouldn't call a YMM-aware printf on non-AVX hardware. You can have
/lib64/avx/libc.so.6 if necessary.

>
> At the moment, eax is required to specify number of XMM registers, we
> probably can extend it to have number of XMM registers in AL and YMM in
> AH.

ymm0 and xmm0 are the same register: xmm0 is the lower 128bit of ymm0.
I am not sure we need to separate XMM registers from YMM registers.

>
> I personally don't have much preferences over 1. or 2.. 1. seems
> relatively easy to implement too, or is packaging two 128bit values to
> single 256bit difficult in va_arg expansion?
>

Access to a 256bit register as lower and upper 128bits needs 2
instructions. For store:

        vmovaps       %xmm7, -143(%rax)
        vextractf128  $1, %ymm7, -15(%rax)

For load:

        vmovaps       -143(%rax), %xmm7
        vinsertf128   $1, -15(%rax), %ymm7, %ymm7

If we go beyond 256bit, we need more instructions to access the full
register. For 512bit, it will be split into lower 128bit, middle 128bit
and upper 256bit. 1024bit will have 4 parts.

For #2, only one instruction will be needed for 256bit and beyond.

Thanks.

--
H.J.
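To see why option #1 needs split accesses while option #2 does not, here is a byte-level model of reading saved register `i` under each layout. The 48/176 offsets come from the current psABI; everything past the 176-byte save area is hypothetical and follows the prose of the RFC, and this sketch additionally assumes the option #2 offset DWORD is relative to the start of the register save area. Type and function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy 256-bit register image. */
typedef struct { unsigned char b[32]; } ymm_t;

enum { XMM_AREA = 48, SAVE_AREA_SIZE = 176 };

/* Option 1 (hypothetical layout): the low 128 bits stay in the existing
   XMM slot; the upper 128 bits of register i sit in an array appended
   right after the save area.  Two copies per access. */
ymm_t load_opt1(const unsigned char *save, unsigned i)
{
    ymm_t v;
    memcpy(v.b, save + XMM_AREA + 16 * (size_t)i, 16);            /* low  */
    memcpy(v.b + 16, save + SAVE_AREA_SIZE + 16 * (size_t)i, 16); /* high */
    return v;
}

/* Option 2 (hypothetical layout): the first DWORD after the save area
   is the offset of the YMM array, the second its element size.  One
   copy per access; the element size lets the array grow past 256 bits
   (this sketch copies only the low 32 bytes of each element). */
ymm_t load_opt2(const unsigned char *save, unsigned i)
{
    uint32_t off, elem;
    ymm_t v;
    memcpy(&off, save + SAVE_AREA_SIZE, 4);
    memcpy(&elem, save + SAVE_AREA_SIZE + 4, 4);
    memcpy(v.b, save + off + (size_t)elem * i, sizeof v.b);
    return v;
}
```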
Re: [whopr] plugin interface design
On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:

> Chris Lattner <[EMAIL PROTECTED]> writes:
>
>> I don't know how closely your plans follow this model. If you think
>> this approach is reasonable, you really do need to reflect things like
>> symbol versions in your IR somehow. This compiler must know about
>> versions, and when it does, it is easy to avoid optimizations that are
>> invalid for them.
>
> Sure. But here's the thing: the gcc LTO approach involves having a
> regular object with a regular symbol table, and the IR is embedded in
> the object. In other words, we do know the symbol version
> information: it's in the symbol table of the object.

Wow, that seems incredibly limiting. This means that your LTO either
has to:

1) treat the object header as part of the IR, or
2) avoid making any changes that would affect exported symbols

Is that right? Why doesn't the "LTO reader" just read the symbol info
from the ELF header and reflect it into the trees somehow?

-Chris
Re: [whopr] Design/implementation alternatives for the driver and WPA
On Jun 5, 2008, at 6:59 AM, Ian Lance Taylor wrote:

> "Rafael Espindola" <[EMAIL PROTECTED]> writes:
>
>> Interesting. The use of lto_codegen_add_must_preserve_symbol is kind
>> of the opposite of what I had understood. What do you do in this case:
>>
>> a.o: IL file that contains a reference to "f"
>> b.o: IL file that has a weak def of "f"
>>
>> There is no strong definition. Can you inline f into the use in a.o?
>
> I don't know what LLVM does, but in principle, in ELF, you can do this
> inlining when linking an executable, but not when linking a shared
> library.
>
> Actually, when linking a shared library, what matters is not whether
> the definition of "f" is weak or not, but what the visibility of "f" is
> (default, hidden, protected, or internal). And, of course, the
> visibility of "f" can be set by link-time options (e.g., -Bsymbolic).

In LLVM LTO, the model is that the linker is the one that knows about
visibility. The problem is that 'hidden' is not sufficient to capture
visibility info when mixing LTO modules with native ones. If you have
[a-c].c and compile [ab].c with LTO and c.c without, any hidden symbols
should be visible outside the [ab].o LTO region.

LLVM LTO handles this by marking symbols "internal" (aka static, aka
not TREE_PUBLIC, whatever) when the symbol is not visible outside the
LTO scope. This allows the optimizers to go crazy and hack away at the
symbols, but only when safe. 'Weakness' only matters when a symbol is
exported from the LTO scope, so 'weak' and 'visibility' are orthogonal.

-Chris
Re: Question regarding C++ frontend
On Sat, May 03, 2008 at 08:29:25AM -0400, Doug Gregor wrote:
> INNERMOST_TEMPLATE_ARGS can be used to get at the "innermost" TREE_VEC
> of template arguments for a class template specialization such as
> foo::bar. CLASSTYPE_USE_TEMPLATE != 0 tells you whether a
> RECORD_TYPE is actually a template

Doug,

Thank you for your response and sorry for the delay.

Unfortunately, CLASSTYPE_USE_TEMPLATE does not seem to have this
property when the non-template is an inner class of a template. For
example, the record_type t pertaining to a class outer::inner_noargs :

  (gdb) pt
   no-binfo use_template=1 interface-unknown
      chain >
  (gdb) print t->type.lang_specific->u.c.use_template
  $4 = 1

Thanks,
--
Peter
Re: [whopr] plugin interface design
Chris Lattner <[EMAIL PROTECTED]> writes:

> On Jun 5, 2008, at 6:51 AM, Ian Lance Taylor wrote:
>
>> Chris Lattner <[EMAIL PROTECTED]> writes:
>>
>> [...]
>>
>> Sure. But here's the thing: the gcc LTO approach involves having a
>> regular object with a regular symbol table, and the IR is embedded in
>> the object. In other words, we do know the symbol version
>> information: it's in the symbol table of the object.
>
> Wow, that seems incredibly limiting. This means that your LTO either
> has to:
>
> 1) treat the object header as part of the IR, or
> 2) avoid making any changes that would affect exported symbols
>
> Is that right? Why doesn't the "LTO reader" just read the symbol info
> from the ELF header and reflect it into the trees somehow?

That would be fine. It would require teaching the compiler about symbol
versioning and resolution rules which the linker already knows. I sort
of think that is unnecessary. But I'm not opposed to it.

Of course there is the issue that some of this information also comes
from linker command line options. That also has to be fed into the IR.
For example, earlier Nick suggested that LLVM will not inline a weak
symbol. With ELF it is actually OK to inline a weak symbol when
generating an executable. It is not OK when generating a shared
library, unless -Bsymbolic was used on the linker command line.

We could represent these sorts of details directly in the compiler IR.
But I don't see a big advantage to doing so. I'm proposing, instead,
that the linker inform the compiler plugin about this information based
on link-time information. That is a way of representing it in the IR,
of course. But it seems to me to be somewhat more pragmatic.

Incidentally, your choice 2 above doesn't follow. The LTO compiler is
going to pass new object file(s) back to the linker. They don't have to
have the same set of exported symbols, except in cases where the linker
has directed that some symbol must be available.

Ian
Re: [whopr] Design/implementation alternatives for the driver and WPA
Chris Lattner <[EMAIL PROTECTED]> writes:

> LLVM LTO handles this by marking symbols "internal" (aka static, aka
> not TREE_PUBLIC, whatever) when the symbol is not visible outside the
> LTO scope. This allows the optimizers to go crazy and hack away at
> the symbols, but only when safe.

How does the linker do this? Are you saying that when generating a
shared library, the linker calls lto_codegen_add_must_preserve_symbol
for every externally visible symbol? How does the linker tell LTO that
a symbol may be inlined, but must also be externally visible?

Ian
Re: Development process for i386 machine descriptions
Hello!

> 1.) The processor_costs structure seems very limited, but seems very
> easy to "fill in", but are these costs supposed to be best or worst
> case? For instance, many instructions with different sized operands
> vary in latency.

Instruction costs are further refined in config/i386.c, ix86_rtx_costs,
and the cost for various operand types is determined in several *_cost
functions scattered around the i386.c file.

> 2.) I don't understand the meaning of the stringop_algs, scalar,
> vector, and branching costs at the end of the processor_cost
> structure. Could someone give me an accurate description?

stringop_algs is a structure that defines various algorithms for string
processing functions (memcpy, memset, ...). This structure also defines
size thresholds for the various algorithms.

The costs at the end of a cost structure are used in autovectorization
decisions when -fvect-cost-model is in effect (please look at the end
of i386.h, where these values are used).

> 3.) The processor I am currently attempting to model is
> single-issue/in-order with a simple pipeline. Stalls can occasionally
> occur in the fetch/decode/translate, but the core is the latency of
> instructions in the functional units in the execute stage. What
> recommendations can anyone make to me for designing the DFA? Should it
> just directly model the functional units latencies for certain insn
> types?

Hm, perhaps you should look into the {athlon, geode, k6, pentium,
ppro}.md files first. All these files define scheduling for various
processors. I'm sure that quite a few ideas can be harvested there.

Uros.
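The size-threshold idea behind stringop_algs can be illustrated with a simplified table lookup. This is a hypothetical model, not GCC's actual definition: the type names, enum values, and the convention that `max == -1` means "any larger size" are assumptions made for the sketch.

```c
#include <stddef.h>

/* Hypothetical algorithm choices for an inline memcpy/memset. */
enum strop_alg { ALG_LIBCALL, ALG_REP_PREFIX_4, ALG_UNROLLED_LOOP };

/* One row: use `alg` for block sizes up to `max` bytes
   (max == -1 means any larger size). */
struct size_alg {
    long max;
    enum strop_alg alg;
};

struct strop_table {
    enum strop_alg unknown_size; /* used when the size is not constant */
    struct size_alg size[4];     /* rows ordered by increasing max */
};

enum strop_alg choose_alg(const struct strop_table *t, long size, int size_known)
{
    if (!size_known)
        return t->unknown_size;
    for (size_t i = 0; i < sizeof t->size / sizeof t->size[0]; i++)
        if (t->size[i].max == -1 || size <= t->size[i].max)
            return t->size[i].alg;
    return t->unknown_size;
}
```

A per-CPU cost table would then carry one such table per operation, tuned so that small copies are expanded inline and large ones go to the library.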
extend gthr-posix.h with rwlock
We have code that fails to scale due to the object_mutex lock in unwind-dw2-fde.c. This mutex protects two lists local to the file. The primary list is used in "read-mostly" mode, with the secondary list used rarely when writing needs to happen. I am trying to change this locking scheme to use a reader/writer lock (I'd prefer something even more scalable, like an RCU-style algorithm, or seqlock + partially visible reader count, but I don't have time at the moment to do anything like that). I've set up forwarding to pthread_rwlock_t and the corresponding functions in gthr-posix.h, just following the template of how pthread_mutex_t is linked in. My problem is that unwind-dw2-fde.c seems to be compiled multiple times during a gcc build, and sometimes my additions are found but other times they are not. I am rebuilding again (AIX 5.1), and I'll post more information for anyone that needs it. In the meantime, is there a how-to anywhere that describes adding or modifying gthr.h models in gcc? Thanks, Luke
Re: extend gthr-posix.h with rwlock
> Luke Dalessandro writes: Luke> My problem is that unwind-dw2-fde.c seems to be compiled multiple times during Luke> a gcc build, and sometimes my additions are found but other times they are Luke> not. I am rebuilding again (AIX 5.1), and I'll post more information for Luke> anyone that needs it. Luke> In the meantime, is there a how-to anywhere that describes adding or modifying Luke> gthr.h models in gcc? AIX multilibs pthread support. Unlike Linux, AIX does not provide weak versions of the pthread symbols when operating in single-threaded mode. AIX uses gthr-aix.h, which includes gthr-posix.h or gthr-single.h depending on the -pthread option. David
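The practical consequence is that every wrapper added to gthr-posix.h needs a matching no-op twin in gthr-single.h, because on AIX one multilib is built against the single-threaded header. The sketch below collapses that pairing into one file. The my_gthread_rwlock_* names and the USE_PTHREADS switch are hypothetical stand-ins for GCC's real __gthread_* wrappers and for the selection logic in gthr-aix.h.

```c
#include <assert.h>

#ifdef USE_PTHREADS
/* Threaded variant, in the spirit of gthr-posix.h: forward to pthreads. */
#include <pthread.h>
typedef pthread_rwlock_t my_gthread_rwlock_t;
#define MY_GTHREAD_RWLOCK_INIT PTHREAD_RWLOCK_INITIALIZER
static inline int my_gthread_rwlock_rdlock(my_gthread_rwlock_t *l) { return pthread_rwlock_rdlock(l); }
static inline int my_gthread_rwlock_wrlock(my_gthread_rwlock_t *l) { return pthread_rwlock_wrlock(l); }
static inline int my_gthread_rwlock_unlock(my_gthread_rwlock_t *l) { return pthread_rwlock_unlock(l); }
#else
/* Single-threaded variant, in the spirit of gthr-single.h: with only one
   thread there is nothing to exclude, so every operation just succeeds. */
typedef int my_gthread_rwlock_t;
#define MY_GTHREAD_RWLOCK_INIT 0
static inline int my_gthread_rwlock_rdlock(my_gthread_rwlock_t *l) { (void)l; return 0; }
static inline int my_gthread_rwlock_wrlock(my_gthread_rwlock_t *l) { (void)l; return 0; }
static inline int my_gthread_rwlock_unlock(my_gthread_rwlock_t *l) { (void)l; return 0; }
#endif

/* Caller code is identical under both variants. */
static my_gthread_rwlock_t table_lock = MY_GTHREAD_RWLOCK_INIT;
static int table_value;

static void table_set(int v)
{
  int rc = my_gthread_rwlock_wrlock(&table_lock);
  assert(rc == 0);
  table_value = v;
  my_gthread_rwlock_unlock(&table_lock);
  (void)rc;
}

static int table_get(void)
{
  int rc = my_gthread_rwlock_rdlock(&table_lock);
  assert(rc == 0);
  int v = table_value;
  my_gthread_rwlock_unlock(&table_lock);
  (void)rc;
  return v;
}
```

Compiling once with -DUSE_PTHREADS and once without mirrors the two multilib builds. If a wrapper exists in only one variant, only that build would fail, which is consistent with additions being "found" in some compilations of unwind-dw2-fde.c but not others.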
Re: [whopr] Design/implementation alternatives for the driver and WPA
On Jun 5, 2008, at 10:43 AM, Ian Lance Taylor wrote: > Chris Lattner <[EMAIL PROTECTED]> writes: >> LLVM LTO handles this by marking symbols "internal" (aka static, aka not TREE_PUBLIC, whatever) when the symbol is not visible outside the LTO scope. This allows the optimizers to go crazy and hack away at the symbols, but only when safe. > How does the linker do this? Are you saying that when generating a shared library, the linker calls lto_codegen_add_must_preserve_symbol for every externally visible symbol? Yes. > How does the linker tell LTO that a symbol may be inlined, but must also be externally visible? The linker just tells LTO which symbols must remain. The LTO engine is free to inline anything that would improve codegen, with the exception that any weak definition that must remain (preserved) cannot be inlined. -Nick
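Taken together, Chris's "internal" marking and Nick's answer amount to this rule: anything the linker did not name as must-preserve can be internalized. Below is a minimal sketch of that bookkeeping. struct lto_symbol, internalize(), and the string list (standing in for the series of lto_codegen_add_must_preserve_symbol calls) are all hypothetical and do not reflect libLTO's internals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum linkage { LINK_EXTERNAL, LINK_INTERNAL };

/* Hypothetical view of one symbol defined in the LTO IL. */
struct lto_symbol {
  const char *name;
  enum linkage linkage;
};

static bool in_list(const char *name, const char *const *list, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp(name, list[i]) == 0)
      return true;
  return false;
}

/* Any symbol the linker did not ask to preserve becomes internal,
   which frees the optimizers to inline, clone or delete it.
   Returns how many symbols were internalized. */
static size_t internalize(struct lto_symbol *syms, size_t nsyms,
                          const char *const *preserve, size_t npreserve)
{
  size_t count = 0;
  for (size_t i = 0; i < nsyms; i++)
    if (!in_list(syms[i].name, preserve, npreserve)) {
      syms[i].linkage = LINK_INTERNAL;
      count++;
    }
  return count;
}
```

For a shared library, the preserve list would be every exported symbol. For an executable, it would be only the symbols referenced from non-LTO objects, plus the entry point.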
Question about modifying gcc
Could you please direct me to someone who would be willing and able to answer a few questions about some of the internal workings of the gcc compiler? I am attempting to modify the compiler to instrument function calls and returns. The end result that I am trying to achieve is to send the address of every called function to a memory-mapped file prior to the call, and after the call send an immediate value to that same file. The target architecture is x86. Here is an example in pseudo-assembly of what I want to accomplish:

    regular                  modified
    instruction              instruction
    instruction              instruction
                             mov $function-name, (eax)
    call function-name       call function-name
                             mov $0x1000, (eax)
    instruction              instruction
    instruction              instruction

where eax is the address of the memory-mapped file. The purpose of this is to collect information about calls and returns in order to build call graphs and operating tendencies of software systems. So far I have had little success. I have been trying to change the machine description as well as the target description macros and functions in order to get the desired functionality. I have been able to insert instructions into the compiled code via output_asm_insn(), but not in the correct place. Is there someone who would be able to help me with my problem? Thank you Dale Reese
Re: Question about modifying gcc
On Thu, Jun 05, 2008 at 12:55:17PM -0700, [EMAIL PROTECTED] wrote: > I am attempting to modify the compiler to instrument function calls and > returns. The end result that i am trying to achieve is to send the > address of every called function to a memory mapped file prior to the > call and after the call send an immediate value to that same file. The > target architecture is x86. You should be able to achieve what you want without modifying the compiler. Check the manual for the -finstrument-functions option. There's also the existing coverage support: compile with -ftest-coverage -fprofile-arcs, then run gcov.
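With -finstrument-functions, GCC emits a call to __cyg_profile_func_enter at every function entry and to __cyg_profile_func_exit at every exit. Below is a minimal sketch of hooks that record the same events as the pseudo-assembly above: the callee's address, and then a marker value of 0x1000. trace_buf, trace_len and trace_put are hypothetical; the buffer stands in for the memory-mapped file, and a real tool would point it at a region obtained from mmap().

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_CAPACITY 4096
#define TRACE_RETURN_MARKER ((uintptr_t)0x1000)  /* marker value from the post */

/* Stand-in for the memory-mapped file (hypothetical). */
static uintptr_t trace_buf[TRACE_CAPACITY];
static size_t trace_len;

/* The hooks and their helpers must not be instrumented themselves,
   otherwise they would recurse when built with -finstrument-functions. */
static void trace_put(uintptr_t v) __attribute__((no_instrument_function));
static void trace_put(uintptr_t v)
{
  if (trace_len < TRACE_CAPACITY)   /* silently drop events once full */
    trace_buf[trace_len++] = v;
}

/* Called on entry to every instrumented function. */
void __cyg_profile_func_enter(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
  (void)call_site;
  trace_put((uintptr_t)this_fn);
}

/* Called on exit from every instrumented function. */
void __cyg_profile_func_exit(void *this_fn, void *call_site)
    __attribute__((no_instrument_function));
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
  (void)this_fn;
  (void)call_site;
  trace_put(TRACE_RETURN_MARKER);
}
```

Build the code to be traced with gcc -finstrument-functions and link this file in. The hooks fire inside the callee rather than around the call instruction, but the resulting sequence of enter/exit events is what a call-graph builder needs. The call_site argument is also available if caller information is wanted.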
Re: [whopr] Design/implementation alternatives for the driver and WPA
Nick Kledzik <[EMAIL PROTECTED]> writes: >> How does the linker tell LTO that a symbol may be inlined, but must >> also be externally visible? > The linker just tells LTO which symbols must remain. The LTO engine > is free to inline anything that would improve codegen, with the > exception > that any weak definition that must remain (preserved) cannot be inlined. I'll just note that that isn't optimal for ELF when producing an executable. Ian
Re: How to build on AMD64/Debian under x86 32bits chroot?
Basile STARYNKEVITCH writes: > Hello All > > As (I imagine) many developers I have a 64 bits machine - running Debian > (Sid) Linux AMD64. > > I want to test my MELT branch on x86 (32 bits). So I set up (using > debootstrap) a x86 32 bits Debian/Lenny chroot-ed system (in /debian32) > which has most of the *-dev packages installed. > > In this chroot-ed environment I am able to compile several software > without issues. For example, I just compiled there the PPL. > > The point is that even after schroot the uname system call (& the uname > command) still return x86_64 as the machine. I suppose there is no easy > trick to circumvent this. Make sure that 'personality=linux32' is set for this chroot in /etc/schroot/schroot.conf (or, as suggested, prefix the schroot command with 'linux32' every time you enter the chroot). Matthias
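This advice works because uname(2) reports the machine according to the calling process's personality. linux32, like schroot's personality=linux32 setting, switches that personality to PER_LINUX32 before starting the shell, and on an x86_64 kernel uname then reports i686. Below is a minimal, Linux-only sketch of that mechanism using personality(2) and uname(2). The helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/personality.h>
#include <sys/utsname.h>

/* Query the current execution domain without changing it. */
static int current_persona(void)
{
  return personality(0xffffffff);
}

/* Nonzero if the process currently runs under the linux32 personality. */
static int is_linux32(void)
{
  int p = current_persona();
  return p != -1 && (p & PER_MASK) == PER_LINUX32;
}

/* Switch this process to linux32, as the linux32 tool does before it
   execs a command; programs exec'd afterwards inherit the setting.
   Returns 0 on success, -1 on failure. */
static int enter_linux32(void)
{
  return personality(PER_LINUX32) == -1 ? -1 : 0;
}

/* Copy the machine name uname(2) reports (e.g. "x86_64") into BUF. */
static int machine_name(char *buf, size_t len)
{
  struct utsname u;
  if (len == 0 || uname(&u) != 0)
    return -1;
  snprintf(buf, len, "%s", u.machine);
  return 0;
}
```

Calling enter_linux32() and then machine_name() on an x86_64 host should show the change that configure scripts and config.guess in the chroot would then see.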
Re: [whopr] Design/implementation alternatives for the driver and WPA
On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote: > Nick Kledzik <[EMAIL PROTECTED]> writes: >>> How does the linker tell LTO that a symbol may be inlined, but must also be externally visible? >> The linker just tells LTO which symbols must remain. The LTO engine is free to inline anything that would improve codegen, with the exception that any weak definition that must remain (preserved) cannot be inlined. > I'll just note that that isn't optimal for ELF when producing an executable. Why? Because you have to touch (worst case) every symbol? The cost of doing LTO *dramatically* dwarfs the cost of touching symbols once. :) You're right this could be improved, and we're actively working on it... but it seems like a strange thing to worry about vs correctness in all cases. -Chris
Re: extend gthr-posix.h with rwlock
David Edelsohn wrote: > Luke Dalessandro writes: > Luke> My problem is that unwind-dw2-fde.c seems to be compiled multiple times during > Luke> a gcc build, and sometimes my additions are found but other times they are > Luke> not. I am rebuilding again (AIX 5.1), and I'll post more information for > Luke> anyone that needs it. > Luke> In the meantime, is there a how-to anywhere that describes adding or modifying > Luke> gthr.h models in gcc? > AIX multilibs pthread support. Unlike Linux, AIX does not provide weak versions of the pthread symbols when operating in single-threaded mode. AIX uses gthr-aix.h, which includes gthr-posix.h or gthr-single.h depending on the -pthread option. Thank you, this was indeed the problem. I added the needed stubs in gthr-single.h and it now compiles fine. Unfortunately, there seems to be something wrong with my installation of ld, as linking fails with a large number of errors of the form: ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o: Field x_offset contains 4. Valid values are between 4 and -1. The object name is being substituted. Unfortunately, I have almost no experience with AIX. I'll look for a prebuilt ld that seems newer than mine to see if this helps the problem. Thank you for your help. Luke
gcc-4.3-20080605 is now available
Snapshot gcc-4.3-20080605 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.3-20080605/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.3 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_3-branch revision 136415

You'll find:

gcc-4.3-20080605.tar.bz2             Complete GCC (includes all of below)
gcc-core-4.3-20080605.tar.bz2        C front end and core compiler
gcc-ada-4.3-20080605.tar.bz2         Ada front end and runtime
gcc-fortran-4.3-20080605.tar.bz2     Fortran front end and runtime
gcc-g++-4.3-20080605.tar.bz2         C++ front end and runtime
gcc-java-4.3-20080605.tar.bz2        Java front end and runtime
gcc-objc-4.3-20080605.tar.bz2        Objective-C front end and runtime
gcc-testsuite-4.3-20080605.tar.bz2   The GCC testsuite

Diffs from 4.3-20080529 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.3 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: extend gthr-posix.h with rwlock
> Luke Dalessandro writes: Luke> Thank you, this was indeed the problem. I added the needed stubs in Luke> gthr-single.h and it now compiles fine. Unfortunately there seems to be Luke> something wrong with my installation of ld as linking fails with a large Luke> number of errors of the form: Luke> ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o: Luke> Field x_offset contains 4. Valid values are between 4 and -1. Luke> The object name is being substituted. Luke> Unfortunately I have almost no experience with AIX. I'll look for a prebuilt Luke> ld that seems newer than mine to see if this helps the problem. Pre-built ld? AIX ships with ld. Are you using GNU Binutils (gas, GNU ld, etc.) on AIX? Please use the native AIX tools (as, ld, nm, etc.) with AIX as mentioned in the platform-specific installation notes: http://gcc.gnu.org/install/specific.html#x-ibm-aix David
Re: [lto] Streaming out language-specific DECL/TYPEs
On Thu, Jun 5, 2008 at 5:57 AM, Jan Hubicka <[EMAIL PROTECTED]> wrote: >> Jan Hubicka wrote: >> >> >Sure if it works, we should be lowering the types during gimplification >> >so we don't need to store all this in memory... >> >But C++ FE still use its local data later in stuff like thunks, but we >> >will need to cgraphize them anyway. >> >> I agree. The only use of language-specific DECLs and TYPEs after >> gimplification should be for generating debug information. And if >> that's already been done, then you shouldn't need it at all. > > For LTO with debug info we will probably need some frontend neutral > debug info representation in longer run, since optimization modifying > the data types and such will need to compensate. > > We can translate stuff to in-memory dwarf and update it but that would > limit amount of debug info format we will want to support probably. DWARF is not exactly memory or space efficient, sadly. Then again, what most other compilers have done is bite the bullet and define their own "debug info" data, then transform that to dwarf2 at the very end. I'm not sure we want to do that either :(
Re: extend gthr-posix.h with rwlock
David Edelsohn wrote: > Luke Dalessandro writes: > Luke> Thank you, this was indeed the problem. I added the needed stubs in > Luke> gthr-single.h and it now compiles fine. Unfortunately there seems to be > Luke> something wrong with my installation of ld as linking fails with a large > Luke> number of errors of the form: > Luke> ld: 0711-252 SEVERE ERROR: File auxiliary symbol entry 1 in object _negdi2_s.o: > Luke> Field x_offset contains 4. Valid values are between 4 and -1. > Luke> The object name is being substituted. > Luke> Unfortunately I have almost no experience with AIX. I'll look for a prebuilt > Luke> ld that seems newer than mine to see if this helps the problem. > Pre-built ld? AIX ships with ld. Are you using GNU Binutils (gas, GNU ld, etc.) on AIX? Please use the native AIX tools (as, ld, nm, etc.) with AIX as mentioned in the platform-specific installation notes: No, I'm sorry I wasn't clear. I am using all of the AIX tools, not Binutils. I just assumed that there was something out-of-date with the ld that came with our AIX 5.1 installation. > http://gcc.gnu.org/install/specific.html#x-ibm-aix I have seen this page before, and I'm not sure that it helps me. I'm running into the same behavior posted at http://gcc.gnu.org/ml/gcc-bugs/2005-04/msg03175.html, where the advice is also to look at this page, but there doesn't seem to be a reply from the original poster. Thanks, Luke
A request for md5 hashs to be published
A small request. Can the md5 sum hash for the various release files be published at the main GCC release pages? If we look at http://gcc.gnu.org/gcc-4.2/ there is no md5 sum there, and while I can find that data at a mirror thus: ftp://ftp.mirrorservice.org/sites/sources.redhat.com/pub/gcc/releases/gcc-4.2.4/md5.sum .. there is no statement of the authenticity of that source file. I can confirm that the md5sum from *that* specific mirror is correct, but that does not convince me that I have a valid tar file:

vesta:/mnt/lfs/sources/tarballs# md5sum gcc-4.2.4.tar.bz2
d79f553e7916ea21c556329eacfeaa16  gcc-4.2.4.tar.bz2

The truth is, I can uncompress that tar file and then recompress it and get a different md5sum for the exact same input file. That would also be a valid md5 hash, but only for my personal internal mirror. Really, there should be, in my opinion, a single master page with the md5sum of the uncompressed tarball, and then the average user can confirm that it is correct from the master signature page. Dennis
Re: [whopr] Design/implementation alternatives for the driver and WPA
Chris Lattner <[EMAIL PROTECTED]> writes: > On Jun 5, 2008, at 2:03 PM, Ian Lance Taylor wrote: > >> Nick Kledzik <[EMAIL PROTECTED]> writes: >> How does the linker tell LTO that a symbol may be inlined, but must also be externally visible? >>> The linker just tells LTO which symbols must remain. The LTO engine >>> is free to inline anything that would improve codegen, with the >>> exception >>> that any weak definition that must remain (preserved) cannot be >>> inlined. >> >> I'll just note that that isn't optimal for ELF when producing an >> executable. > > Why? Because you have to touch (worst case) every symbol? The cost of > doing LTO *dramatically* dwarfs the cost of touching symbols once. > :) You're right this could be improved, and we're actively working > on it... but it seems like a strange thing to worry about vs > correctness in all cases. Whoops, sorry, I meant the other thing. Not inlining any weak definition that must remain is not optimal. When linking an executable, it is perfectly OK to inline a weak function, even if the weak symbol is required to remain in the final output file. In general if the symbol is known to be bound locally, then it is OK to inline it. This is separate from the question of whether the symbol is visible externally. Ian
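The distinction can be made concrete as two predicates over facts the linker already knows about each symbol. The first follows Nick's rule. The second adds Ian's observation that a definition known to bind locally (as in a final executable) may be inlined even if it is weak and must remain in the output. struct sym_info and both functions are hypothetical; they are not part of libLTO or any GCC plugin interface.

```c
#include <stdbool.h>

/* Hypothetical per-symbol facts a linker could pass to an LTO engine. */
struct sym_info {
  bool weak;           /* the chosen definition is weak */
  bool must_preserve;  /* must stay in the output (referenced/exported) */
  bool binds_locally;  /* no other module can preempt this definition */
};

/* Nick's rule: never inline a weak definition that must be preserved. */
static bool can_inline_conservative(const struct sym_info *s)
{
  return !(s->weak && s->must_preserve);
}

/* With Ian's refinement: a locally bound definition is safe to inline
   even when it is weak and must also remain visible; otherwise fall
   back to the conservative rule. */
static bool can_inline_refined(const struct sym_info *s)
{
  return s->binds_locally || can_inline_conservative(s);
}
```

The two rules differ only for weak, preserved, locally bound symbols. Those are exactly the weak functions in an executable that Ian points out are lost to the conservative rule, while a weak, preserved symbol in a shared library (which could be preempted) stays non-inlinable under both.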
Re: A request for md5 hashs to be published
On Fri, Jun 06, 2008 at 01:03:19AM +, Dennis Clarke wrote: > Can the md5 sum hash for the various release files be published at the > main GCC release pages ? > If we look at http://gcc.gnu.org/gcc-4.2/ there is no md5 sum there > and while I can find that data at a mirror thus : > > ftp://ftp.mirrorservice.org/sites/sources.redhat.com/pub/gcc/releases/gcc-4.2.4/md5.sum > > .. there is no statement of the authenticity of that source file. The versions on ftp.gnu.org are accompanied by digital signatures, which should give stronger assurance than just an md5 sum.