Re: #pragma support to guide autovectorizer
> I was wondering if any additional work had been completed toward pragma
> support for the autovectorization branch (see
> http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01560.html)?

I think Devang was planning to continue this work - I'm not sure where it
stands.

dorit

> Thanks.
>
> Chad Rosier
Re: does the instruction combiner regards (foo & 0xff) as a special case?
You are cool, now I found a

    (set (reg:CC_Z 33 cc)
        (compare:CC_Z (zero_extend:SI (subreg:QI (reg/v:SI 166 [ a ]) 0))
            (const_int 0 [0x0])))

It's what I'm looking for. Thank you so much.
Re: Large, modular C++ application performance ...
On Mon, 2005-08-01 at 14:18 +0200, Steven Bosscher wrote:
> On Monday 01 August 2005 11:44, michael meeks wrote:
> > However - the log(s) term is rather irrelevant to my argument :-)
>
> Not really. Maybe the oprofile results for the linker show that the
> behavior is worse, or maybe better - who knows :-)
> Have you looked at any profiles btw? Just for the curious...

Yes - identifying the linker and relocation processing as the root cause
of the problem isn't just a stab in the dark :-) This flags up as the
no.1 (individual) performance killer with whatever profiling tools you
use, e.g.:

* vtune
* speedprof
* instrumenting top/tail of dlopen calls

etc. :-)

Regards,

Michael.

--
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot
Re: Large, modular C++ application performance ...
Hi H.J.,

On Mon, 2005-08-01 at 08:55 -0700, H. J. Lu wrote:
> > -fvisibility is helpful - as the paper says, not as helpful as the old
> > -Bsymbolic (or link maps exposing only 3 or so functions) were. However
> > - -fvisibility can only help so much - if you have:
>
> Since you were comparing Windows vs. ELF, doesn't Windows need a file
> to define which symbols to export for a shared library ?

Apparently so - here is my (fragmentary) understanding of that - Martin -
please do correct me. OO.o builds the .defs on Win32 with a custom tool
called 'ldump4'. That (interestingly) goes groping in some binary file
format, reads the symbol table, groks symbols tagged with 'EXPORT:', and
builds a .def file. ie. it *looks* like it's automated, and can use the
API marked (__dllexport etc.) where appropriate.

> Why can't you do it with ELF using a linker map? Libstdc++.so is
> built with a linker map. Any C++ shared library should use one if the
> startup time is a big concern. Of course, if gcc can generate a list
> of symbols suitable for linker map, which needs to be exported, it will
> be very helpful. I don't think it will be too hard to implement.

So - the thing about linker maps (cf. the ldump4 tool) is that they
tend to be hard to maintain, not portable across platforms, a source of
grief and problems etc. ;-) [ we have several strata of old, now defunct
link maps lying around from previous investments of effort that
subsequently became useless ].

As I recall, I saw a suggestion (from you I think), for a new
visibility attribute 'export' or somesuch, that would resolve names
internally to the library, while still exporting the symbols.

That would suit our needs beautifully - if, when used to annotate a
class, it would allow the various typeinfo / vague-linkage pieces
through as 'default'. Is it a realistic suggestion ? / if so, am happy
to knock up a patch.

[ and of course, this is only 1/2 the problem - the other half isn't
much helped by visibility markup as previously discussed ;-]

Thanks,

Michael.

--
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot
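
[Editorial sketch, not part of the original message.] For readers who have not used the visibility markup discussed above, the following is a minimal illustration in C of the existing 'default' and 'hidden' visibility attributes; it does not show the proposed 'export' variant, which is only a suggestion in this thread. The names lib_entry and lib_helper are made up for illustration, and the sketch assumes GCC 4.0 or later on an ELF target.

```c
/* Hypothetical library entry point: with "default" visibility it stays
   in the shared object's dynamic symbol table. */
__attribute__((visibility("default")))
int lib_entry(int x);

/* Hypothetical internal helper: with "hidden" visibility it is not
   exported from the shared object, so calls to it from inside the
   library do not need a named relocation. */
__attribute__((visibility("hidden")))
int lib_helper(int x)
{
    return x * 2;
}

int lib_entry(int x)
{
    return lib_helper(x) + 1;
}
```

Building the library with -fvisibility=hidden and marking only the entry points as 'default' would give roughly the same effect without annotating every helper.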
Re: More fun with aliasing - removing assignments?
On Mon, Aug 01, 2005 at 10:12:37PM -0700, Ian Lance Taylor wrote:
> Harald van Dijk <[EMAIL PROTECTED]> writes:
>
> > I finally managed to track down the problem I've been having to this
> > short code:
> >
> > typedef struct {
> >     unsigned car;
> >     unsigned cdr;
> > } cons;
> >
> > void nconc (unsigned x, unsigned y) {
> >     unsigned *ptr = &x;
> >     while(!(*ptr & 3))
> >         ptr = &((cons *)(*ptr))->cdr;
> >     *ptr = y;
> > }
> >
> > With gcc 4.0-20050728 on i686-pc-linux-gnu, compiling this with -O2
> > appears to remove the assignment to *ptr. (I didn't prepare an example
> > program, but it's verifiable with objdump.) Obviously, this code is
> > non-portable, but still, I don't see why this can happen. Would anyone
> > be kind enough to explain this to me? It works as expected with -O2
> > -fno-strict-aliasing.
>
> Well, I'd say it's a bug. It works in 4.1. The final assignment gets
> removed by tree-ssa-dce.c because it looks like a useless store. This
> is because alias analysis thinks it knows what is going on, when it
> clearly does not.
>
Are you sure? I am not a language lawyer, but my understanding
is that you cannot legally make pointer 'p' point outside of
'x' using pointer arithmetic. Since 'x' is a PARM_DECL passed by
value, the last assignment is a dead store.

In this case, 'ptr' should be marked as pointing anywhere.
However, alias analysis could also conclude that 'ptr' may not
point outside the current local frame. So, the last store would
still be marked dead. This distinction of different meanings for
"points anywhere" will be a feature of 4.2, most likely.

Having said that, I sent rth a 4.0 patch for a similar bug that
will "fix" this problem. Richard, have you applied it yet?

	* tree-ssa-alias.c (add_pointed_to_var): If VALUE is of the
	form &(*PTR), take points-to information from PTR.

Index: tree-ssa-alias.c
===
RCS file: /cvs/gcc/gcc/gcc/tree-ssa-alias.c,v
retrieving revision 2.71.2.1
diff -d -u -p -r2.71.2.1 tree-ssa-alias.c
--- tree-ssa-alias.c    26 Feb 2005 16:24:27 -  2.71.2.1
+++ tree-ssa-alias.c    21 Jul 2005 20:13:44 -
@@ -1904,7 +1904,11 @@ add_pointed_to_var (struct alias_info *a
       if (REFERENCE_CLASS_P (pt_var))
         pt_var = get_base_address (pt_var);

-      if (pt_var && SSA_VAR_P (pt_var))
+      if (pt_var == NULL)
+        {
+          pi->pt_anything = 1;
+        }
+      else if (SSA_VAR_P (pt_var))
         {
           uid = var_ann (pt_var)->uid;
           bitmap_set_bit (ai->addresses_needed, uid);
@@ -1918,6 +1922,18 @@ add_pointed_to_var (struct alias_info *a
           if (is_global_var (pt_var))
             pi->pt_global_mem = 1;
         }
+      else if (TREE_CODE (pt_var) == INDIRECT_REF
+               && TREE_CODE (TREE_OPERAND (pt_var, 0)) == SSA_NAME)
+        {
+          /* If VALUE is of the form &(*P_j), then PTR will have the same
+             points-to information as P_j.  */
+          add_pointed_to_expr (ai, ptr, TREE_OPERAND (pt_var, 0));
+        }
+      else
+        {
+          /* Give up.  PTR points anywhere.  */
+          set_pt_anything (ptr);
+        }
     }
Re: More fun with aliasing - removing assignments?
On 8/2/05, Diego Novillo <[EMAIL PROTECTED]> wrote:
> On Mon, Aug 01, 2005 at 10:12:37PM -0700, Ian Lance Taylor wrote:
> > Harald van Dijk <[EMAIL PROTECTED]> writes:
> >
> > > I finally managed to track down the problem I've been having to this
> > > short code:
> > >
> > > typedef struct {
> > >     unsigned car;
> > >     unsigned cdr;
> > > } cons;
> > >
> > > void nconc (unsigned x, unsigned y) {
> > >     unsigned *ptr = &x;
> > >     while(!(*ptr & 3))
> > >         ptr = &((cons *)(*ptr))->cdr;
> > >     *ptr = y;
> > > }
> > >
> > > With gcc 4.0-20050728 on i686-pc-linux-gnu, compiling this with -O2
> > > appears to remove the assignment to *ptr. (I didn't prepare an example
> > > program, but it's verifiable with objdump.) Obviously, this code is
> > > non-portable, but still, I don't see why this can happen. Would anyone
> > > be kind enough to explain this to me? It works as expected with -O2
> > > -fno-strict-aliasing.
> >
> > Well, I'd say it's a bug. It works in 4.1. The final assignment gets
> > removed by tree-ssa-dce.c because it looks like a useless store. This
> > is because alias analysis thinks it knows what is going on, when it
> > clearly does not.
> >
> Are you sure? I am not a language lawyer, but my understanding
> is that you cannot legally make pointer 'p' point outside of
> 'x' using pointer arithmetic. Since 'x' is a PARM_DECL passed by
> value, the last assignment is a dead store.

p is not made to point 'outside' of x, but x is treated as a pointer, cast
to a struct pointer and then dereferenced. Only if the loop entry condition
is false do we end up storing into x (but only to x, not to memory beyond
x), and this store is of course dead.

Richard.
Re: More fun with aliasing - removing assignments?
On 8/2/05, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On 8/2/05, Diego Novillo <[EMAIL PROTECTED]> wrote:
> > On Mon, Aug 01, 2005 at 10:12:37PM -0700, Ian Lance Taylor wrote:
> > > Harald van Dijk <[EMAIL PROTECTED]> writes:
> > >
> > > > I finally managed to track down the problem I've been having to this
> > > > short code:
> > > >
> > > > typedef struct {
> > > >     unsigned car;
> > > >     unsigned cdr;
> > > > } cons;
> > > >
> > > > void nconc (unsigned x, unsigned y) {
> > > >     unsigned *ptr = &x;
> > > >     while(!(*ptr & 3))
> > > >         ptr = &((cons *)(*ptr))->cdr;
> > > >     *ptr = y;
> > > > }
> > > >
> > > > With gcc 4.0-20050728 on i686-pc-linux-gnu, compiling this with -O2
> > > > appears to remove the assignment to *ptr. (I didn't prepare an example
> > > > program, but it's verifiable with objdump.) Obviously, this code is
> > > > non-portable, but still, I don't see why this can happen. Would anyone
> > > > be kind enough to explain this to me? It works as expected with -O2
> > > > -fno-strict-aliasing.
> > >
> > > Well, I'd say it's a bug. It works in 4.1. The final assignment gets
> > > removed by tree-ssa-dce.c because it looks like a useless store. This
> > > is because alias analysis thinks it knows what is going on, when it
> > > clearly does not.
> > >
> > Are you sure? I am not a language lawyer, but my understanding
> > is that you cannot legally make pointer 'p' point outside of
> > 'x' using pointer arithmetic. Since 'x' is a PARM_DECL passed by
> > value, the last assignment is a dead store.
>
> p is not made to point 'outside' of x, but x is treated as a pointer, cast
> to a struct pointer and then dereferenced. Only if the loop entry condition
> is false we end up storing into x (but only to x, not to memory beyond x),
> and this store is of course dead.

Oh, and a workaround and slight correction would be to write

    void nconc (unsigned x, unsigned y) {
        unsigned *ptr = &((cons *)x)->cdr;
        while(!(*ptr & 3))
            ptr = &((cons *)(*ptr))->cdr;
        *ptr = y;
    }

which makes aliasing see that the store is not dead, and in fact the store
will never be to the argument area.

Richard.
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 02:56:50PM +0200, Richard Guenther wrote:
> Oh, and a workaround and slight correction would be to write
>
> void nconc (unsigned x, unsigned y) {
>     unsigned *ptr = &((cons *)x)->cdr;
>     while(!(*ptr & 3))
>         ptr = &((cons *)(*ptr))->cdr;
>     *ptr = y;
> }
>
No. Same problem. The aliaser would say "yes, ptr points
anywhere, but it cannot escape the local frame". The final store
is dead just the same.

We only "get it right" because we do not distinguish between
different degrees of points-anywhere.
bug in gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
Hello. I'm not on the list, so please CC me with any replies.

I have come across a bug found during some code which serializes
doubles. The bug is only encountered when the optimization level is set
to -O2 or greater.

The bug is not encountered when compiled under
gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
at any optimization level.

The de-serialization code is:

typedef unsigned char uint8;
static uint8 next_byte(uint &offset, std::vector<uint8> const &bytecode)
    throw (std::invalid_argument)
{
    if (offset >= bytecode.size())
        throw (std::invalid_argument("Unexpected end of bytecode"));
    return bytecode[offset++];
}

double parse_double(uint &offset, std::vector<uint8> const &bytecode)
    throw (std::invalid_argument)
{
    typedef unsigned long long uint64;
    uint64 rtn = uint64(next_byte(offset, bytecode)) << 56;
    rtn |= uint64(next_byte(offset, bytecode)) << 48;
    rtn |= uint64(next_byte(offset, bytecode)) << 40;
    rtn |= uint64(next_byte(offset, bytecode)) << 32;
    rtn |= uint64(next_byte(offset, bytecode)) << 24;
    rtn |= uint64(next_byte(offset, bytecode)) << 16;
    rtn |= uint64(next_byte(offset, bytecode)) << 8;
    rtn |= uint64(next_byte(offset, bytecode));
    return *reinterpret_cast<double *>(&rtn);
}

Full source code to a demonstration of the bug, and a Makefile is at
http://mjfrazer.org/~mjfrazer/tmp/pack-test/
The tar file in the directory contains all the other files, so you just
need to grab that.

cheers
-mark
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 09:08:51AM -0400, Diego Novillo wrote:
> On Tue, Aug 02, 2005 at 02:56:50PM +0200, Richard Guenther wrote:
> > Oh, and a workaround and slight correction would be to write
> >
> > void nconc (unsigned x, unsigned y) {
> >     unsigned *ptr = &((cons *)x)->cdr;
> >     while(!(*ptr & 3))
> >         ptr = &((cons *)(*ptr))->cdr;
> >     *ptr = y;
> > }
> >
> No. Same problem. The aliaser would say "yes, ptr points
> anywhere, but it cannot escape the local frame". The final store
> is dead just the same.
>
> We only "get it right" because we do not distinguish between
> different degrees of points-anywhere.

Then the alias analyzer's broken.

This isn't pointer arithmetic in the sense that you mean. It would be if
the line were:

    ptr = &((cons *)(ptr))->cdr;

which is equivalent to some offset plus ptr. But there's an extra
dereference:

    ptr = &((cons *)(*ptr))->cdr;
                     ^

As far as I can tell, this code doesn't actually violate any of the
aliasing rules. It just looks funny.

--
Daniel Jacobowitz
CodeSourcery, LLC
Re: bug in gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
On 8/2/05, Mark Frazer <[EMAIL PROTECTED]> wrote:
> Hello. I'm not on the list, so please CC me with any replies.
>
> I have come across a bug found during some code which serializes
> doubles. The bug is only encountered when the optimization level is set
> to -O2 or greater.
>
> The bug is not encountered when compiled under
> gcc (GCC) 3.3.3 20040412 (Red Hat Linux 3.3.3-7)
> at any optimization level.
>
> The de-serialization code is:
>
> typedef unsigned char uint8;
> static uint8 next_byte(uint &offset, std::vector<uint8> const &bytecode)
>     throw (std::invalid_argument)
> {
>     if (offset >= bytecode.size())
>         throw (std::invalid_argument("Unexpected end of bytecode"));
>     return bytecode[offset++];
> }
>
> double parse_double(uint &offset, std::vector<uint8> const &bytecode)
>     throw (std::invalid_argument)
> {
>     typedef unsigned long long uint64;
>     uint64 rtn = uint64(next_byte(offset, bytecode)) << 56;
>     rtn |= uint64(next_byte(offset, bytecode)) << 48;
>     rtn |= uint64(next_byte(offset, bytecode)) << 40;
>     rtn |= uint64(next_byte(offset, bytecode)) << 32;
>     rtn |= uint64(next_byte(offset, bytecode)) << 24;
>     rtn |= uint64(next_byte(offset, bytecode)) << 16;
>     rtn |= uint64(next_byte(offset, bytecode)) << 8;
>     rtn |= uint64(next_byte(offset, bytecode));
>     return *reinterpret_cast<double *>(&rtn);
> }
>
> Full source code to a demonstration of the bug, and a Makefile is at
> http://mjfrazer.org/~mjfrazer/tmp/pack-test/
> The tar file in the directory contains all the other files, so you just
> need to grab that.

Try -fno-strict-aliasing. This may be related to PR23192.

Richard.
Re: bug in gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
Richard Guenther <[EMAIL PROTECTED]> [05/08/02 09:29]:
> Try -fno-strict-aliasing. This may be related to PR23192.

-fno-strict-aliasing does indeed make the problem go away.

thanks!
-mark

--
Forget your stupid theme park! I'm gonna make my own! With hookers! And
blackjack! In fact, forget the theme park! - Bender
Re: bug in gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
Mark Frazer <[EMAIL PROTECTED]> [05/08/02 09:18]:
> Hello. I'm not on the list, so please CC me with any replies.
>
> I have come across a bug found during some code which serializes
> doubles. The bug is only encountered when the optimization level is set
> to -O2 or greater.

Oh, I forgot to mention that I'm running Fedora Core 4 on ia32.

[EMAIL PROTECTED] pack-test]$ gcc --version
gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[EMAIL PROTECTED] pack-test]$ uname -a
Linux pacific.mjfrazer.org 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 athlon i386 GNU/Linux

-mark

--
Forget your stupid theme park! I'm gonna make my own! With hookers! And
blackjack! In fact, forget the theme park! - Bender
Re: Large, modular C++ application performance ...
On Tue, Aug 02, 2005 at 10:59:01AM +0100, michael meeks wrote:
> Hi H.J.,
>
> > Why can't you do it with ELF using a linker map? Libstdc++.so is
> > built with a linker map. Any C++ shared library should use one if the
> > startup time is a big concern. Of course, if gcc can generate a list
> > of symbols suitable for linker map, which needs to be exported, it will
> > be very helpful. I don't think it will be too hard to implement.
>
> So - the thing about linker maps (cf. the ldump4 tool) is that they
> tend to be hard to maintain, not portable across platforms, a source of
> grief and problems etc. ;-) [ we have several strata of old, now defunct
> link maps lying around from previous investments of effort that
> subsequently became useless ].

Maintaining a C++ linker map isn't easy. I think gcc should help out
here.

>
> As I recall, I saw a suggestion (from you I think), for a new
> visibility attribute 'export' or somesuch, that would resolve names
> internally to the library, while still exporting the symbols.

I suggested the "export" visibility to export a symbol from an
executable, even if it wasn't used by any DSOs.

>
> That would suit our needs beautifully - if, when used to annotate a
> class, it would allow the various typeinfo / vague-linkage pieces
> through as 'default'. Is it a realistic suggestion ? / if so, am happy
> to knock up a patch.
>
> [ and of course, this is only 1/2 the problem - the other half isn't
> much helped by visibility markup as previously discussed ;-]
>

Why not? If you know a symbol in a DSO won't be overridden by others,
you can resolve it locally via a linker map.

H.J.
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 09:39:56AM -0400, Daniel Jacobowitz wrote:

> Then the alias analyzer's broken.
>
Broken? I'm saying that we currently get this right. I don't
know what position you are arguing.

> This isn't pointer arithmetic in the sense that you mean. It
> would be if the line were:
>
>     ptr = &((cons *)(ptr))->cdr;
>
Yes, I realize this now. And that is not my point.

> which is equivalent to some offset plus ptr. But there's an extra
> dereference:
>
>     ptr = &((cons *)(*ptr))->cdr;
>                      ^
>
This code does build an address location out of an arbitrary integer:

    unsigned int D.1142_8 = *ptr_1;
    struct cons *D.1143_9 = (struct cons *) D.1142_8;
    ptr_10 = &D.1143_9->cdr;

Does the language allow the creation of address locations out of
arbitrary integer values? Is the dereference of such an
address a defined operation? If so, then it's simply a matter of
recognizing this situation when computing points-anywhere
attributes.
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 09:57:39AM -0400, Diego Novillo wrote:
> On Tue, Aug 02, 2005 at 09:39:56AM -0400, Daniel Jacobowitz wrote:
>
> > Then the alias analyzer's broken.
> >
> Broken? I'm saying that we currently get this right. I don't
> know what position you are arguing.

Sorry, my mistake. I'd forgotten that Ian said we got this right in 4.1.

> This code does build an address location out of an arbitrary integer:
>
>     unsigned int D.1142_8 = *ptr_1;
>     struct cons *D.1143_9 = (struct cons *) D.1142_8;
>     ptr_10 = &D.1143_9->cdr;
>
> Does the language allow the creation of address locations out of
> arbitrary integer values? Is the dereference of such an
> address a defined operation? If so, then it's simply a matter of
> recognizing this situation when computing points-anywhere
> attributes.

Yes, it does - well, it's implementation defined, but GCC has long
chosen the natural interpretation. C99 6.3.2.3, paragraph 5. This is
no different from that classic example, a pointer which escapes via
printf/scanf.

--
Daniel Jacobowitz
CodeSourcery, LLC
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 10:05:37AM -0400, Daniel Jacobowitz wrote:

> Yes, it does - well, it's implementation defined, but GCC has long
> chosen the natural interpretation. C99 6.3.2.3, paragraph 5. This is
> no different from that classic example, a pointer which escapes via
> printf/scanf.
>
OK, thanks. That settles it then.
Re: More fun with aliasing - removing assignments?
Diego Novillo <[EMAIL PROTECTED]> writes:

> Does the language allow the creation of address locations out of
> arbitrary integer values?

Yes.

    6.3.2.3 Pointers

    5 An integer may be converted to any pointer type. [...]

> Is the dereference of such an address a defined operation?

It is implementation-defined.

    [...] Except as previously specified, the result is
    implementation-defined, might not be correctly aligned, might not
    point to an entity of the referenced type, and might be a trap
    representation.

Also, the integer may have been the result of casting a valid pointer, in
which case the operation is fully defined (assuming the integer is wide
enough).

Andreas.

--
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
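
[Editorial sketch, not part of the original message.] To make the "integer wide enough" caveat concrete, here is a variant of the nconc example from this thread that stores cell addresses in uintptr_t rather than unsigned, so the pointer-to-integer round trip also works where pointers are wider than unsigned. It follows the shape of Richard Guenther's corrected version; NIL_TAG is a made-up end-of-list marker, and the sketch assumes the implementation provides uintptr_t and that cell addresses have their low two bits clear.

```c
#include <stdint.h>

/* Each field holds either a tagged immediate (low two bits nonzero) or
   the address of another cell, stored as an integer. */
typedef struct {
    uintptr_t car;
    uintptr_t cdr;
} cons;

/* Hypothetical end-of-list marker; any value with a nonzero low two
   bits would serve. */
#define NIL_TAG ((uintptr_t)1)

/* Walk the cdr chain starting at cell address x and overwrite the
   terminating tagged value with y.  Assumes x is a cell address, not a
   tagged immediate. */
static void nconc(uintptr_t x, uintptr_t y)
{
    uintptr_t *ptr = &((cons *)x)->cdr;
    while (!(*ptr & 3))
        ptr = &((cons *)(*ptr))->cdr;
    *ptr = y;
}
```

Every integer converted to a pointer here was itself obtained from a valid pointer, which is the case Andreas describes as fully defined.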
Re: bug in gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5)
Mark Frazer <[EMAIL PROTECTED]> [05/08/02 09:32]:
> Richard Guenther <[EMAIL PROTECTED]> [05/08/02 09:29]:
> > Try -fno-strict-aliasing. This may be related to PR23192.
>
> -fno-strict-aliasing does indeed make the problem go away.

Changing the de-serialization function to:

double parse_double(uint &offset, vector<uint8> const &bytecode)
    throw (std::invalid_argument)
{
    union {
        uint64 ival;
        double dval;
    } rtn;
    rtn.ival = uint64(next_byte(offset, bytecode)) << 56;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 48;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 40;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 32;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 24;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 16;
    rtn.ival |= uint64(next_byte(offset, bytecode)) << 8;
    rtn.ival |= uint64(next_byte(offset, bytecode));
    return rtn.dval;
}

allows the strict-aliasing optimization to be left in.

So, it seems the bug was mine, not gcc's. I'm off to search for other
reinterpret_cast abuses in my code...

cheers
-mark

--
To Captain Bender! He's the best! ...at being a big jerk who's stupid and
his big ugly face is as dumb as a butt! - Fry
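
[Editorial sketch, not part of the original message.] Besides the union shown above, another commonly used way to reinterpret the assembled bits without running into the strict-aliasing rules is to copy them with memcpy. The C version below is only an illustration: the function name double_from_be_bytes is made up, and it assumes an 8-byte IEEE 754 double whose byte order matches that of 64-bit integers on the target.

```c
#include <stdint.h>
#include <string.h>

/* Assemble eight big-endian bytes into a 64-bit pattern, then copy that
   pattern into a double.  memcpy accesses the objects as bytes, so no
   lvalue of the wrong type is ever dereferenced. */
static double double_from_be_bytes(const unsigned char bytes[8])
{
    uint64_t bits = 0;
    double value;
    int i;

    for (i = 0; i < 8; i++)
        bits = (bits << 8) | bytes[i];

    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Compilers typically turn a fixed-size memcpy like this into a plain register move, so the copy usually costs nothing at -O2.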
Re: Large, modular C++ application performance ...
On Tue, 2005-08-02 at 06:57 -0700, H. J. Lu wrote:
> Maintaining a C++ linker map isn't easy. I think gcc should help out
> here.

What do you suggest ? - something separate from the visibility markup ?
perhaps what I'm suggesting is some horrible misuse of that. Clearly
adding a new visibility attribute that would bind that symbol
internally, yet export it would be a simple approach; did you have a
better idea ? and/or suggestions for a name ? - or is this a total
non-starter for some other reason ?

> > That would suit our needs beautifully - if, when used to annotate a
> > class, it would allow the various typeinfo / vague-linkage pieces
> > through as 'default'. Is it a realistic suggestion ? / if so, am happy
> > to knock up a patch.
> >
> > [ and of course, this is only 1/2 the problem - the other half isn't
> > much helped by visibility markup as previously discussed ;-]
>
> Why not? If you know a symbol in DSO won't be overridden by others,
> you can resolve it locally via a linker map.

Sure - the other (more than) 1/2 of the performance problem comes from
named relocations to symbols external to the DSO.

Thanks,

Michael.

--
[EMAIL PROTECTED] <><, Pseudo Engineer, itinerant idiot
RE: rfa (x86): 387<=>sse moves
Hello All,

I applied the recent patches to the 7/23 snapshot, and am still seeing
some 387 to sse moves. In particular, in SpecFP's 177.mesa (matrix.c),
I'm seeing fld1's feeding moves to sse registers.

Compiled via: gcc -O3 -march=k8 -mfpmath=sse matrix.c

Thanks.
Tony

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dale Johannesen
Sent: Monday, August 01, 2005 1:53 PM
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; gcc@gcc.gnu.org
Subject: Re: rfa (x86): 387<=>sse moves

On Jul 31, 2005, at 9:51 AM, Uros Bizjak wrote:
> Hello!
>
>> With -march=pentium4 -mfpmath=sse -O2, we get an extra move for code
>> like
>>
>>     double d = atof(foo);
>>     int i = d;
>>
>>
>>     call    atof
>>     fstpl   -8(%ebp)
>>     movsd   -8(%ebp), %xmm0
>>     cvttsd2si %xmm0, %eax
>>
>>
>> (This is Linux, Darwin is similar.) I think the difficulty is that for
>
> This problem is similar to the problem described in PR target/19398.
> There is another testcase and a small analysis in the PR that might
> help with this problem.

Thanks, that does seem relevant. The patches so far don't fix this case;
I've commented the PR explaining why.
Re: #pragma support to guide autovectorizer
On Aug 2, 2005, at 12:10 AM, Dorit Naishlos wrote:

>> I was wondering if any additional work had been completed toward pragma
>> support for the autovectorization branch (see
>> http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01560.html)?
>
> I think Devang was planning to continue this work - I'm not sure where it
> stands

I made some progress and it is still on my list, but right now I'm
fire-fighting on debugging issues. Developers are making lots of noise on
this front once they switch to gcc-4.0. This work has missed the 4.1
train.

-
Devang
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 08:32:53AM -0400, Diego Novillo wrote:
> Having said that, I sent rth a 4.0 patch for a similar bug that
> will "fix" this problem. Richard, have you applied it yet?

No, I forgot about it.

r~
Re: More fun with aliasing - removing assignments?
On Tue, Aug 02, 2005 at 10:05:53AM -0700, Richard Henderson wrote:
> On Tue, Aug 02, 2005 at 08:32:53AM -0400, Diego Novillo wrote:
> > Having said that, I sent rth a 4.0 patch for a similar bug that
> > will "fix" this problem. Richard, have you applied it yet?
>
> No, I forgot about it.
>
That's fine. Just applied it.
Re: More fun with aliasing - removing assignments?
Diego Novillo <[EMAIL PROTECTED]> writes:

> On Tue, Aug 02, 2005 at 10:05:37AM -0400, Daniel Jacobowitz wrote:
>
> > Yes, it does - well, it's implementation defined, but GCC has long
> > chosen the natural interpretation. C99 6.3.2.3, paragraph 5. This is
> > no different from that classic example, a pointer which escapes via
> > printf/scanf.
> >
> OK, thanks. That settles it then.

Just to close out this thread for the record, Andrew Pinski opened PR
23192 for this problem, and Diego checked in a patch for the 4.0
branch. So all should be well in 4.0.2.

Ian
memcpy to an unaligned address
In a typical Ethernet/IP ARP header the source IP address is
unaligned. Instead of using...

    out->srcIPAddr = in->dstIPAddr;

... I used...

    memcpy(&out->srcIPAddr, &in->dstIPAddr, sizeof(uint32_t));

... to account for the unaligned destination. This worked until gcc 4,
which now generates a simple load/store.

    ldr     r3, [r6, #24]
    adds    r2, r4, #0
    adds    r2, #14
    str     r3, [r2, #0]

A nice optimisation, but in this case it's incorrect. $r4 is aligned,
and the result of adding #14 to $r4 is an unaligned pointer.

Should gcc know better, or do I need to give it a little more
information to help it out?

Please cc me in your reply.

Cheers,
Shaun
RE: memcpy to an unaligned address
----Original Message----
>From: Shaun Jackman
>Sent: 02 August 2005 18:33

> In a typical Ethernet/IP ARP header the source IP address is
> unaligned. Instead of using...
>     out->srcIPAddr = in->dstIPAddr;
> ... I used...
>     memcpy(&out->srcIPAddr, &in->dstIPAddr, sizeof(uint32_t));
> ... to account for the unaligned destination. This worked until gcc 4,
> which now generates a simple load/store.
>     ldr     r3, [r6, #24]
>     adds    r2, r4, #0
>     adds    r2, #14
>     str     r3, [r2, #0]
> A nice optimisation, but in this case it's incorrect. $r4 is aligned,
> and the result of adding #14 to $r4 is an unaligned pointer.
>
> Should gcc know better, or do I need to give it a little more
> information to help it out?

In order for anyone to answer your questions about the alignment of
various types in a struct, don't you think you should perhaps have told us a
little about what those types actually are and how the struct is laid out?
[*]

    cheers,
      DaveK

[*] - See debugging, psychic ;)
--
Can't think of a witty .sigline today
Re: does the instruction combiner regards (foo & 0xff) as a special case?
[EMAIL PROTECTED] wrote:
> I guess the combiner generates something like a truncation pattern when
> special constants are detected. The combiner also takes a similar action
> in pattern

See the section of the documentation that talks about instruction
canonicalization.
http://gcc.gnu.org/onlinedocs/gccint/Insn-Canonicalizations.html
See in particular the last bullet.

Also, as Joern mentioned, you should try stepping through try_combine to
see what is really happening.

--
Jim Wilson, GNU Tools Support, http://www.specifix.com
Re: memcpy to an unaligned address
Shaun Jackman <[EMAIL PROTECTED]> writes:

> In a typical Ethernet/IP ARP header the source IP address is
> unaligned. Instead of using...
>     out->srcIPAddr = in->dstIPAddr;
> ... I used...
>     memcpy(&out->srcIPAddr, &in->dstIPAddr, sizeof(uint32_t));
> ... to account for the unaligned destination. This worked until gcc 4,
> which now generates a simple load/store.
>     ldr     r3, [r6, #24]
>     adds    r2, r4, #0
>     adds    r2, #14
>     str     r3, [r2, #0]
> A nice optimisation, but in this case it's incorrect. $r4 is aligned,
> and the result of adding #14 to $r4 is an unaligned pointer.

It isn't incorrect; gcc can assume that pointers are always correctly
aligned for their type. Anything else would result in horrible code. If
your program forms a pointer that is not properly aligned, it is already
invalid, and later breakage is only a symptom of that.

--
Falk
RE: splitting load immediates using high and lo_sum
> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
>
> On Jul 21, 2005, at 5:04 PM, Tabony, Charles wrote:
>
> >> From: Dale Johannesen [mailto:[EMAIL PROTECTED]
> >>
> >> On Jul 21, 2005, at 4:36 PM, Tabony, Charles wrote:
> >>
> >>> Hi,
> >>>
> >>> I am working on a port for a processor that has 32 bit registers but
> >>> can only load 16 bit immediates.
> >>> ""
> >>> "%0.h = #HI(%1)")
> >>
> >> What are the semantics of this? Low bits zeroed, or untouched?
> >> If the former, your semantics are identical to Sparc; look at that.
> >
> > The low bits are untouched. However, I would expect the compiler to
> > always follow setting the high bits with setting the low bits.
>
> OK, if you're willing to accept that limitation (your architecture could
> handle putting the LO first, which Sparc can't) then Sparc is still a
> good model to look at. What it does should work for you.

Earlier I was able to successfully split load immediates into high and lo_sum insns, and that has worked great as far as scheduling goes. However, I noticed that now, instead of loading the address of a constant such as a string, compiled programs will load the address of a constant that is the address of that string and then dereference it. My guess is that this is caused by the constant in the high/lo_sum pair being hidden from CSE.

I looked at the way SPARC and MIPS handle the problem, but I don't think that will work for me. If I understand correctly, they split the move into a load immediate that has the lower bits cleared, corresponding to a sethi or lui instruction, and an ior immediate. The semantics of the instructions I am working with, "R0.H = #HI(CONSTANT)" and "R0.L = #LO(CONSTANT)", are that the half of the register not being set is unmodified. Since I cannot use an ior immediate like SPARC and MIPS, how can I split move immediate insns so that they can be efficiently scheduled but still eliminate the unnecessary indirection?

Also, does the method used by SPARC and MIPS work for symbols?

Thank you,
Charles
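To make the difference between the two semantics concrete, here is a small model in plain C. It is only a sketch: the function names are made up for illustration, a 32-bit register is assumed, and the "clear the low bits, then ior" variant is shown with a 16/16 split as with lui (SPARC's sethi splits the word differently).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: the 16-bit halves of a 32-bit constant. */
static uint32_t hi16(uint32_t c) { return c >> 16; }
static uint32_t lo16(uint32_t c) { return c & 0xFFFFu; }

/* lui-style split: the high load clears the low half, so the second
   step can be an inclusive OR with the low half. */
uint32_t load_const_lui_ior(uint32_t c)
{
    uint32_t r = hi16(c) << 16;   /* r = HI(c), low bits zeroed */
    r |= lo16(c);                 /* r = r | LO(c) */
    return r;
}

/* "R0.H = #HI(c)": write the high half, leave the low half untouched. */
uint32_t set_high_half(uint32_t r, uint32_t c)
{
    return (r & 0x0000FFFFu) | (hi16(c) << 16);
}

/* "R0.L = #LO(c)": write the low half, leave the high half untouched. */
uint32_t set_low_half(uint32_t r, uint32_t c)
{
    return (r & 0xFFFF0000u) | lo16(c);
}

/* With untouched-half semantics both orders give the same result,
   whatever the register held before. */
uint32_t load_const_halves(uint32_t old, uint32_t c, int low_first)
{
    assert(low_first == 0 || low_first == 1);
    uint32_t r = old;
    if (low_first) {
        r = set_low_half(r, c);
        r = set_high_half(r, c);
    } else {
        r = set_high_half(r, c);
        r = set_low_half(r, c);
    }
    return r;
}
```

In this model each half-set reads the previous register value, which is the extra input that the lui/ior form does not have in its first step.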
Re: memcpy to an unaligned address
On Aug 2, 2005, at 10:32 AM, Shaun Jackman wrote:
> In a typical Ethernet/IP ARP header the source IP address is
> unaligned. Instead of using...
>     out->srcIPAddr = in->dstIPAddr;
> ... I used...
>     memcpy(&out->srcIPAddr, &in->dstIPAddr, sizeof(uint32_t));
> ... to account for the unaligned destination. This worked until gcc 4,
> which now generates a simple load/store.
>     ldr r3, [r6, #24]
>     adds r2, r4, #0
>     adds r2, #14
>     str r3, [r2, #0]
> A nice optimisation, but in this case it's incorrect. $r4 is aligned,
> and the result of adding #14 to $r4 is an unaligned pointer. Should gcc
> know better, or do I need to give it a little more information to help
> it out?

gcc-help is the correct list to ask this question on.

Anyway, I suspect people would be aided in helping you by seeing the source code and knowing what version of gcc you're using...

I suspect you don't mark the structure as packed and as using 1 or 2 byte alignment. If you do that, then the compiler should generate the correct code, for example:

mrs $ cat t1.c
struct {
    char a[14];
    int i __attribute__((aligned(1), packed));
} s, d;

main() {
    d.i = s.i;
}
$ arm-gcc -O4 t1.c -S

gives:

_main:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        str sl, [sp, #-4]!
        ldr sl, .L3
        ldr r2, .L3+4
.L2:
        add sl, pc, sl
        ldr ip, [sl, r2]
        ldr r0, .L3+8
        ldrh r1, [ip, #16]
        ldr r2, [sl, r0]
        ldrh r3, [ip, #14]
        @ lr needed for prologue
        strh r1, [r2, #16] @ movhi
        strh r3, [r2, #14] @ movhi
        ldmfd sp!, {sl}
        mov pc, lr

for me. Notice the adding of 14, notice the two 16 bit moves instead of one 4 byte move.

If you lie to the compiler, it will make your life rough. Telling it that it is aligned, when the data isn't aligned, is a lie.
Re: memcpy to an unaligned address
On 8/2/05, Dave Korn <[EMAIL PROTECTED]> wrote:
> In order for anyone to answer your questions about the alignment of
> various types in a struct, don't you think you should perhaps have told us a
> little about what those types actually are and how the struct is laid out?

Of course, my apologies. I was clearly overly terse. I declare the structure packed as follows:

typedef struct {
    uint16_t a;
    uint32_t b;
} __attribute__((packed)) st;

void foo(st *s, int n)
{
    memcpy(&s->b, &n, sizeof n);
}

This code generates the unaligned store:

$ arm-elf-objdump -d packed.o
...
   0:   e24dd004    sub sp, sp, #4 ; 0x4
   4:   e5801002    str r1, [r0, #2]
   8:   e28dd004    add sp, sp, #4 ; 0x4
   c:   e12fff1e    bx lr

$ arm-elf-gcc --version | head -1
arm-elf-gcc (GCC) 4.0.1

Cheers,
Shaun
Re: memcpy to an unaligned address
One of the things that continues to baffle me (and my colleagues) is the bizarre way in which attributes such as "packed" work when applied to structs.

It would be natural to assume, as Shaun did, that marking a struct "packed" (or, for that matter, "packed,aligned(2)") would apply that attribute to the fields of the struct. But it doesn't work that way. To get the right results, you have to stick attributes all over the structure fields, one by one. This is highly counterintuitive.

Worse yet, in this example the attribute is applied to the structure elements to some extent but not consistently -- it causes the fields to be packed, hence unaligned, but the compiler does not do unaligned accesses to the fields. This sure looks like a bug.

paul
Re: memcpy to an unaligned address
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote: > One of the things that continues to baffle me (and my colleagues) is > the bizarre way in which attributes such as "packed" work when applied > to structs. > > It would be natural to assume, as Shaun did, that marking a struct > "packed" (or, for that matter, "packed,aligned(2)") would apply that > attribute to the fields of the struct. This is exactly the behaviour suggested by the info docs: $ info gcc 'C Ext' 'Type Attr' ... Specifying this attribute for `struct' and `union' types is equivalent to specifying the `packed' attribute on each of the structure or union members. Cheers, Shaun
Re: More fun with aliasing - removing assignments?
> > OK, thanks. That settles it then.
>
> Just to close out this thread for the record, Andrew Pinski opened PR
> 23912 for this problem, and Diego checked in a patch for the 4.0
> branch. So all should be well in 4.0.2.
>

And the alias analyzer for 4.1 has this code, which is why it comes up with the right answer:

    case NOP_EXPR:
    case CONVERT_EXPR:
    case NON_LVALUE_EXPR:
      {
        tree op = TREE_OPERAND (t, 0);

        /* Cast from non-pointer to pointers are bad news for us.
           Anything else, we see through */
        if (!(POINTER_TYPE_P (TREE_TYPE (t))
              && ! POINTER_TYPE_P (TREE_TYPE (op))))
          return get_constraint_for (op);

        /* FALLTHRU */
      }
    default:
      {
        temp.type = ADDRESSOF;
        temp.var = anything_id;
        temp.offset = 0;
        return temp;
      }

We special case casts from integer constants like 0 (somewhere else) :)

I decided it wasn't worth trying to change years of practice of "let's cast integers to pointers" by trying to sneak this in. I'd rather just watch as all their code explodes for other reasons, like trying to cast pointers to unsigned ints on a 64 bit machine with LP64 models.
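As a side note on the LP64 remark, here is a minimal C sketch of why a pointer does not survive a trip through unsigned int on such targets, while uintptr_t is the integer type intended for it. The function names are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trip a pointer through uintptr_t, the integer type meant to
   hold a converted void* without losing information. */
void *roundtrip_uintptr(void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* Returns 1 if unsigned int is too narrow to hold a pointer on this
   target, as on LP64 systems (32-bit int, 64-bit pointers). */
int unsigned_int_too_narrow_for_pointer(void)
{
    return sizeof(unsigned int) < sizeof(void *);
}
```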
RE: memcpy to an unaligned address
Original Message >From: Shaun Jackman >Sent: 02 August 2005 20:26 > On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote: >> One of the things that continues to baffle me (and my colleagues) is >> the bizarre way in which attributes such as "packed" work when applied >> to structs. >> >> It would be natural to assume, as Shaun did, that marking a struct >> "packed" (or, for that matter, "packed,aligned(2)") would apply that >> attribute to the fields of the struct. > > This is exactly the behaviour suggested by the info docs: > > $ info gcc 'C Ext' 'Type Attr' > ... > Specifying this attribute for `struct' and `union' types is > equivalent to specifying the `packed' attribute on each of the > structure or union members. > There are two separate issues here: 1) Is the base of the struct aligned to the natural alignment, or can the struct be based at any address 2) Is there padding between the struct members to maintain their natural alignments (on the assumption that the struct's base address is aligned.) I think this is where some of the ambiguity in the docs comes from. But I'm about to leave the office now, so I can't go into depth with this thread right now cheers, DaveK -- Can't think of a witty .sigline today
RE: memcpy to an unaligned address
> "Dave" == Dave Korn <[EMAIL PROTECTED]> writes: Dave> Original Message >> From: Shaun Jackman Sent: 02 August 2005 20:26 >> On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote: >>> One of the things that continues to baffle me (and my colleagues) >>> is the bizarre way in which attributes such as "packed" work when >>> applied to structs. >>> >>> It would be natural to assume, as Shaun did, that marking a >>> struct "packed" (or, for that matter, "packed,aligned(2)") would >>> apply that attribute to the fields of the struct. >> This is exactly the behaviour suggested by the info docs: >> >> $ info gcc 'C Ext' 'Type Attr' ... Specifying this attribute for >> `struct' and `union' types is equivalent to specifying the >> `packed' attribute on each of the structure or union members. >> Dave> There are two separate issues here: Dave> 1) Is the base of the struct aligned to the natural alignment, Dave> or can the struct be based at any address Dave> 2) Is there padding between the struct members to maintain Dave> their natural alignments (on the assumption that the struct's Dave> base address is aligned.) Sure. But in Shaun's case it looks like (2) has been applied, except that the compiler doesn't adjust the generated code correctly. I would argue that "packed" applied to a whole struct should produce BOTH effects 1 and 2. There's a third case for which there appears to be no notation: 3) A pointer to a T that doesn't have the normal alignment of the type T. For example, as far as I can tell, GCC offers no way to say "pointer to unaligned int" -- short of creating a one-member struct. paul
GCC-3.4.5 status report
Hi,

The number of open PRs registered as GCC-3.4.x-only regressions and targeted for 3.4.5 has decreased from 125 (last week) to 115, which is progress! Still, we have too many PRs for a stable branch.

Here is the complete list as communicated to me by the bugzilla mail interface. Note to Dan: we still lack a way to query for PRs closed within a specific span of time.

The C++ front-end remains the winner in the category of maximum number of regressions (46).

bootstrap: 2
  18532 libgcc.mk isn't parallel build safe for multilib
  22213 quoting of dir-variable in mklibgcc.in

c: 5
  16676 ICE with nested functions and -g1, blocks glibc
  20239 ICE on empty preprocessed input
  21536 C99 array of variable length use causes segmentation fault
  22061 internal compiler error: in find_function_data, at function.c:317
  22458 ICE on missing brace

c++: 46
  11224 warning "value computed is not used" no longer emitted
  14500 most specialized function template vs. non-template function
  14950 always_inline does not mix with templates and -O0
  16021 Tests for container swap specialisations FAIL in debug mode
  16030 stdcall function decoration vs LTHUNK alias in multiple inheritanc
  16042 ICE with array assignment
  16276 G++ generates local references to linkonce sections
  16405 Temporary aggregate copy not elided
  16572 Wrong filename/line number were reported by g++ in inlining's warning messages
  17248 __always_inline__ throws "unimplemented" in -O0 mode
  17332 Missed inline opportunity
  17609 spurious error message after using keyword
  17655 ICE with using a C99 initializer in an if-condition
  17972 const/pure functions result in bad asm
  18273 Fail to generate debug info for member function.
  18368 C++ error message regression
  18445 ice during overload resolution in template instantiation
  18462 Segfault on declaration of large array member
  18466 int ::i; accepted
  18512 ICE on invalid usage of template base class
  18514 Alternate "asm" name ignored for redeclared builtin function imported into namespace std
  18545 ICE when returning undefined type
  18625 triple error message for invalid typedef
  18738 typename not allowed with non-dependent qualified name
  19043 -fpermissive gives bad loop initializations
  19063 ICE on invalid template parameter
  19395 invalid scope qualifier allowed in typedef
  19396 Invalid template in typedef accepted
  19397 ICE with invalid typedef
  19441 Bad error message with invalid destructor declaration
  19628 g++ no longer accepts __builtin_constant_p in constant-expressions
  19710 ice on invalid one line C++ code
  19734 Another ICE on invalid destructor call
  19762 ICE in invalid explicit instantiation of a destructor
  19764 ICE on explicit instantiation of a non-template destructor
  19982 The left side of the "=" operator must be an lvalue.
  20152 ICE compiling krusader-1.5.1 with latest CVS gcc
  20153 ICE when C++ template function contains anonymous union
  20383 #line directive breaks try-catch statement
  20427 ()' not default initialized
  20552 ICE in write_type, at cp/mangle.c:1579
  20905 confuses unrelated type name with instance name
  21784 Using vs builtin names
  22215 g++ -O2 generates Undefined Global for statically defined function
  22545 ICE with pointer to class member & user defined conversion operator
  23162 internal compiler error: in c_expand_expr, at c-common.c:4138

debug: 4
  16035 internal compiler error: in gen_subprogram_die, at dwarf2out.c:10798
  17076 ICE on variable size array initialization in debug mode in C++
  20253 Macro debug info broken due to lexer change
  21932 -O3 -fno-unit-at-a-time causes ICE

fortran: 2
  18913 seg. fault with -finit-local-zero option on complex array of dimension 1
  20774 Debug information in .o (from FORTRAN) points to temporary file under certain circumstances

libf2c: 1
  17725 g77 libs installed in wrong directory

libobjc: 1
  11572 GNU libobjc no longer compiled on Darwin

libstdc++: 1
  11953 _REENTRANT defined when compiling non-threaded code.

middle-end: 6
  18956 'bus error' at runtime while passing a special struct to a C++ member function
  19183 ICE with -fPIC
  19371 Missing uninitialized warning with dead code (pure/const functions)
  20329 current 3.4.4 miscompiles Linux kernel with athlon optimisations
  21964 broken tail call at -O2 or more
  22177 error: in assign_stack_temp_for_type, at function.c:655

other: 4
  15378 -Werror should provide notification of why gcc is exiting
  17594 GCC does not error about unknown options which starts with a valid option
  20731 contrib/gcc_update hard code -r gcc-3_4-branch
  22511 cc1plus: error: unrecognized command line option "-Wno-pointer-sign"

preprocessor: 2
  15307 Preprocessor ICE on invalid input
  19475 missing whitespace after macro name in C90 or C++

rtl-optimization: 20
  11707 constants not propagated in unrolled loop iterations with a conditional
  12863 basic block reordering fails for fallth
Re: memcpy to an unaligned address
On 8/2/05, Dave Korn <[EMAIL PROTECTED]> wrote:
> There are two separate issues here:
>
> 1) Is the base of the struct aligned to the natural alignment, or can the
> struct be based at any address

The base of the struct is aligned to the natural alignment, four bytes in this case.

> 2) Is there padding between the struct members to maintain their natural
> alignments (on the assumption that the struct's base address is aligned.)

There is no padding. The structure is defined as __attribute__((packed)) to explicitly remove the padding. The result is that gcc knows the unaligned four byte member is at an offset of two bytes from the base of the struct, but uses a four byte load at the unaligned address of base+2. I don't expect...
    p->unaligned = n;
... to work, but I definitely expect
    memcpy(&p->unaligned, &n, sizeof p->unaligned);
to work. The second case is being optimised to the first case, though, and generating an unaligned store.

Cheers,
Shaun
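The layout described here can be checked with offsetof and sizeof. The following is a minimal sketch using the struct from earlier in the thread; st_padded and the function names are additions for comparison only, and the attribute syntax is GCC-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The packed struct from this thread (GCC attribute syntax). */
typedef struct {
    uint16_t a;
    uint32_t b;
} __attribute__((packed)) st;

/* The same members without the attribute, for comparison. */
typedef struct {
    uint16_t a;
    uint32_t b;
} st_padded;

size_t packed_offset_of_b(void) { return offsetof(st, b); }
size_t packed_size(void)        { return sizeof(st); }
size_t padded_offset_of_b(void) { return offsetof(st_padded, b); }

/* With packing, b sits directly after the two bytes of a. */
void check_packed_layout(void)
{
    assert(packed_offset_of_b() == 2);
    assert(packed_size() == 6);
}
```

On targets where uint32_t has four byte alignment, padded_offset_of_b() would be expected to return 4 rather than 2; that value depends on the ABI, so the sketch does not assert it.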
Re: memcpy to an unaligned address
> "Shaun" == Shaun Jackman <[EMAIL PROTECTED]> writes: >> 2) Is there padding between the struct members to maintain their >> natural alignments (on the assumption that the struct's base >> address is aligned.) Shaun> There is no padding. The structure is defined as Shaun> __attribute__((packed)) to explicitly remove the padding. The Shaun> result is that gcc knows the unaligned four byte member is at Shaun> an offset of two bytes from the base of the struct, but uses a Shaun> four byte load at the unaligned address of base+2. I don't Shaun> expect... Shaun> p-> unaligned = n; Shaun> ... to work, ... I would. If you tell gcc that a thing is unaligned, it is responsible for doing unaligned references to it. That very definitely includes direct references to the content in expressions. And in general that works. Clearly there is a GCC bug here; GCC put the field at an unaligned offset, but did not do unaligned references to it. paul
Re: memcpy to an unaligned address
On Aug 2, 2005, at 1:15 PM, Shaun Jackman wrote:
> There is no padding. The structure is defined as
> __attribute__((packed)) to explicitly remove the padding. The result
> is that gcc knows the unaligned four byte member is at an offset of
> two bytes from the base of the struct, but uses a four byte load at
> the unaligned address of base+2. I don't expect...
> p->unaligned = n;
> ... to work,

Actually, that works just fine, with:

typedef struct {
    unsigned short int a;
    unsigned int b;
} __attribute__((packed)) st;

void foo(st *s, int n)
{
    s->b = n;
}

I get:

_foo:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        mov r3, r1, lsr #24
        mov r2, r1, lsr #8
        mov ip, r1, lsr #16
        @ lr needed for prologue
        strb r3, [r0, #5]
        strb r2, [r0, #3]
        strb ip, [r0, #4]
        strb r1, [r0, #2]
        mov pc, lr

> but I definitely expect
> memcpy(&p->unaligned, &n, sizeof p->unaligned);
> to work.

Ah, I was having trouble getting it to fail for me... Now I can:

#include <string.h>

typedef struct {
    unsigned short int a;
    unsigned int b;
} __attribute__((packed)) st;

void foo(st *s, int n)
{
    memcpy(&s->b, &n, sizeof n);
}

_foo:
        @ args = 0, pretend = 0, frame = 4
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        sub sp, sp, #4
        @ lr needed for prologue
        str r1, [r0, #2]
        add sp, sp, #4
        bx lr

Yes, this is a compiler bug in the expansion of memcpy, please file a bug report. The solution is for the compiler to notice the memory alignment of the destination and `do-the-right-thing' when it isn't aligned.
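For readers who do not read ARM assembly: the four strb instructions in the first listing amount to a byte-by-byte store, roughly as in the C sketch below. The function names are hypothetical and a little-endian byte order is assumed, matching the offsets in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value at an arbitrarily aligned address, one byte at
   a time, least significant byte first (little-endian layout). */
void store_u32_le(unsigned char *dst, uint32_t v)
{
    assert(dst != NULL);
    dst[0] = (unsigned char)(v & 0xFFu);          /* like strb r1, [r0, #2] */
    dst[1] = (unsigned char)((v >> 8) & 0xFFu);   /* v lsr #8,  offset 3 */
    dst[2] = (unsigned char)((v >> 16) & 0xFFu);  /* v lsr #16, offset 4 */
    dst[3] = (unsigned char)((v >> 24) & 0xFFu);  /* v lsr #24, offset 5 */
}

/* The matching byte-by-byte load. */
uint32_t load_u32_le(const unsigned char *src)
{
    assert(src != NULL);
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Because only byte accesses are used, no alignment of dst or src is required.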
Re: memcpy to an unaligned address
> On Aug 2, 2005, at 1:15 PM, Shaun Jackman wrote:
> > There is no padding. The structure is defined as
> > __attribute__((packed)) to explicitly remove the padding. The result
> > is that gcc knows the unaligned four byte member is at an offset of
> > two bytes from the base of the struct, but uses a four byte load at
> > the unaligned address of base+2. I don't expect...
> > p->unaligned = n;
> > ... to work,
>
> Actually, that works just fine, with:
>
> typedef struct {
>    unsigned short int a;
>    unsigned int b;
> } __attribute__((packed)) st;
>
> void foo(st *s, int n)
> {
>    s->b = n;
> }
>
> Ah, I was having trouble getting it to fail for me... Now I can:
>
> #include <string.h>
>
> typedef struct {
>    unsigned short int a;
>    unsigned int b;
> } __attribute__((packed)) st;
>
> void foo(st *s, int n)
> {
>    memcpy(&s->b, &n, sizeof n);
> }
>
> Yes, this is a compiler bug in the expansion of memcpy, please file a
> bug report. The solution is for the compiler to notice the memory
> alignment of the destination and `do-the-right-thing' when it isn't
> aligned.

No, it is not. Once you take the address (which should be rejected), it is of type "unsigned int *" and no longer an unaligned variable; passing it to memcpy assumes the type alignment is the natural alignment.

-- Pinski
Re: memcpy to an unaligned address
Andrew Pinski <[EMAIL PROTECTED]> writes: > > Yes, this is a compiler bug in the expansion of memcpy, please file a > > bug report. The solution is for the compiler to notice the memory > > alignment of the destination and `do-the-right-thing' when it isn't > > aligned. > > No it is not, once you take the address (which should be rejected), it > is of type "unsigned int *" and not unaligned variable, passing it to > memcpy assumes the type alignment is the natural alignment. That argument doesn't make sense to me. memcpy takes a void* argument, which has no presumed alignment. The builtin should work the same way. That is, there is an implicit cast to void* in the argument to memcpy. The compiler can certainly take advantage of any knowledge it has about the alignment, but it can't assume anything about the alignment that it doesn't already know. Ian
Re: memcpy to an unaligned address
> "Andrew" == Andrew Pinski <[EMAIL PROTECTED]> writes: >> Yes, this is a compiler bug in the expansion of memcpy, please >> file a bug report. The solution is for the compiler to notice the >> memory alignment of the destination and `do-the-right-thing' when >> it isn't aligned. Andrew> No it is not, once you take the address (which should be Andrew> rejected), it is of type "unsigned int *" and not unaligned Andrew> variable, passing it to memcpy assumes the type alignment is Andrew> the natural alignment. That seems like a misfeature. It sounds like the workaround is to avoid memcpy, and just use variable assignment. Alternatively, cast the pointers to char*, which should force memcpy to do the right thing. Ugh. paul
GCC 4.2 Projects
Although we're still in Stage 3, it's time to start thinking about GCC 4.2. I know that many people are working on projects that they hope to include in GCC 4.2, and it's reasonable to start gathering them. I don't plan to actually work on ordering them in any coherent way for a few more weeks, and I want to keep the focus on fixing bugs in Stage 3, but I also don't want to be obstructing forward progress for GCC 4.2. In keeping with the clear preference from 4.1, we'll do all project proposals as publicly posted Wiki pages on the GCC Wiki. I'll poll the Wiki for new projects, but I think people might appreciate mail to the GCC mailing list when you add something. See: http://gcc.gnu.org/wiki/GCC%204.2%20Projects for some guidelines. Thanks, -- Mark Mitchell CodeSourcery, LLC [EMAIL PROTECTED] (916) 791-8304
Re: memcpy to an unaligned address
On Aug 2, 2005, at 1:37 PM, Andrew Pinski wrote:
> No it is not,

:-) Ah, yes, the old, we don't have pointers to unaligned types problem... anyway, we can at least agree that this is a gaping hole people can drive trucks through in the type system, but I'm still claiming it isn't a feature on theoretic grounds. :-(

Shaun, want to do up an entry in the manual describing this? We have known about this for years and years, but we don't do a good job communicating it to users. Essentially, & doesn't work as one would expect on unaligned data, as it produces a pointer to an aligned object instead of a pointer to an unaligned object. Essentially, we don't have a type system that contains pointer to unaligned types. The compiler then goes on to make codegen choices based upon the fact that the data are known to be aligned, and bad things happen.
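The closest stand-in for a "pointer to unaligned int" mentioned earlier in this thread is a one-member packed struct. A minimal sketch of that idea follows; the type and function names are hypothetical, the attribute is GCC-specific, and the usual aliasing rules still apply to whatever object the pointer refers to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One-member packed wrapper: the struct type has alignment 1, so an
   access through a pointer to it may not assume 4-byte alignment. */
typedef struct {
    uint32_t value;
} __attribute__((packed)) unaligned_u32;

uint32_t read_unaligned_u32(const void *p)
{
    assert(p != NULL);
    return ((const unaligned_u32 *)p)->value;
}

void write_unaligned_u32(void *p, uint32_t v)
{
    assert(p != NULL);
    ((unaligned_u32 *)p)->value = v;
}
```

Here the alignment information travels with the struct type rather than with a plain uint32_t pointer, which is the piece the type system lacks for ordinary pointers.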
Re: memcpy to an unaligned address
On Tue, Aug 02, 2005 at 02:04:16PM -0700, Mike Stump wrote:
> Shaun, want to do up an entry in the manual describing this? We have
> known about this for years and years, but, we don't do a good job
> communicating it to users. Essentially, & doesn't work as one would
> expect on unaligned data, as it produces a pointer to an aligned
> object instead of a pointer to unaligned object.

I suppose we could make & on an unaligned object return a void*. That isn't really right, but it would at least prevent the cases that we know don't work from compiling.
Re: memcpy to an unaligned address
On Aug 2, 2005, at 1:45 PM, Ian Lance Taylor wrote:
> That argument doesn't make sense to me. memcpy takes a void* argument,
> which has no presumed alignment.

The memcpy builtin uses the static type of the actual argument (before conversion to void*) to gain hints about the alignment of the data coming in. This is so that we can produce nice fast code for 1-16 byte objects. This is actually good. The real problem is that forming the address of the member doesn't produce a pointer to unaligned type, but rather a pointer to aligned type; this is the part that is wrong. We'd have to add pointers to unaligned data to our type system to fix it. That should be done, but it is a hard/big job, and no one has stepped forward to do it.
Re: memcpy to an unaligned address
On Tue, Aug 02, 2005 at 02:29:44PM -0700, Mike Stump wrote: > On Aug 2, 2005, at 1:45 PM, Ian Lance Taylor wrote: > >That argument doesn't make sense to me. memcpy takes a void* > >argument, which has no presumed alignment. > > The memcpy builtin uses the static type of the actual argument > (before conversion to void*), to gain hints about the alignments of > the data coming in. This is so that we can producing nice fast code > for 1-16 bytes objects. This is actually good. The real problem is > formation of the address of the member doesn't produce a pointer to > unaligned type, but rather a pointer to aligned type, this is the > part that is wrong. We'd have to add pointers to unaligned data to > our type system to fix it. That should be done, but is a hard/big > job, and no one has stepped forward to do it. So my suggestion to just make pointers to unaligned objects void* would work in this case, then.
Re: memcpy to an unaligned address
On Tue, Aug 02, 2005 at 04:07:00PM -0600, Shaun Jackman wrote: > On 8/2/05, Joe Buck <[EMAIL PROTECTED]> wrote: > > I suppose we could make & on an unaligned project return a void*. That > > isn't really right, but it would at least prevent the cases that we know > > don't work from compiling. > > That sounds like a dangerous idea only because I'd expect... > int *p = &packed_struct.unaligned_member; > ... to fail if unaligned_member is not an int, but if the & operator > returns a void*, it would suddenly become very permissive. Ah. I was thinking as a C++ programmer, where void* cannot be assigned to int* without an explicit cast. The decision to allow this in C was the worst mistake the standards committee made. The problem is that the type returned by malloc is not just any void*, but a special pointer that is guaranteed to have alignment sufficient to store any type. This is very different from the type of the arguments to memcpy, which is assumed to have no alignment that can be counted on.
Re: memcpy to an unaligned address
On 8/2/05, Joe Buck <[EMAIL PROTECTED]> wrote: > I suppose we could make & on an unaligned project return a void*. That > isn't really right, but it would at least prevent the cases that we know > don't work from compiling. That sounds like a dangerous idea only because I'd expect... int *p = &packed_struct.unaligned_member; ... to fail if unaligned_member is not an int, but if the & operator returns a void*, it would suddenly become very permissive. Cheers, Shaun
Re: memcpy to an unaligned address
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote: > It sounds like the workaround is to avoid memcpy, and just use > variable assignment. Alternatively, cast the pointers to char*, which > should force memcpy to do the right thing. Ugh. I swear originally, back in the gcc 2.95 days, I used memcpy because the memcpy function checked for unaligned pointers, whereas storing to and loading from unaligned variables generated a simple store/load instruction which wouldn't work. It seems the tables have turned and the exact opposite is true now with gcc 4, where memcpy doesn't work, but unaligned variables do. I believe gcc 3 behaved the same as gcc 2 -- memcpy worked, unaligned variables didn't work. Can someone confirm this summary is correct? It seems to me there's an argument for a _memcpy_unaligned(3) function, as ugly as that is. Cheers, Shaun
Re: memcpy to an unaligned address
On 8/2/05, Paul Koning <[EMAIL PROTECTED]> wrote: > It sounds like the workaround is to avoid memcpy, and just use > variable assignment. Alternatively, cast the pointers to char*, which > should force memcpy to do the right thing. Ugh. Casting to void* does not work either. gcc keeps the alignment information -- but not the *unalignment* information, if that distinction makes any sense -- of a particular variable around as long as it can, through casts and even through assignment. The unalignment information, on the other hand, is lost immediately after the & operator. None of these examples produce an unaligned load: memcpy(&s->b, &n, sizeof n); memcpy((void*)&s->b, &n, sizeof n); void *p = &s->b; memcpy(p, &n, sizeof n); But as pointed out by others, this does produce an unaligned load: s->b = n; Cheers, Shaun
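One variant not listed above avoids forming &s->b at all and computes the member address from a byte pointer plus offsetof, so the expression handed to memcpy never has type "pointer to unsigned int". This is only a sketch: the helper names are hypothetical, and it has not been checked against the GCC versions discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t a;
    uint32_t b;
} __attribute__((packed)) st;

/* Copy n into s->b through a byte pointer; the destination expression
   has type unsigned char *, which carries no alignment beyond 1. */
void set_b(st *s, uint32_t n)
{
    assert(s != NULL);
    unsigned char *base = (unsigned char *)s;
    memcpy(base + offsetof(st, b), &n, sizeof n);
}

/* The matching read, again going through a byte pointer. */
uint32_t get_b(const st *s)
{
    assert(s != NULL);
    uint32_t n;
    memcpy(&n, (const unsigned char *)s + offsetof(st, b), sizeof n);
    return n;
}
```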
Re: memcpy to an unaligned address
On 8/2/05, Shaun Jackman <[EMAIL PROTECTED]> wrote: > operator. None of these examples produce an unaligned load: I should clarify the wording I'm using here. By "an unaligned load" I mean code to safely load from an unaligned pointer. Cheers, Shaun
Re: GCC-3.4.5 status report
On Tue, 2005-08-02 at 22:07 +0200, Gabriel Dos Reis wrote:
> Hi,
>
> The number of open PRs registered as CC-3.4.x regressions only and
> targetted for 3.4.5 has decreased from 125 (last week) to 115. Which
> is a progress! Still, we have too many PRs for a stable branch.
>
> Here is the complete list as communicated to me by the bugzilla mail
> interface. Note to Dan: we still miss ways to query for PRs closed
> in specific laps to time.

I'll work on this, but I probably won't get to it till next week (best guess).
gcc-3.4-20050802 is now available
Snapshot gcc-3.4-20050802 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/3.4-20050802/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 3.4 CVS branch with the following options: -rgcc-ss-3_4-20050802

You'll find:

gcc-3.4-20050802.tar.bz2            Complete GCC (includes all of below)
gcc-core-3.4-20050802.tar.bz2       C front end and core compiler
gcc-ada-3.4-20050802.tar.bz2        Ada front end and runtime
gcc-g++-3.4-20050802.tar.bz2        C++ front end and runtime
gcc-g77-3.4-20050802.tar.bz2        Fortran 77 front end and runtime
gcc-java-3.4-20050802.tar.bz2       Java front end and runtime
gcc-objc-3.4-20050802.tar.bz2       Objective-C front end and runtime
gcc-testsuite-3.4-20050802.tar.bz2  The GCC testsuite

Diffs from 3.4-20050726 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-3.4 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Attempt for rotating register allocation
Hi, all

So far, using the dataflow info (generated with df.c and df.h), I can find the uses and defs of pseudo registers in a one-bb loop. Now I need to establish a struct to record the lifetime activity of each pseudo register in a software pipelined loop. Based on this struct, we can allocate rotating registers to each pseudo register used and defined in swp loops.

Rotating register allocation for swp in GCC may be interfered with by the existing register allocation process, which may result in failure, but I still want to try it. During this process, I can learn more about the techniques in the back-end of GCC.

Advice and cooperation are welcome.

Chunjiang Li
Creative Compiler Research Group, National University of Defense Technology, China.
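A lifetime record of the kind described might look like the following C sketch. Everything in it is hypothetical (the struct, its field names and the helper functions are not GCC code); it only illustrates the usual estimate that a value live for `len` cycles in a loop with initiation interval II needs about ceil(len / II) rotating registers.

```c
#include <assert.h>

/* Hypothetical record of one pseudo register's lifetime in a
   software pipelined (modulo scheduled) loop. */
struct swp_lifetime {
    int regno;           /* pseudo register number */
    int def_cycle;       /* schedule cycle of the definition */
    int last_use_cycle;  /* schedule cycle of the last use */
};

/* Length of the lifetime in cycles. */
int swp_lifetime_length(const struct swp_lifetime *lt)
{
    assert(lt->last_use_cycle >= lt->def_cycle);
    return lt->last_use_cycle - lt->def_cycle;
}

/* Estimated number of rotating registers so that a new value, defined
   every ii cycles, does not overwrite one that is still live:
   ceil(length / ii), and at least one register. */
int swp_rotating_regs_needed(const struct swp_lifetime *lt, int ii)
{
    assert(ii > 0);
    int len = swp_lifetime_length(lt);
    int n = (len + ii - 1) / ii;
    return n > 0 ? n : 1;
}
```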
Partial Success Building 4.0.1 on x86_64-slackware-linux
I have a partial success on x86_64-slackware-linux. It is partial because it (mostly) bootstraps (see item 1) but fails to install (see item 2).

1. Java compilation repeatedly failed, so I dropped it from the languages to build.

2. While bootstrap succeeds, "make install" fails with the following error:

$ sudo make install
/bin/sh ../gcc-4.0.1/mkinstalldirs /usr/local/gcc401 /usr/local/gcc401
make[1]: Entering directory `/home/kurt/books/gccbook2/gcc-obj/fixincludes'
make[1]: *** No rule to make target `../libiberty/libiberty.a', needed by `full-stamp'. Stop.
make[1]: Leaving directory `/home/kurt/books/gccbook2/gcc-obj/fixincludes'
make: *** [install-fixincludes] Error 2

Output from config.guess: x86_64-unknown-linux-gnu

Output from resulting gcc -v:

Languages: c,c++,objc

Distribution: slamd64 (Slackware 64-bit Linux distribution)

Kernel: Linux easter 2.6.12.3 #1 Fri Jul 29 06:04:06 EDT 2005 x86_64 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux

C Library:
GNU C Library stable release version 2.3.2, by Roland McGrath et al.
Copyright (C) 2003 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 3.3.3.
Compiled on a Linux 2.4.26 system on 2004-05-24.
Available extensions:
        GNU libio by Per Bothner
        crypt add-on version 2.1 by Michael Glad and others
        linuxthreads-0.10 by Xavier Leroy
        BIND-8.2.3-T5B
        libthread_db work sponsored by Alpha Processor Inc
        NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk

./configure invocation:

../gcc-4.0.1/configure \
    --disable-nls \
    --with-gnu-gettext \
    --prefix=/usr/local \
    --host=x86_64-slackware-linux \
    --target=x86_64-slackware-linux \
    --enable-languages=c,c++,objc

Kurt
--
Know what I hate most? Rhetorical questions. -- Henry N. Camp
--
"Speed is subsittute fo accurancy."
Request to reopen a PR
Hi Sorry if this is the wrong address to contact. This is a minor request for a minor libmudflap problem. Could somebody with appropriate privilege please do me a favor and reopen the following bugzilla PR? http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20003 It seems the system won't let me do it because I'm not the original reporter. Thanks Greg