Re: adding an argument for test execution in testsuite
It is unfortunate that UPC programs cannot use dg-additional-sources, as we would need to change our run-time to support this option. By the time we reach "main", the run-time is already initialized to support the specified number of threads. One option might be to define a default number of threads to run if the number is not specified. Anyway, I spent a little more time on this and was able to create a wrapper for the "upc_load" function the same way it is done in gcc-dg.exp (renaming the old upc_load to prev_upc_load). The wrapper adds the necessary flags for the dynamic environment. Notice that ${tool}_load already accepts arguments that can be passed to the program. Nenad On 5/4/11 3:32 PM, Janis Johnson wrote: > On 05/04/2011 11:21 AM, Nenad Vukicevic wrote: >> It seems that I fixed my problem by defining a remote_spawn procedure (and fixing the order of loading libraries :) ) in my own upc-dg.exp file and adding a line to it that appends additional arguments to the command line: "append commandline $upc_run_arguments". The global $upc_run_arguments is set before dg-test is called. I used a simple string compare to see if dynamic threads are required. So far it works as expected. > > Working "so far" shouldn't be good enough, especially if your test will be run for a variety of targets. Presumably you don't really need the number of threads to be specified on the command line, you just need for it to look as if it were specified at run time. You could, for example, define it in a second source file included in the test via dg-additional-sources and use it from a global variable or call a function to get it. > > Janis
Re: Why is this a problem in the C++ bootstrap and not the normal one ?
On Thu, May 12, 2011 at 1:32 AM, Toon Moene wrote: > ../../gcc/gcc/objc/objc-next-runtime-abi-02.c: In function 'const char* > newabi_append_ro(const char*)': > ../../gcc/gcc/objc/objc-next-runtime-abi-02.c:1885:29: error: invalid > conversion from 'const char*' to 'char*' [-fpermissive] > > I have to see if it is useful to keep on testing the C++ bootstrap daily, > because apparently, some bitrot already set in ... Yes, I believe it is useful precisely because of these silent regressions. -- Gaby
Re: basic bloc chaining: using dominance
On Wed, May 11, 2011 at 10:03 PM, Ian Lance Taylor wrote: > Pierre Vittet writes: > >> First, thanks for your help. I have looked at several functions using >> calculate_dominance_info(). From what I understand, when you have >> finished using it, you have to clear the structure by calling >> free_dominance_info(). >> In the function flow_loops_find (file gcc/cfgloop.c), there is a call >> to calculate_dominance_info() without a call to free_dominance_info(). I >> feel it is a bug, no? > > Not in this case, no. The caller or a later pass is responsible for > freeing it in this case. There should ideally be a comment about this. Dominator information should be freed if it is made invalid by CFG transformations. Otherwise the policy is to keep it around to avoid re-computing it (too many passes need it). For post-dominator info the policy is to always free it. Richard. > Ian >
Re: Non-optimal stack usage with C++ temporaries
On Wed, May 11, 2011 at 10:15 PM, Matt Fischer wrote:
> I've noticed some behavior with g++ that seems strange to me. I don't
> know if there's some technicality in the C++ standard that requires
> this, or if it's just a limitation to the optimization code, but it
> seemed strange so I thought I'd see if anybody could shed more light
> on it.
>
> Here's a test program that illustrates the behavior:
>
> struct Foo {
>     char buf[256];
>     Foo() {} // suppress automatically-generated constructor code for clarity
>     ~Foo() {}
> };
>
> void func0(const Foo &);
> void func1(const Foo &);
> void func2(const Foo &);
> void func3(const Foo &);
>
> void f()
> {
>     func0(Foo());
>     func1(Foo());
>     func2(Foo());
>     func3(Foo());
> }
>
> Compiling with -O2 and "-fno-stack-protector -fno-exceptions" for
> clarity, on g++ 4.4.3, gives the following:
>
> :
>    0: 55                    push %ebp
>    1: 89 e5                 mov %esp,%ebp
>    3: 81 ec 18 04 00 00     sub $0x418,%esp
>    9: 8d 85 f8 fb ff ff     lea -0x408(%ebp),%eax
>    f: 89 04 24              mov %eax,(%esp)
>   12: e8 fc ff ff ff        call 13 <_Z1fv+0x13>
>   17: 8d 85 f8 fc ff ff     lea -0x308(%ebp),%eax
>   1d: 89 04 24              mov %eax,(%esp)
>   20: e8 fc ff ff ff        call 21 <_Z1fv+0x21>
>   25: 8d 85 f8 fd ff ff     lea -0x208(%ebp),%eax
>   2b: 89 04 24              mov %eax,(%esp)
>   2e: e8 fc ff ff ff        call 2f <_Z1fv+0x2f>
>   33: 8d 85 f8 fe ff ff     lea -0x108(%ebp),%eax
>   39: 89 04 24              mov %eax,(%esp)
>   3c: e8 fc ff ff ff        call 3d <_Z1fv+0x3d>
>   41: c9                    leave
>   42: c3                    ret
>
> The function makes four function calls, each of which constructs a
> temporary for the parameter. The compiler dutifully allocates stack
> space to construct these, but it seems to allocate separate stack
> space for each of the temporaries. This seems unnecessary--since
> their lifetimes don't overlap, the same stack space could be used for
> each of them. The real-life code I adapted this example from had a
> fairly large number of temporaries strewn throughout it, each of which
> were quite large, so this behavior caused the generated function to
> use up a pretty substantial amount of stack, for what seems like no
> good reason.
>
> My question is, is this expected behavior? My understanding of the
> C++ standard is that each of those temporaries goes away at the
> semicolon, so it seems like they have non-overlapping lifetimes, but I
> know there are some exceptions to that rule. Could someone comment on
> whether this is an actual bug, or required for some reason by the
> standard, or just behavior that not enough people have run into
> problems with?

It's a missed optimization and not easy to fix.

Richard.

> Thanks,
> Matt
>
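The claim that the four temporaries have non-overlapping lifetimes can be checked with a small instrumented variant of the test program. This is only a sketch: the counters `live` and `max_live`, the stand-in `use()` and the driver `run_f()` are hypothetical names added for illustration, and the code says nothing about how much stack any particular g++ version actually reserves.

```cpp
#include <cassert>

// Hypothetical instrumentation, not part of the original test program.
static int live = 0;      // Foo objects currently alive
static int max_live = 0;  // largest value `live` has reached
static int calls = 0;     // number of calls made to use()

struct Foo {
    char buf[256];
    Foo() { if (++live > max_live) max_live = live; }
    ~Foo() { assert(live > 0); --live; }
};

// Stand-in for func0..func3 from the message.
void use(const Foo &f) { (void)f; ++calls; }

// Same shape as f() in the message: one temporary per full-expression.
// Returns the high-water mark of simultaneously live Foo objects.
int run_f() {
    live = max_live = calls = 0;
    use(Foo());
    use(Foo());
    use(Foo());
    use(Foo());
    return max_live;
}
```

Each temporary is destroyed at the end of its full-expression, so at most one `Foo` exists at any time. That is what would make sharing one stack slot legal; whether the compiler does so is the missed optimization discussed above.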
Can the size of pointers to data and text be different?
I am using GCC 4.3.2. In our microcontroller, the move instruction (mov reg, imm) can accept 16-bit and 32-bit immediate operands. The data memory size is less than 64KB; however, the code memory size is larger than 64KB. The immediate operand may be the address of a variable in a data section or a function pointer. The addresses of variables can be represented in 16 bits. However, function pointers may be larger than 16 bits. I'd like to use "mov reg, imm16" for addresses of variables and "mov reg, imm32" for function pointers, so that the code size can be a little smaller. Another way to state the requirement is that pointers to data and pointers to text have to be of different sizes. How can I select the appropriate mov for them? I tried to use LABEL_REF and SYMBOL_REF to distinguish between them, but it didn't help. It seems that function pointers are treated as symbols too. Are there any other cases where references to functions in text sections are used in data sections? Thanks. -Qifei Fan
More atomic functions please
Hello,

I'm not sure whether this would be better handled as a missed optimization, but there is a certain lack of functionality in GCC's __sync_* function family.

When implementing a reference-counting smart pointer, two operations are of crucial importance:

    void __sync_increment(T* p);
    bool __sync_decrement_iszero(T* p);

The former only increments the location pointed to by p; the latter decrements it and returns true if and only if the result was zero.

Both can be implemented in terms of existing __sync functions (and what can't? -- since there is __sync_bool_compare_and_swap()), e.g.:

    void __sync_increment(T* p) {
        __sync_fetch_and_add(p, 1);
    }

    bool __sync_decrement(T* p) {
        return __sync_fetch_and_add(p, -1) == 1;
    }

Unfortunately, on x86/x64 both are compiled in a rather poor way:

__sync_increment:

    lock addl $0x01,(ptr)

which is longer than:

    lock incl (ptr)

__sync_decrement:

    movl $-1, %rA
    lock xadd %rA, (ptr)
    cmpl $0x01, %rA
    je/jne...

which is undoubtedly longer than "lock dec" and wastes a register. I can optimally implement the increment function with a bit of inline assembly, but decrement is not so lucky, as there is no way to inform the compiler that the result is in the flags register. One must retreat to something like this:

    lock decl (ptr)
    sete %rA

which GCC will finally use to perform a comparison in if(), emitting:

    lock decl (ptr)
    sete %rA
    testb %rA, %rA
    je/jne...

which is hardly an improvement. On the other hand, the __sync functions integrate perfectly with the flag system (i.e. pairs like cmpxchg/jne), so implementing the changes in the compiler gives far better opportunities to emit an optimal sequence than inline assembly can.

As my code is to a high degree propelled by atomic power, I would like to ask you to provide these functions or to tweak the optimizer so that it notices the aforementioned idioms.

There is also no generic __sync_exchange() -- quite an important operation in lock-free programming. It could be implemented in terms of compare_exchange, but many platforms have native support for it and thus it should be exposed at the API level; tweaking the optimizer is not the proper way, IMHO.

Best regards,
Piotr Wyderski
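The remark that an exchange "could be implemented in terms of compare_exchange" can be sketched as a retry loop around the existing `__sync_val_compare_and_swap` builtin. The names `exchange_via_cas` and `exchange_demo` are hypothetical helpers invented for this illustration, not GCC builtins, and the sketch assumes an integral or pointer type of a size the builtins support.

```cpp
#include <cassert>

// Hypothetical helper (not a GCC builtin): atomically replaces *p with
// `desired` and returns the value that was stored there before.
template <typename T>
T exchange_via_cas(T *p, T desired) {
    T expected = __sync_fetch_and_add(p, 0);  // atomic read of the current value
    for (;;) {
        // Returns the value found in *p; the store happens only if that
        // value equals `expected`.
        T found = __sync_val_compare_and_swap(p, expected, desired);
        if (found == expected)
            return found;      // swap succeeded; `found` is the old value
        expected = found;      // another thread got in first; retry
    }
}

// Single-threaded usage check.
int exchange_demo() {
    int slot = 7;
    int old = exchange_via_cas(&slot, 42);
    assert(old == 7 && slot == 42);
    return old;
}
```

The loop is exactly the overhead the message objects to on platforms with a native exchange instruction. GCC's documentation does describe `__sync_lock_test_and_set` as an atomic exchange, but with acquire-barrier semantics only, which may or may not suit a given lock-free algorithm.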
Re: More atomic functions please
On Thu, 12 May 2011, Piotr Wyderski wrote: > Hello, > > I'm not sure if it should be better handled as missed optimization, > but there is a certain lack of functionality in the GCC's __sync_* > function family. I don't think we should add new functions to that family; instead the aim should be to implement the functionality (built-in functions etc.) required for a good implementation of the C1x and C++0x atomics support, and recommend users to use those in future. http://gcc.gnu.org/wiki/Atomic -- Joseph S. Myers jos...@codesourcery.com
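For comparison, here is roughly how the requested operations look with the C++0x atomics as they were later standardised in C++11's `<atomic>` header. This is a sketch under assumptions: `RefCount`, `swap_in` and `refcount_demo` are hypothetical names, and the memory orderings shown are one common convention for reference counts rather than anything stated in this thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical intrusive reference count built on std::atomic.
struct RefCount {
    std::atomic<int> count;
    RefCount() : count(1) {}

    // Only bumps the counter; the previous value is not needed.
    void increment() { count.fetch_add(1, std::memory_order_relaxed); }

    // fetch_sub returns the value before the subtraction, so the count
    // reached zero exactly when that value was 1.
    bool decrement_iszero() {
        return count.fetch_sub(1, std::memory_order_acq_rel) == 1;
    }
};

// The generic exchange asked for in the thread is a member function here.
int swap_in(std::atomic<int> &slot, int value) { return slot.exchange(value); }

// Single-threaded usage check.
bool refcount_demo() {
    RefCount rc;                       // starts at 1
    rc.increment();                    // now 2
    bool first = rc.decrement_iszero();   // 2 -> 1, not zero
    bool second = rc.decrement_iszero();  // 1 -> 0, zero
    assert(!first && second);
    return second;
}
```

How well such code compiles still depends on the code generator recognising the add-one and compare-with-one patterns, which is the same optimisation question raised for the `__sync_*` builtins.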
Re: More atomic functions please
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote: > Hello, > > I'm not sure if it should be better handled as missed optimization, > but there is a certain lack of functionality in the GCC's __sync_* > function family. > > When implementing a reference counting smart pointer, two operations > are of crucial importance: > > void __sync_increment(T* p); > bool __sync_decrement_iszero(T* p); > > The former only increments the location pointed to by p, the latter decrements > it and returns true if and only if the result was zero. > > Both can be implemented in terms of existing __sync functions (and what > can't? -- since there is __sync_bool_compare_and_swap()), e.g.: > >void __sync_increment(T* p) { > > __sync_fetch_and_add(p, 1); >} > > bool __sync_decrement(T* p) { > > return __sync_fetch_and_add(p, -1) == 1; > } And that's the right thing to do. If the generated code is not optimal, we should just improve __sync_fetch_and_add code generation. It isn't hard to special case addition of 1 or -1, and we already care whether the result of the builtin is used or ignored. For == 1 we could add some pattern that combiner would merge. Please file an enhancement request in gcc bugzilla. Jakub
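The two wrappers under discussion can be written out as a compilable sketch. The function names below drop the `__sync_` prefix because identifiers starting with a double underscore are reserved; they and `sync_refcount_demo` are hypothetical. The second decrement variant uses `__sync_sub_and_fetch`, an existing builtin that returns the new value, so the test becomes a comparison with zero.

```cpp
#include <cassert>

// Increment only: the result of the builtin is deliberately ignored.
void sync_increment(int *p) { __sync_fetch_and_add(p, 1); }

// __sync_fetch_and_add returns the value before the addition, so the
// counter became zero exactly when the old value was 1.
bool sync_decrement_iszero(int *p) {
    return __sync_fetch_and_add(p, -1) == 1;
}

// Equivalent formulation: __sync_sub_and_fetch returns the new value.
bool sync_decrement_iszero_alt(int *p) {
    return __sync_sub_and_fetch(p, 1) == 0;
}

// Single-threaded usage check.
int sync_refcount_demo() {
    int count = 1;
    sync_increment(&count);                       // 2
    bool a = sync_decrement_iszero(&count);       // 2 -> 1
    bool b = sync_decrement_iszero_alt(&count);   // 1 -> 0
    assert(!a && b);
    return count;
}
```

Both forms express the same idiom; which one a given GCC version turns into the shortest instruction sequence is the code-generation question raised above.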
Re: More atomic functions please
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote: > Unfortunately, on x86/x64 both are compiled in a rather poor way: > > __sync_increment: > > lock addl $0x01,(ptr) > > which is longer than: > > lock incl (ptr) GCC already generates lock incl (ptr); it just depends on which CPU you optimize for. /* X86_TUNE_USE_INCDEC */ ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM), So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl; if addl is better (e.g. on Atom incl is very bad compared to addl $1), it will generate addl. Jakub
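One plausible reading of the X86_TUNE_USE_INCDEC entry is a bitmask with one bit per tuning target, inverted so that every target not listed gets inc/dec. The sketch below illustrates that reading only: the bit positions, the names `m_K8` and `m_BDVER1`, and the helper `use_incdec` are all assumptions made for the example and are not taken from GCC's sources.

```cpp
#include <cassert>

// Hypothetical bit assignments, one bit per tuning target.
const unsigned m_PENT4   = 1u << 0;
const unsigned m_NOCONA  = 1u << 1;
const unsigned m_CORE2I7 = 1u << 2;
const unsigned m_GENERIC = 1u << 3;
const unsigned m_ATOM    = 1u << 4;
const unsigned m_K8      = 1u << 5;  // assumed name for -mtune=k8
const unsigned m_BDVER1  = 1u << 6;  // assumed name for -mtune=bdver1

// The mask quoted in the message: every target except the listed ones.
const unsigned tune_use_incdec =
    ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM);

// True if inc/dec should be preferred over add/sub for this target bit.
bool use_incdec(unsigned target_bit) {
    return (tune_use_incdec & target_bit) != 0;
}

// Usage check matching the examples given in the message.
bool tune_demo() {
    assert(use_incdec(m_K8));
    assert(use_incdec(m_BDVER1));
    assert(!use_incdec(m_ATOM));
    return true;
}
```

Under this reading, tuning for k8 or bdver1 selects `incl`, while Atom, Core 2/i7 and generic tuning select `addl $1`, which agrees with the behavior described in the message.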
gcc-4.5-20110512 is now available
Snapshot gcc-4.5-20110512 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110512/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch revision 173717

You'll find:

gcc-4.5-20110512.tar.bz2              Complete GCC (includes all of below)
  MD5=12f0c75f2b235dec5b87c4831dc477e3
  SHA1=44f24ef585da1a68f2f0d98b87d86830541e7ce3

gcc-core-4.5-20110512.tar.bz2         C front end and core compiler
  MD5=b00358e1c0d93b8ac6228972f5d1ca80
  SHA1=3a8248c56490b83ff2f42a818cc647c7cc657ab3

gcc-ada-4.5-20110512.tar.bz2          Ada front end and runtime
  MD5=f99272c6d21b7ed01f72637c3962d094
  SHA1=7b0032e6dc688811ecb89507f3e9f24a6a1498fc

gcc-fortran-4.5-20110512.tar.bz2      Fortran front end and runtime
  MD5=32202cf5257900baaa31eb5da9e6395d
  SHA1=f29bdb2328e89b34a9de43a156bda42b970f1ebe

gcc-g++-4.5-20110512.tar.bz2          C++ front end and runtime
  MD5=6623d74e88a7b1e4f742106a997087c5
  SHA1=9c450b0ea6ba2134f5c80e4df733c6d83d101501

gcc-go-4.5-20110512.tar.bz2           Go front end and runtime
  MD5=5912c4cc9d1966ee57132f15cda52dd5
  SHA1=28f46ef6e0eeada39f9bca31099bd8164b06fe95

gcc-java-4.5-20110512.tar.bz2         Java front end and runtime
  MD5=469a0097d034dcafd5fb13fefc665708
  SHA1=778ee54103c6c30c6f925600628ed1626f7e9928

gcc-objc-4.5-20110512.tar.bz2         Objective-C front end and runtime
  MD5=e20978097265a3d04bdfd3b0e95de0a7
  SHA1=765a2128abe8ec13a9201e1bc2c319ee0361930a

gcc-testsuite-4.5-20110512.tar.bz2    The GCC testsuite
  MD5=34f99bfac27722938399d18462aa591b
  SHA1=ba9f9ad674a7e40c597f11fbf0cb8db10f860b38

Diffs from 4.5-20110505 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: Can the size of pointers to data and text be different?
fanqifei writes: > I am using gcc4.3.2. > In our microcontroller, move instruction(mov reg, imm) can accept > 16bits and 32bits immediate operand. > The data memory size is less than 64KB, however, code memory size is > larger than 64KB. > The immediate operand may be addresses of variables in data sections > and function pointers. The address of variables can be represented by > 16bits. However, function pointers may be larger than 16bits. > I'd like to use "mov reg, imm16" for addresses of variables and "mov > reg, imm32" for function pointers. So that the code size can be a > little bit smaller. > Another way to understand the requirement is the size of pointers to > data and text have to be different. > > How can I select appropriate mov for them? I tried to use LABEL_REF > and SYMBOL_REL to distinguish between them, but it didn't help. It > seems that function pointers are treated as symbols too. > Are there any other cases that references to functions in text > sections are used in data sections? SYMBOL_REF_FUNCTION_P. (A LABEL_REF refers to a goto label.) Ian
Re: More atomic functions please
Jakub Jelinek wrote: > /* X86_TUNE_USE_INCDEC */ > ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM), > > So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl, > if addl is better (e.g. on Atom incl is very bad compared to addl $1), > it will generate it. Why is lock inc/dec worse than add/sub on Core2I7? The only difference I know of is the way the carry flag is handled. Best regards, Piotr Wyderski
Re: More atomic functions please
On Fri, May 13, 2011 at 07:55:44AM +0200, Piotr Wyderski wrote: > Jakub Jelinek wrote: > > > /* X86_TUNE_USE_INCDEC */ > > ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM), > > > > So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl, > > if addl is better (e.g. on Atom incl is very bad compared to addl $1), > > it will generate it. > > Why is lock inc/dec worse than add/sub on Core2I7? > The only difference I know of is the way the carry flag > is handled. Yeah, and that is exactly the problem: instructions that set only a subset of the flags are problematic. See e.g. Intel's 248966.pdf, 3.5.1.1 "Use of the INC and DEC Instructions". Jakub
Generate annotations for a binary translator
Hi all, I am wondering whether gcc could emit annotations, such as control flow or register usage, into the executable. The idea comes from the paper below: Techniques to improve dynamic binary optimization http://www-users.cs.umn.edu/~adas/adas-thesis-embed.pdf Chapter 5 of the paper lists annotations that may benefit a binary translator. A binary translator does what QEMU does. Take basic block (bb) register usage as an example. It helps a binary translator to know the register usage of each basic block. Say bb A, whose binary address runs from 0x100 to 0x120, does NOT use R1; then the binary translator can use R1 for free. I know there is a data structure for basic blocks. But for a binary translator to use the basic block register usage information, each basic block must be associated with its corresponding binary (virtual) address. If it is possible to generate such information, which part of gcc should I look into first? Thanks! Regards, chenwj -- Wei-Ren Chen (陳韋任) Computer Systems Lab, Institute of Information Science, Academia Sinica, Taiwan (R.O.C.) Tel:886-2-2788-3799 #1667
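To make the request concrete, the annotation a binary translator would consume might look like a table of address ranges, each carrying a register-usage bitmask. The sketch below is purely hypothetical: `BlockAnnotation`, `reg_is_free` and `example_table` are invented names, the end address is assumed to be exclusive, the choice of R0 and R2 as "used" is made up, and nothing here reflects a format that gcc actually emits.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-basic-block annotation record.
struct BlockAnnotation {
    std::uint64_t start;      // address of the first instruction in the block
    std::uint64_t end;        // assumed exclusive: first address past the block
    std::uint32_t used_regs;  // bit i set means register Ri is used in the block
};

// True only if an annotation covers `addr` and says Ri is unused there.
// Without an annotation the conservative answer is "in use".
bool reg_is_free(const std::vector<BlockAnnotation> &table,
                 std::uint64_t addr, unsigned reg) {
    if (reg >= 32)
        return false;
    for (const BlockAnnotation &bb : table) {
        if (addr >= bb.start && addr < bb.end)
            return (bb.used_regs & (std::uint32_t(1) << reg)) == 0;
    }
    return false;
}

// Block A from the message: 0x100 to 0x120, not using R1.
std::vector<BlockAnnotation> example_table() {
    BlockAnnotation a = {0x100, 0x120, 0x5};  // R0 and R2 used (made up), R1 free
    assert(a.start < a.end);
    std::vector<BlockAnnotation> table;
    table.push_back(a);
    return table;
}
```

A translator holding such a table could call `reg_is_free(table, 0x100, 1)` before borrowing R1 inside block A. The hard part raised in the message remains: producing the address ranges requires tying each compiler-internal basic block to its final binary address.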