Re: adding an argument for test execution in testsuite

2011-05-12 Thread Nenad Vukicevic

It is unfortunate that a UPC program cannot use dg-additional-sources,
as we would need to change our run-time to support this option. By the
time we reach "main", the run-time is already initialized to support the
specified number of threads. One option might be to define a default number
of threads to run if the number is not specified.

Anyway, I spent a little bit more time on this and was able to create a
wrapper for the "upc_load" function the same way it is done in gcc-dg.exp
(renaming the old upc_load to prev_upc_load). The wrapper adds the necessary
flags for the dynamic environment. Note that ${tool}_load already accepts
arguments that can be passed to the program.

Nenad

On 5/4/11 3:32 PM, Janis Johnson wrote:

On 05/04/2011 11:21 AM, Nenad Vukicevic wrote:

It seems that I fixed my problem by defining a remote_spawn
procedure (and fixing the order of loading libraries :) ) in my
own upc-dg.exp file and adding a line to it that appends
additional arguments to the command line: "append commandline
$upc_run_arguments".

The global $upc_run_arguments is set before dg-test is
called. I used a simple string compare to see if dynamic
threads are required. So far it works as expected.

Working "so far" shouldn't be good enough, especially if your test will
be run for a variety of targets.

Presumably you don't really need the number of threads to be specified
on the command line, you just need for it to look as if it were
specified at run time.  You could, for example, define it in a second
source file included in the test via dg-additional-sources and use it
from a global variable or call a function to get it.

Janis
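
A minimal sketch of the dg-additional-sources idea above, written as plain C++ rather than UPC and collapsed into one file so that it compiles on its own. The file name and both function names are hypothetical. As the reply at the top of this thread notes, the UPC run-time fixes the thread count before "main" is reached, so this pattern only fits tests that can read the value at run time.

```cpp
// Sketch only: in a real testsuite the first part would be a separate file,
// pulled in by the main test with a directive such as
//   /* { dg-additional-sources "nthreads-aux.c" } */
// (the file name is hypothetical).

// ---- would live in the additional source file ----
int test_thread_count = 4;            // hypothetical global holding the count

int get_test_thread_count() {         // hypothetical accessor
    return test_thread_count;
}

// ---- would live in the main test file ----
int run_test() {
    int n = get_test_thread_count();
    // A real test would exercise n threads here; the sketch only checks
    // that the value arrived without any command-line argument.
    return n > 0 ? 0 : 1;
}
```

The test body never looks at argv, so the harness does not have to pass extra arguments to the program.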




Re: Why is this a problem in the C++ bootstrap and not the normal one ?

2011-05-12 Thread Gabriel Dos Reis
On Thu, May 12, 2011 at 1:32 AM, Toon Moene wrote:

> ../../gcc/gcc/objc/objc-next-runtime-abi-02.c: In function 'const char*
> newabi_append_ro(const char*)':
> ../../gcc/gcc/objc/objc-next-runtime-abi-02.c:1885:29: error: invalid
> conversion from 'const char*' to 'char*' [-fpermissive]
>
> I have to see if it is useful to keep on testing the C++ bootstrap daily,
> because apparently, some bitrot already set in ...

Yes, I believe it is useful precisely because of these silent regressions.

-- Gaby


Re: basic block chaining: using dominance

2011-05-12 Thread Richard Guenther
On Wed, May 11, 2011 at 10:03 PM, Ian Lance Taylor wrote:
> Pierre Vittet writes:
>
>> First, thanks for your help. I have looked at several functions using
>> calculate_dominance_info(). From what I understand, when you have
>> finished using it, you have to clear the structure by calling
>> free_dominance_info().
>> In the function flow_loops_find (file gcc/cfgloop.c), there is a call
>> to calculate_dominance_info() without a call to free_dominance_info(). I
>> feel it is a bug, no?
>
> Not in this case, no.  The caller or a later pass is responsible for
> freeing it in this case.  There should ideally be a comment about this.

Dominator information should be freed if it is made invalid by CFG
transformations.  Otherwise the policy is to keep it around to avoid
re-computing it (too many passes need it).  For post-dominator info the
policy is to always free it.

Richard.

> Ian
>


Re: Non-optimal stack usage with C++ temporaries

2011-05-12 Thread Richard Guenther
On Wed, May 11, 2011 at 10:15 PM, Matt Fischer wrote:
> I've noticed some behavior with g++ that seems strange to me.  I don't
> know if there's some technicality in the C++ standard that requires
> this, or if it's just a limitation to the optimization code, but it
> seemed strange so I thought I'd see if anybody could shed more light
> on it.
>
> Here's a test program that illustrates the behavior:
>
> struct Foo {
>    char buf[256];
>    Foo() {} // suppress automatically-generated constructor code for clarity
>    ~Foo() {}
> };
>
> void func0(const Foo &);
> void func1(const Foo &);
> void func2(const Foo &);
> void func3(const Foo &);
>
> void f()
> {
>    func0(Foo());
>    func1(Foo());
>    func2(Foo());
>    func3(Foo());
> }
>
> Compiling with -O2 and "-fno-stack-protector -fno-exceptions" for
> clarity, on g++ 4.4.3, gives the following:
>
> 00000000 <_Z1fv>:
>   0:   55                      push   %ebp
>   1:   89 e5                   mov    %esp,%ebp
>   3:   81 ec 18 04 00 00       sub    $0x418,%esp
>   9:   8d 85 f8 fb ff ff       lea    -0x408(%ebp),%eax
>   f:   89 04 24                mov    %eax,(%esp)
>  12:   e8 fc ff ff ff          call   13 <_Z1fv+0x13>
>  17:   8d 85 f8 fc ff ff       lea    -0x308(%ebp),%eax
>  1d:   89 04 24                mov    %eax,(%esp)
>  20:   e8 fc ff ff ff          call   21 <_Z1fv+0x21>
>  25:   8d 85 f8 fd ff ff       lea    -0x208(%ebp),%eax
>  2b:   89 04 24                mov    %eax,(%esp)
>  2e:   e8 fc ff ff ff          call   2f <_Z1fv+0x2f>
>  33:   8d 85 f8 fe ff ff       lea    -0x108(%ebp),%eax
>  39:   89 04 24                mov    %eax,(%esp)
>  3c:   e8 fc ff ff ff          call   3d <_Z1fv+0x3d>
>  41:   c9                      leave
>  42:   c3                      ret
>
> The function makes four function calls, each of which constructs a
> temporary for the parameter.  The compiler dutifully allocates stack
> space to construct these, but it seems to allocate separate stack
> space for each of the temporaries.  This seems unnecessary--since
> their lifetimes don't overlap, the same stack space could be used for
> each of them.  The real-life code I adapted this example from had a
> fairly large number of temporaries strewn throughout it, each of which
> was quite large, so this behavior caused the generated function to
> use up a pretty substantial amount of stack, for what seems like no
> good reason.
>
> My question is, is this expected behavior?  My understanding of the
> C++ standard is that each of those temporaries goes away at the
> semicolon, so it seems like they have non-overlapping lifetimes, but I
> know there are some exceptions to that rule.  Could someone comment on
> whether this is an actual bug, or required for some reason by the
> standard, or just behavior that not enough people have run into
> problems with?

It's a missed optimization and not easy to fix.

Richard.

> Thanks,
> Matt
>
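
A possible workaround, not proposed in this thread: give each temporary its own non-inlined helper, so that its buffer lives in the helper's frame and is released on return. This is a sketch under assumptions. The helper names are hypothetical, `__attribute__((noinline))` is a GCC/Clang extension, and the func0..func3 bodies are stand-ins that only count calls so the example is self-contained.

```cpp
struct Foo {
    char buf[256];
    Foo() {}
    ~Foo() {}
};

// Stand-ins for the external functions in the original example.
static int calls = 0;
void func0(const Foo &) { ++calls; }
void func1(const Foo &) { ++calls; }
void func2(const Foo &) { ++calls; }
void func3(const Foo &) { ++calls; }

// Each helper owns exactly one temporary. Because the helper is not
// inlined, the 256-byte buffer belongs to the helper's stack frame and
// not to the frame of f().
__attribute__((noinline)) static void call_func0() { func0(Foo()); }
__attribute__((noinline)) static void call_func1() { func1(Foo()); }
__attribute__((noinline)) static void call_func2() { func2(Foo()); }
__attribute__((noinline)) static void call_func3() { func3(Foo()); }

void f() {
    call_func0();
    call_func1();
    call_func2();
    call_func3();
}
```

The cost is one extra call per temporary, so this only makes sense where stack depth matters more than call overhead.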


Can the size of pointers to data and text be different?

2011-05-12 Thread fanqifei
I am using GCC 4.3.2.
In our microcontroller, the move instruction (mov reg, imm) accepts
16-bit and 32-bit immediate operands.
The data memory size is less than 64KB; however, the code memory size is
larger than 64KB.
The immediate operand may be the address of a variable in a data section
or a function pointer. Addresses of variables can be represented in
16 bits. However, function pointers may need more than 16 bits.
I'd like to use "mov reg, imm16" for addresses of variables and "mov
reg, imm32" for function pointers, so that the code size can be a
little bit smaller.
Another way to state the requirement is that pointers to data and
pointers to text have to be of different sizes.

How can I select the appropriate mov for each? I tried to use LABEL_REF
and SYMBOL_REF to distinguish between them, but it didn't help. It
seems that function pointers are treated as symbols too.
Are there any other cases where references to functions in text
sections are used in data sections?

Thanks.
-Qifei Fan


More atomic functions please

2011-05-12 Thread Piotr Wyderski
Hello,

I'm not sure whether this would be better handled as a missed optimization,
but there is a certain lack of functionality in GCC's __sync_*
function family.

When implementing a reference counting smart pointer, two operations
are of crucial importance:

void __sync_increment(T* p);
bool __sync_decrement_iszero(T* p);

The former only increments the location pointed to by p, the latter decrements
it and returns true if and only if the result was zero.

Both can be implemented in terms of existing __sync functions (and what
can't? -- since there is __sync_bool_compare_and_swap()), e.g.:

   void __sync_increment(T* p) {
      __sync_fetch_and_add(p, 1);
   }

   bool __sync_decrement(T* p) {
      return __sync_fetch_and_add(p, -1) == 1;
   }

Unfortunately, on x86/x64 both are compiled in a rather poor way:

__sync_increment:

lock addl $0x01,(ptr)

which is longer than:

   lock incl (ptr)

__sync_decrement:

movl $-1, %rA
lock xadd %rA, (ptr)
cmpl $0x01, %rA
je/jne...

which is undoubtedly longer than "lock dec" and wastes a register.
I can implement the increment function optimally with a bit of inline
assembly, but decrement is not so lucky, as there is no way to
inform the compiler that the result is in the flags register. One must fall
back to something like this:

lock decl (ptr)
sete %rA

which GCC will finally use to perform a comparison in if(), emitting:

lock decl (ptr)
sete %rA
testb  %rA, %rA
je/jne...

which is hardly an improvement. On the other hand, the __sync functions
integrate perfectly with the flag system (i.e. pairs like cmpxchg/jne),
so implementing the changes in the compiler gives far better opportunities
to emit an optimal sequence than inline assembly can.

As my code is to a high degree propelled by atomic power, I would like to
ask you to provide these functions or tweak the optimizer in order to notice
the aforementioned idioms.

There is also no generic __sync_exchange() -- quite an important operation
in lock-free programming. It could be implemented in terms of compare_exchange,
but many platforms have native support for it, and thus it should be exposed
at the API level; tweaking the optimizer is not the proper way IMHO.

Best regards,
Piotr Wyderski
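
A minimal sketch of the two reference-counting operations described above, built only on __sync builtins that already exist in GCC. The RefCounted type and the helper names are hypothetical, and nothing here says which instructions the compiler will emit.

```cpp
// Intrusive reference count using the existing __sync builtins
// (GCC/Clang extensions). RefCounted and the helper names are hypothetical.
struct RefCounted {
    int refs;
    RefCounted() : refs(1) {}
};

inline void ref_acquire(RefCounted *p) {
    // The result is deliberately ignored: only the increment matters.
    __sync_fetch_and_add(&p->refs, 1);
}

// Returns true when the caller has just dropped the last reference.
inline bool ref_release(RefCounted *p) {
    // __sync_sub_and_fetch returns the new value, so compare against 0.
    return __sync_sub_and_fetch(&p->refs, 1) == 0;
}
```

Comparing the new value with 0 is equivalent to the `__sync_fetch_and_add(p, -1) == 1` form in the message; which of the two a given compiler version handles better would have to be checked in the generated code.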


Re: More atomic functions please

2011-05-12 Thread Joseph S. Myers
On Thu, 12 May 2011, Piotr Wyderski wrote:

> Hello,
> 
> I'm not sure whether this would be better handled as a missed optimization,
> but there is a certain lack of functionality in GCC's __sync_*
> function family.

I don't think we should add new functions to that family; instead the aim
should be to implement the functionality (built-in functions etc.)
required for a good implementation of the C1x and C++0x atomics support,
and recommend that users use those in future.

http://gcc.gnu.org/wiki/Atomic

-- 
Joseph S. Myers
jos...@codesourcery.com
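
For comparison, a sketch of the same operations with the std::atomic interface that C++0x was standardizing at the time (later published as C++11). The type and function names are hypothetical, and the memory orders shown are one common choice for reference counting, not something this thread prescribes.

```cpp
#include <atomic>

// Hypothetical reference-counted object using the standard atomics.
struct Counted {
    std::atomic<int> refs{1};
};

inline void acquire(Counted &c) {
    c.refs.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the caller has just dropped the last reference.
inline bool release(Counted &c) {
    // fetch_sub returns the previous value; 1 means the count is now 0.
    return c.refs.fetch_sub(1, std::memory_order_acq_rel) == 1;
}

// Atomically stores `value` into `slot` and returns what was there before,
// which is the generic exchange operation requested in the thread.
inline int swap_in(std::atomic<int> &slot, int value) {
    return slot.exchange(value);
}
```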


Re: More atomic functions please

2011-05-12 Thread Jakub Jelinek
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
> Hello,
> 
> I'm not sure whether this would be better handled as a missed optimization,
> but there is a certain lack of functionality in GCC's __sync_*
> function family.
> 
> When implementing a reference counting smart pointer, two operations
> are of crucial importance:
> 
> void __sync_increment(T* p);
> bool __sync_decrement_iszero(T* p);
> 
> The former only increments the location pointed to by p, the latter decrements
> it and returns true if and only if the result was zero.
> 
> Both can be implemented in terms of existing __sync functions (and what
> can't? -- since there is __sync_bool_compare_and_swap()), e.g.:
> 
>    void __sync_increment(T* p) {
>       __sync_fetch_and_add(p, 1);
>    }
> 
>    bool __sync_decrement(T* p) {
>       return __sync_fetch_and_add(p, -1) == 1;
>    }

And that's the right thing to do.  If the generated code is not optimal, we
should just improve __sync_fetch_and_add code generation.
It isn't hard to special case addition of 1 or -1, and we already care whether
the result of the builtin is used or ignored.  For == 1 we could add some
pattern that the combiner would merge.

Please file an enhancement request in gcc bugzilla.

Jakub


Re: More atomic functions please

2011-05-12 Thread Jakub Jelinek
On Thu, May 12, 2011 at 06:11:59PM +0200, Piotr Wyderski wrote:
> Unfortunately, on x86/x64 both are compiled in a rather poor way:
> 
> __sync_increment:
> 
> lock addl $0x01,(ptr)
> 
> which is longer than:
> 
>    lock incl (ptr)

GCC actually generates lock incl (ptr) already; it just depends
on which CPU you optimize for.
  /* X86_TUNE_USE_INCDEC */
  ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),

So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl;
if addl is better (e.g. on Atom incl is very bad compared to addl $1),
it will generate addl instead.

Jakub


gcc-4.5-20110512 is now available

2011-05-12 Thread gccadmin
Snapshot gcc-4.5-20110512 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20110512/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_5-branch 
revision 173717

You'll find:

 gcc-4.5-20110512.tar.bz2 Complete GCC (includes all of below)

  MD5=12f0c75f2b235dec5b87c4831dc477e3
  SHA1=44f24ef585da1a68f2f0d98b87d86830541e7ce3

 gcc-core-4.5-20110512.tar.bz2 C front end and core compiler

  MD5=b00358e1c0d93b8ac6228972f5d1ca80
  SHA1=3a8248c56490b83ff2f42a818cc647c7cc657ab3

 gcc-ada-4.5-20110512.tar.bz2 Ada front end and runtime

  MD5=f99272c6d21b7ed01f72637c3962d094
  SHA1=7b0032e6dc688811ecb89507f3e9f24a6a1498fc

 gcc-fortran-4.5-20110512.tar.bz2 Fortran front end and runtime

  MD5=32202cf5257900baaa31eb5da9e6395d
  SHA1=f29bdb2328e89b34a9de43a156bda42b970f1ebe

 gcc-g++-4.5-20110512.tar.bz2 C++ front end and runtime

  MD5=6623d74e88a7b1e4f742106a997087c5
  SHA1=9c450b0ea6ba2134f5c80e4df733c6d83d101501

 gcc-go-4.5-20110512.tar.bz2  Go front end and runtime

  MD5=5912c4cc9d1966ee57132f15cda52dd5
  SHA1=28f46ef6e0eeada39f9bca31099bd8164b06fe95

 gcc-java-4.5-20110512.tar.bz2 Java front end and runtime

  MD5=469a0097d034dcafd5fb13fefc665708
  SHA1=778ee54103c6c30c6f925600628ed1626f7e9928

 gcc-objc-4.5-20110512.tar.bz2 Objective-C front end and runtime

  MD5=e20978097265a3d04bdfd3b0e95de0a7
  SHA1=765a2128abe8ec13a9201e1bc2c319ee0361930a

 gcc-testsuite-4.5-20110512.tar.bz2   The GCC testsuite

  MD5=34f99bfac27722938399d18462aa591b
  SHA1=ba9f9ad674a7e40c597f11fbf0cb8db10f860b38

Diffs from 4.5-20110505 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Can the size of pointers to data and text be different?

2011-05-12 Thread Ian Lance Taylor
fanqifei writes:

> I am using GCC 4.3.2.
> In our microcontroller, the move instruction (mov reg, imm) accepts
> 16-bit and 32-bit immediate operands.
> The data memory size is less than 64KB; however, the code memory size is
> larger than 64KB.
> The immediate operand may be the address of a variable in a data section
> or a function pointer. Addresses of variables can be represented in
> 16 bits. However, function pointers may need more than 16 bits.
> I'd like to use "mov reg, imm16" for addresses of variables and "mov
> reg, imm32" for function pointers, so that the code size can be a
> little bit smaller.
> Another way to state the requirement is that pointers to data and
> pointers to text have to be of different sizes.
>
> How can I select the appropriate mov for each? I tried to use LABEL_REF
> and SYMBOL_REF to distinguish between them, but it didn't help. It
> seems that function pointers are treated as symbols too.
> Are there any other cases where references to functions in text
> sections are used in data sections?

SYMBOL_REF_FUNCTION_P.

(A LABEL_REF refers to a goto label.)

Ian


Re: More atomic functions please

2011-05-12 Thread Piotr Wyderski
Jakub Jelinek wrote:

>  /* X86_TUNE_USE_INCDEC */
>  ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
>
> So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
> if addl is better (e.g. on Atom incl is very bad compared to addl $1),
> it will generate it.

Why is lock inc/dec worse than add/sub on Core2I7?
The only difference I know of is the way the carry flag
is handled.

  Best regards,
  Piotr Wyderski


Re: More atomic functions please

2011-05-12 Thread Jakub Jelinek
On Fri, May 13, 2011 at 07:55:44AM +0200, Piotr Wyderski wrote:
> Jakub Jelinek wrote:
> 
> >  /* X86_TUNE_USE_INCDEC */
> >  ~(m_PENT4 | m_NOCONA | m_CORE2I7 | m_GENERIC | m_ATOM),
> >
> > So, if you say -mtune=bdver1 or -mtune=k8, it will generate incl,
> > if addl is better (e.g. on Atom incl is very bad compared to addl $1),
> > it will generate it.
> 
> Why is lock inc/dec worse than add/sub on Core2I7?
> The only difference I know of is the way the carry flag
> is handled.

Yeah, and that is exactly the problem: instructions that set only a subset
of the flags are problematic.
See e.g. Intel's 248966.pdf, 3.5.1.1 "Use of the INC and DEC Instructions".

Jakub


Generate annotations for a binary translator

2011-05-12 Thread 陳韋任
Hi, all

  I am wondering whether gcc could generate annotations, such as
control flow or register usage, and embed them in the executable.
The idea comes from the paper below,

  Techniques to improve dynamic binary optimization
  http://www-users.cs.umn.edu/~adas/adas-thesis-embed.pdf

The paper lists annotations that may benefit a binary translator
in chapter 5. A binary translator works much as QEMU does.

  Take basic block (bb) register usage as an example. It is
useful for a binary translator to know a basic block's
register usage. Say bb A, whose binary address range is
0x100 to 0x120, does NOT use R1; then the binary
translator can use R1 freely.

  I know there is a data structure for basic blocks. But for
a binary translator to use the basic block register
usage information, each basic block must be associated with
its corresponding binary (virtual) address.

  If it is possible to generate such information, which
part of gcc should I look into first?

  Thanks!

Regards,
chenwj

-- 
Wei-Ren Chen (陳韋任)
Computer Systems Lab, Institute of Information Science,
Academia Sinica, Taiwan (R.O.C.)
Tel:886-2-2788-3799 #1667
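
To make the request concrete, here is a sketch of what a per-basic-block annotation record and a lookup in the translator might look like. This is purely illustrative: the record layout, field names, and function are hypothetical and do not correspond to any existing GCC output format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical annotation record: one entry per basic block.
struct BlockAnnotation {
    std::uint64_t start;      // first address of the block
    std::uint64_t end;        // last address of the block (inclusive)
    std::uint32_t used_regs;  // bit i set => register Ri is used in the block
};

// Returns true if the table says register `reg` (0..31) is unused in the
// block containing `addr`. Addresses without an annotation are treated
// conservatively as "register in use".
inline bool reg_is_free(const std::vector<BlockAnnotation> &table,
                        std::uint64_t addr, unsigned reg) {
    for (const BlockAnnotation &bb : table) {
        if (addr >= bb.start && addr <= bb.end)
            return (bb.used_regs & (std::uint32_t{1} << reg)) == 0;
    }
    return false;
}
```

With a table holding the block from the message (0x100 to 0x120, R1 unused), a translator working at address 0x110 could claim R1 as a scratch register.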