Re: Call for compiler help/advice: atomic builtins for v3

2005-11-06 Thread Peter Dimov

Richard Henderson wrote:


To keep all this in perspective, folks should remember that atomic
operations are *slow*.  Very very slow.  Orders of magnitude slower
than function calls.  Seriously.  Taking p4 as the extreme example,
one can expect a null function call in around 10 cycles, but a locked
memory operation to take 1000.  Usually things aren't that bad, but
I believe some poor design decisions were made for p4 here.  But even
on a platform without such problems you can expect a factor of 30
difference.


Apologies in advance if the following is not relevant...

Even on a P4, inlining may enable compiler optimizations. One case is when 
the compiler can see that the return value of __sync_fetch_and_or (for 
instance) isn't used. It can then emit a wait-free "lock or" instead of 
a "lock cmpxchg" loop (MSVC 8 does this for _InterlockedOr).
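
A minimal sketch of that first case, using the GCC __sync builtins. The helper names (set_flag, test_and_set_flag, example_flags) are made up for illustration, and what a particular compiler version actually emits is not guaranteed.

```cpp
#include <cassert>

// Hypothetical helpers, for illustration only.

// Result discarded: once this is inlined, an x86 compiler is free to
// emit a single "lock or" for it.
inline void set_flag(unsigned* flags, unsigned bit)
{
    (void)__sync_fetch_and_or(flags, bit);
}

// Result used: the previous value is needed, which on x86 typically
// means a "lock cmpxchg" loop.
inline bool test_and_set_flag(unsigned* flags, unsigned bit)
{
    return (__sync_fetch_and_or(flags, bit) & bit) != 0;
}

// Small usage example.
inline void example_flags()
{
    unsigned flags = 0;
    set_flag(&flags, 0x1u);
    assert(flags == 0x1u);
    assert(test_and_set_flag(&flags, 0x1u));   // bit was already set
    assert(!test_and_set_flag(&flags, 0x2u));  // bit was newly set
    assert(flags == 0x3u);
}
```

If set_flag is an out-of-line library call, the compiler cannot know that the caller ignores the result, so it has to use the slower form.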


Another case is when inlining results in a sequence of K adjacent 
__sync_fetch_and_add( &x, 1 ) operations. These can legally be replaced with 
a single __sync_fetch_and_add( &x, K ).
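
A sketch of the second case with K = 3. The function names are hypothetical. The merged version yields one of the results the original sequence could have produced (the one where no other thread touches x in between), which is why the replacement is allowed.

```cpp
#include <cassert>

// Three adjacent increments, as they might appear after inlining.
inline int add_three_times(int* x)
{
    int r0 = __sync_fetch_and_add(x, 1);
    int r1 = __sync_fetch_and_add(x, 1);
    int r2 = __sync_fetch_and_add(x, 1);
    return r0 + r1 + r2;
}

// One possible merged form: a single atomic add of 3, with the
// intermediate results derived from the first one.
inline int add_three_once(int* x)
{
    int r0 = __sync_fetch_and_add(x, 3);
    return r0 + (r0 + 1) + (r0 + 2);
}

// Small usage example: both forms agree when run single-threaded.
inline void example_merge()
{
    int a = 10;
    int b = 10;
    assert(add_three_times(&a) == add_three_once(&b));
    assert(a == 13 && b == 13);
}
```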


Currently the __sync_* intrinsics seem to be fully locked, but if 
acquire/release/unordered variants are added, other platforms may also 
suffer from lack of inlining. On a PowerPC an unordered atomic increment is 
pretty much the same speed as an ordinary increment (when there is no 
contention.) 
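
To make the ordered/unordered distinction concrete, here is a sketch using the C++11 std::atomic interface, which postdates this message and stands in for the hypothetical __sync variants discussed above. The function names are made up for illustration.

```cpp
#include <atomic>
#include <cassert>

// Fully ordered increment: comparable to today's __sync_fetch_and_add.
inline int increment_ordered(std::atomic<int>& c)
{
    return c.fetch_add(1, std::memory_order_seq_cst);
}

// Unordered ("relaxed") increment: atomic, but imposes no ordering on
// surrounding memory operations, so no barriers are required.
inline int increment_unordered(std::atomic<int>& c)
{
    return c.fetch_add(1, std::memory_order_relaxed);
}

// Small usage example.
inline void example_increment()
{
    std::atomic<int> c(0);
    assert(increment_unordered(c) == 0);
    assert(increment_ordered(c) == 1);
    assert(c.load() == 2);
}
```

The cheaper the operation itself, the larger the relative cost of an out-of-line call around it.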



Re: Empty loops removal (Was Re: Some extra decorations)

2009-05-03 Thread Peter Dimov

Jonathan Wakely:

2009/5/4 Joseph S. Myers:
> On Mon, 4 May 2009, Jan Hubicka wrote:
>
>> On mainline I enabled infinite loop removal at
>> -funsafe-loop-optimizations. I would suggest adding
>> -fempty-loops-terminate and make it default for C++? It does not apply
>> for C, right?
>
> You mean for C++0x (I see no such rule in C++03), and there is no such
> rule for C at present.

Yes, the rule is new for C++0x, and it is in the context of for, while
and do-while loops only, not recursive calls.


It might be worth raising this issue on c++std-core, because it's easy for a 
compiler to transform recursion to a loop using tail call elimination, and I 
suspect that it is in line with the original intent to treat recursion with 
no side effects as finite in the same way. 
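
A small sketch of the point: the two hypothetical functions below compute the same thing, and a compiler that performs tail call elimination may well turn the first into the second. A rule that covers only for, while and do-while would then treat them differently depending on how the source happened to be written.

```cpp
#include <cassert>

// Side-effect-free tail recursion: count how many times n can be
// halved before it reaches zero.
inline unsigned steps_recursive(unsigned n, unsigned acc = 0)
{
    return n == 0 ? acc : steps_recursive(n / 2, acc + 1);
}

// The same computation after tail call elimination, written as a loop.
inline unsigned steps_loop(unsigned n)
{
    unsigned acc = 0;
    while (n != 0)
    {
        n /= 2;
        ++acc;
    }
    return acc;
}

// Small usage example.
inline void example_steps()
{
    assert(steps_recursive(8) == 4);  // 8 -> 4 -> 2 -> 1 -> 0
    assert(steps_loop(8) == 4);
}
```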



Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

On 05/20/2010 01:10 PM, Paolo Carlini wrote:

... for reference, it would be something like this (in my recollections,
it was even uglier ;)

template<typename _Tp>
  _Tp*
  addressof(_Tp& __v)
  {
    return reinterpret_cast<_Tp*>
      (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
  }


It's uglier because the code above doesn't work for functions, and because 
of compiler bugs.
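
For readers who haven't seen why addressof is needed at all, here is a minimal sketch of the object case only. The names addressof_sketch and Tricky are hypothetical; the cast chain is the usual char-reference technique and, as noted, it does not portably handle functions.

```cpp
#include <cassert>

// Hypothetical name; cast-based technique for object types only.
template<typename T>
T* addressof_sketch(T& v)
{
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(v)));
}

// A type whose unary operator& is overloaded and therefore useless
// for obtaining the real address.
struct Tricky
{
    int value;
    Tricky* operator&() { return 0; }
};

// Small usage example.
inline void example_addressof()
{
    Tricky t = { 42 };
    Tricky* via_operator = &t;               // calls the overload
    Tricky* real = addressof_sketch(t);      // bypasses the overload
    assert(via_operator == 0);
    assert(real != 0);
    assert(real->value == 42);
}
```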



By the way, Peter (I think you are the author of the current boost
implementation, which I looked at yesterday), in case we end up having
something like the above, temporarily at least, which kind of
acknowledgment would you be Ok with? Is it enough your name in the
ChangeLog?


Any kind of acknowledgment is fine with me, including none at all. Whichever 
you prefer. :-)





Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow 
references to functions to be cast to char&, and I believe the standard 
doesn't permit that, either. 



Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

Paolo Carlini wrote:

On 05/20/2010 02:18 PM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be cast to char&, and I believe the
standard doesn't permit that, either.

I see. I'm a bit reluctant to add complexity to the code, given that
current Comeau and Intel, at least, in strict-mode, also like it...


If it works, there's certainly no need to add complexity.

Here's the ticket that prompted the boost::addressof changes:

https://svn.boost.org/trac/boost/ticket/1846

but it doesn't say which compiler rejected the code at the time. MSVC 8.0 
also accepts it. 



Re: [RFC] Implementing addressof for C++0x

2010-05-20 Thread Peter Dimov

Jason Merrill wrote:

On 05/20/2010 08:18 AM, Peter Dimov wrote:

On 05/20/2010 01:55 PM, Paolo Carlini wrote:

It's uglier because the code above doesn't work for functions,


By the way, do you have a specific testcase in mind?

Because addressof_fn_test.cpp, part of Boost, passes...


This is probably a g++/gcc extension... some compilers do not allow
references to functions to be cast to char&, and I believe the
standard doesn't permit that, either.


The standard permits a compiler to accept or reject such a cast.

5.2.10/8: Converting a pointer to a function into a pointer to an object 
type or vice versa is conditionally-supported.


Thanks; that is, then, why the latest Comeau accepts it. It didn't occur to 
me to try the earlier versions on http://www.comeaucomputing.com/tryitout/ - 
they reject the code. This paragraph is a new addition, not present in 
C++03; "conditionally supported" is a C++0x-ism. :-) 
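
As a sketch of what "conditionally-supported" covers here: a reinterpret_cast from a function reference to char& is defined in terms of the corresponding pointer conversion, so it depends on the same function-pointer/object-pointer cast shown below. The names are hypothetical, and a conforming compiler may reject this code outright.

```cpp
#include <cassert>

inline int forty_two() { return 42; }

typedef int (*fn_ptr)();

// Conditionally-supported conversions (5.2.10/8 in the C++0x draft):
// an implementation may accept or reject them.
inline void* to_object_pointer(fn_ptr f)
{
    return reinterpret_cast<void*>(f);
}

inline fn_ptr to_function_pointer(void* p)
{
    return reinterpret_cast<fn_ptr>(p);
}

// Small usage example: where supported, the round trip yields the
// original pointer value.
inline void example_round_trip()
{
    void* p = to_object_pointer(&forty_two);
    fn_ptr f = to_function_pointer(p);
    assert(f == &forty_two);
    assert(f() == 42);
}
```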



Re: Recent libstdc++ regression on i686-linux: abi/cxx_runtime_only_linkage.cc

2008-08-26 Thread Peter Dimov

Mark Mitchell:

Richard Henderson wrote:

H.J. Lu wrote:

Can we declare that Linux/ia32 generates i486 insn by default?


We the gcc team?  I'm not sure.  For now I'll say no.

We an individual linux distributor?  Certainly.
In fact I would be surprised if i586 wasn't a
decent minimum these days.


I agree.  We the GCC team have to accept that some CPUs may not have the
ability to do this.  That might be old x86 CPUs; it might also be
brand-new embedded CPUs.


Setting the default to -march=i486 will still let people who target i386 
use -march=i386.


The problem, from the point of view of a library such as boost::shared_ptr, 
is that there is no way to distinguish between user A, who just types g++ 
foo.cpp and expects to get a program that works well on a typical machine, 
and user B, who types g++ -march=i386 foo.cpp, with the explicit intent to 
run the result on a 386.


Since A users outnumber B users, boost::shared_ptr assumes A and uses 486 
atomic instructions even though __i486__ is not defined.


Had the default been 486, I'd be able to recognize user B's intent and not 
use 486 instructions.


(Not that I've ever received a bug report about shared_ptr not working on 
386.)




Re: Recent libstdc++ regression on i686-linux: abi/cxx_runtime_only_linkage.cc

2008-08-27 Thread Peter Dimov

Paolo Carlini:

Peter Dimov wrote:

The problem, from the point of view of a library such as
boost::shared_ptr, is that there is no way to distinguish between user A,
who just types g++ foo.cpp and expects to get a program that works well
on a typical machine, and user B, who types g++ -march=i386 foo.cpp, with
the explicit intent to run the result on a 386.

Maybe "no way" is a tad too strong: now we have
__GCC_HAVE_SYNC_COMPARE_AND_SWAP_? and more could be added...


I may be missing something, but doesn't testing __i486__ give me the same
information as __HAVE_CAS_x in this case?

The problem is not that the library cannot distinguish between -march=i386
and -march=i486; the problem is that it cannot distinguish between explicit
-march=i386 and implicit -march=i386.

This is an issue because many users target i386 by accident and not by
design simply because it is the default in many g++ installations.

In practice, when one does:

g++ -c foo.cpp
g++ -c -march=i586 bar.cpp
g++ foo.o bar.o

it is reasonable to expect the end result to work on a 586 or better.

But if a library header uses spinlocks on 386 and inlined __sync on 586, the
code will fail in subtle ways, because the manipulation of some shared
variables may no longer be atomic.
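
A sketch of the kind of header that causes this. All names are hypothetical and this is not the actual shared_ptr code. If foo.o is compiled with the 386 branch and bar.o with the 586 branch, the two objects update the same counter by different means: the spinlock in one does nothing to exclude the locked add in the other.

```cpp
#include <cassert>

// Hypothetical configuration macro, for illustration only.
#if defined(__i486__) || defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)
#  define SKETCH_HAVE_ATOMIC_RMW 1
#else
#  define SKETCH_HAVE_ATOMIC_RMW 0
#endif

#if !SKETCH_HAVE_ATOMIC_RMW
// Hypothetical global spinlock used only by the "386" branch.
inline int& sketch_spinlock()
{
    static int lock = 0;
    return lock;
}
#endif

// Returns the previous value of *counter.
inline int sketch_increment(int* counter)
{
#if SKETCH_HAVE_ATOMIC_RMW
    // "486 or better" branch: a single atomic read-modify-write.
    return __sync_fetch_and_add(counter, 1);
#else
    // "386" branch: plain increment under a spinlock. Atomic only with
    // respect to other code that takes the same spinlock.
    while (__sync_lock_test_and_set(&sketch_spinlock(), 1)) { }
    int old = (*counter)++;
    __sync_lock_release(&sketch_spinlock());
    return old;
#endif
}

// Small usage example (single translation unit, so no mismatch here).
inline void example_counter()
{
    int c = 0;
    assert(sketch_increment(&c) == 0);
    assert(sketch_increment(&c) == 1);
    assert(c == 2);
}
```

Each translation unit is correct on its own; the breakage only appears once the two are linked together and share a counter.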

The only solution today for the above situation is to ignore the lack of
__i486__ and consistently use cmpxchg. This of course is not good for people
who explicitly target i386.

If g++ defaults to i486, the libraries can use the lack of __i486__ as a 
definite sign of the user explicitly targeting i386, in which case they can 
safely refrain from using cmpxchg/xadd without fear of breaking the above 
example.