Re: Call for compiler help/advice: atomic builtins for v3
Richard Henderson wrote:

> To keep all this in perspective, folks should remember that atomic operations are *slow*. Very very slow. Orders of magnitude slower than function calls. Seriously. Taking p4 as the extreme example, one can expect a null function call in around 10 cycles, but a locked memory operation to take 1000. Usually things aren't that bad, but I believe some poor design decisions were made for p4 here. But even on a platform without such problems you can expect a factor of 30 difference.

Apologies in advance if the following is not relevant...

Even on a P4, inlining may enable compiler optimizations. One case is when the compiler can see that the return value of __sync_fetch_and_or (for instance) isn't used. It's possible to use a wait-free "lock or" instead of a "lock cmpxchg" loop (MSVC 8 does this for _InterlockedOr.)

Another case is when inlining results in a sequence of K adjacent __sync_fetch_and_add( &x, 1 ) operations. These can legally be replaced with a single __sync_fetch_and_add( &x, K ).

Currently the __sync_* intrinsics seem to be fully locked, but if acquire/release/unordered variants are added, other platforms may also suffer from lack of inlining. On a PowerPC an unordered atomic increment is pretty much the same speed as an ordinary increment (when there is no contention.)
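The two cases above can be sketched roughly as follows. This is only an illustration using the GCC __sync builtins; the function names are made up for this sketch, K is taken to be 3, and nothing here claims that any particular compiler version actually performs either transformation.

```cpp
#include <cassert>

// Case 1: the old value returned by the builtin is discarded, so a compiler
// that sees this inline is free to pick an instruction that does not need
// to produce the old value.
inline void set_bits_sketch(unsigned* flags, unsigned mask) {
    (void)__sync_fetch_and_or(flags, mask);
}

// Case 2: K = 3 adjacent increments of the same variable...
inline void bump_three_times_sketch(int* x) {
    __sync_fetch_and_add(x, 1);
    __sync_fetch_and_add(x, 1);
    __sync_fetch_and_add(x, 1);
}

// ...leave the same final value as one addition of K.
inline void bump_by_three_sketch(int* x) {
    __sync_fetch_and_add(x, 3);
}

// Single-threaded self-check of the final values only.
inline void sync_sketch_self_check() {
    int a = 0, b = 0;
    bump_three_times_sketch(&a);
    bump_by_three_sketch(&b);
    assert(a == 3 && b == 3);

    unsigned flags = 0x1;
    set_bits_sketch(&flags, 0x4);
    assert(flags == 0x5);
}
```

The self-check only compares end results in one thread; it says nothing about what other threads may observe in between.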
Re: Empty loops removal (Was Re: Some extra decorations)
Jonathan Wakely:

2009/5/4 Joseph S. Myers:
> On Mon, 4 May 2009, Jan Hubicka wrote:
>
>> On mainline I enabled infinite loop removal at
>> -funsafe-loop-optimizations. I would suggest adding
>> -fempty-loops-terminate and make it default for C++? It does not apply
>> for C, right?
>
> You mean for C++0x (I see no such rule in C++03), and there is no such
> rule for C at present.

Yes, the rule is new for C++0x, and it is in the context of for, while and do-while loops only, not recursive calls.

It might be worth raising this issue on c++std-core, because it's easy for a compiler to transform recursion to a loop using tail call elimination, and I suspect that it is in line with the original intent to treat recursion with no side effects as finite in the same way.
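To make the loop/recursion point concrete, here is a small sketch of the same side-effect-free computation written both ways. The function names and the choice of the Collatz iteration are assumptions made purely for illustration; whether a given compiler turns the recursive form into a loop, or removes either form when the result is unused, is not claimed here.

```cpp
#include <cassert>

// One step of the iteration; unsigned arithmetic wraps, so there is no
// undefined behaviour on overflow.
inline unsigned collatz_next_sketch(unsigned n) {
    return (n % 2 == 0) ? n / 2 : 3 * n + 1;
}

// Loop form: no I/O, no volatile access, no synchronisation. Termination is
// not something a compiler can easily prove. (n == 0 never terminates.)
inline unsigned steps_loop_sketch(unsigned n) {
    unsigned steps = 0;
    while (n != 1) {
        n = collatz_next_sketch(n);
        ++steps;
    }
    return steps;
}

// Tail-recursive form of the same computation. After tail call elimination
// this is the same loop as above.
inline unsigned steps_recursive_sketch(unsigned n, unsigned steps = 0) {
    if (n == 1) return steps;
    return steps_recursive_sketch(collatz_next_sketch(n), steps + 1);
}

// Both forms agree on inputs that do terminate.
inline void loop_sketch_self_check() {
    assert(steps_loop_sketch(6) == 8);
    assert(steps_recursive_sketch(6) == 8);
}
```

If a caller discards the returned count, neither form has any observable effect, which is the situation the loop rule addresses and the recursion question is about.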
Re: [RFC] Implementing addressof for C++0x
On 05/20/2010 01:10 PM, Paolo Carlini wrote:

> ... for reference, it would be something like this (in my recollections, it was even uglier ;)
>
>   template<typename _Tp>
>     _Tp* addressof(_Tp& __v)
>     {
>       return reinterpret_cast<_Tp*>
>         (&const_cast<char&>(reinterpret_cast<const volatile char&>(__v)));
>     }

It's uglier because the code above doesn't work for functions, and because of compiler bugs.

> By the way, Peter (I think you are the author of the current boost implementation, which I looked at yesterday), in case we end up having something like the above, temporarily at least, which kind of acknowledgment would you be Ok with? Is your name in the ChangeLog enough?

Any kind of acknowledgment is fine with me, including none at all. Whichever you prefer. :-)
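For readers wondering why the cast chain is needed at all, a minimal sketch follows. The names addressof_sketch and OverloadsAmpersand are invented for this example; it only covers object types and makes no attempt to handle functions or the compiler bugs mentioned above.

```cpp
#include <cassert>

// Going through a reference to (const volatile) char bypasses any
// user-defined operator& on T, because the built-in & is applied to a char.
template <typename T>
T* addressof_sketch(T& v) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(v)));
}

// A type whose unary & deliberately does not return the real address.
struct OverloadsAmpersand {
    int payload;
    OverloadsAmpersand* operator&() { return nullptr; }
};

inline void addressof_sketch_self_check() {
    OverloadsAmpersand obj = { 42 };
    assert(&obj == nullptr);                    // overloaded operator& is used
    assert(addressof_sketch(obj) != nullptr);   // the cast chain ignores it
    assert(addressof_sketch(obj)->payload == 42);

    int plain = 7;
    assert(addressof_sketch(plain) == &plain);  // ordinary types are unaffected
}
```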
Re: [RFC] Implementing addressof for C++0x
On 05/20/2010 01:55 PM, Paolo Carlini wrote:

>> It's uglier because the code above doesn't work for functions,
>
> By the way, do you have a specific testcase in mind? Because addressof_fn_test.cpp, part of Boost, passes...

This is probably a g++/gcc extension... some compilers do not allow references to functions to be cast to char&, and I believe the standard doesn't permit that, either.
Re: [RFC] Implementing addressof for C++0x
Paolo Carlini wrote:

> On 05/20/2010 02:18 PM, Peter Dimov wrote:
>> On 05/20/2010 01:55 PM, Paolo Carlini wrote:
>>>> It's uglier because the code above doesn't work for functions,
>>> By the way, do you have a specific testcase in mind? Because addressof_fn_test.cpp, part of Boost, passes...
>> This is probably a g++/gcc extension... some compilers do not allow references to functions to be cast to char&, and I believe the standard doesn't permit that, either.
> I see. I'm a bit reluctant to add complexity to the code, given that current Comeau and Intel, at least, in strict-mode, also like it...

If it works, there's certainly no need to add complexity. Here's the ticket that prompted the boost::addressof changes:

https://svn.boost.org/trac/boost/ticket/1846

but it doesn't say which compiler didn't like it at the time.

MSVC 8.0 also accepts it.
Re: [RFC] Implementing addressof for C++0x
Jason Merrill wrote:

> On 05/20/2010 08:18 AM, Peter Dimov wrote:
>> On 05/20/2010 01:55 PM, Paolo Carlini wrote:
>>>> It's uglier because the code above doesn't work for functions,
>>> By the way, do you have a specific testcase in mind? Because addressof_fn_test.cpp, part of Boost, passes...
>> This is probably a g++/gcc extension... some compilers do not allow references to functions to be casted to char&, and I believe the standard doesn't permit that, either.
>
> The standard permits a compiler to accept or reject such a cast.
>
> 5.2.10/8: Converting a pointer to a function into a pointer to an object type or vice versa is conditionally-supported.

Thanks; that is, then, why the latest Comeau accepts it. It didn't occur to me to try the earlier versions on http://www.comeaucomputing.com/tryitout/ - they reject the code.

This paragraph is a new addition, not present in C++03; "conditionally supported" is a C++0x-ism. :-)
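One way to sidestep the conditionally-supported cast for functions is to never apply the cast chain to a function type in the first place. The sketch below does that with tag dispatch on std::is_function from the C++0x/C++11 <type_traits> header. This is an assumption-laden illustration, not the Boost or libstdc++ implementation, and all names in it are invented.

```cpp
#include <cassert>
#include <type_traits>

// Functions: operator& cannot be overloaded for them, so the built-in & is
// always safe and no cast to char& is required.
template <typename T>
T* addressof_dispatch_sketch(T& v, std::true_type /* is a function */) {
    return &v;
}

// Objects: use the cast chain to bypass a possibly overloaded operator&.
template <typename T>
T* addressof_dispatch_sketch(T& v, std::false_type /* is an object */) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(v)));
}

template <typename T>
T* addressof_any_sketch(T& v) {
    return addressof_dispatch_sketch(v, std::is_function<T>());
}

// Hypothetical function used only to exercise the function overload.
inline int twice_sketch(int x) { return 2 * x; }

inline void addressof_any_self_check() {
    assert(addressof_any_sketch(twice_sketch) == &twice_sketch);
    assert(addressof_any_sketch(twice_sketch)(21) == 42);

    int value = 3;
    assert(addressof_any_sketch(value) == &value);
}
```

Only the overload selected for a given T has its body instantiated, so the reference-to-function-to-char& cast is never compiled for function types.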
Re: Recent libstdc++ regression on i686-linux: abi/cxx_runtime_only_linkage.cc
Mark Mitchell wrote:

> Richard Henderson wrote:
>> H.J. Lu wrote:
>>> Can we declare that Linux/ia32 generates i486 insn by default?
>> We the gcc team? I'm not sure. For now I'll say no. We an individual linux distributor? Certainly. In fact I would be surprised if i586 wasn't a decent minimum these days.
> I agree. We the GCC team have to accept that some CPUs may not have the ability to do this. That might be old x86 CPUs; it might also be brand-new embedded CPUs.

Setting the default to -march=i486 will still let people who target i386 use -march=i386.

The problem, from the point of view of a library such as boost::shared_ptr, is that there is no way to distinguish between user A, who just types g++ foo.cpp and expects to get a program that works well on a typical machine, and user B, who types g++ -march=i386 foo.cpp, with the explicit intent to run the result on a 386. Since A users outnumber B users, boost::shared_ptr assumes A and uses 486 atomic instructions even though __i486__ is not defined. Had the default been 486, I'd be able to recognize user B's intent and not use 486 instructions.

(Not that I've ever received a bug report about shared_ptr not working on 386.)
Re: Recent libstdc++ regression on i686-linux: abi/cxx_runtime_only_linkage.cc
Paolo Carlini wrote:

> Peter Dimov wrote:
>> The problem, from the point of view of a library such as boost::shared_ptr, is that there is no way to distinguish between user A, who just types g++ foo.cpp and expects to get a program that works well on a typical machine, and user B, who types g++ -march=i386 foo.cpp, with the explicit intent to run the result on a 386.
> Maybe "no way" is a tad too strong: now we have __GCC_HAVE_SYNC_COMPARE_AND_SWAP_? and more could be added...

I may be missing something, but doesn't testing __i486__ give me the same information as __HAVE_CAS_x in this case? The problem is not that the library cannot distinguish between -march=i386 and -march=i486; the problem is that it cannot distinguish between explicit -march=i386 and implicit -march=i386. This is an issue because many users target i386 by accident and not by design simply because it is the default in many g++ installations.

In practice, when one does:

  g++ -c foo.cpp
  g++ -c -march=i586 bar.cpp
  g++ foo.o bar.o

it is reasonable to expect the end result to work on a 586 or better. But if a library header uses spinlocks on 386 and inlined __sync on 586, the code will fail in subtle ways, because the manipulation of some shared variables may no longer be atomic.

The only solution today for the above situation is to ignore the lack of __i486__ and consistently use cmpxchg. This of course is not good for people who explicitly target i386. If g++ defaults to i486, the libraries can use the lack of __i486__ as a definite sign of the user explicitly targeting i386, in which case they can safely refrain from using cmpxchg/xadd without fear of breaking the above example.
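The hazard is easier to see with a concrete header. The fragment below is a hypothetical sketch, not code from Boost or libstdc++: it selects an implementation per translation unit based on __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 (the _4 suffix assumes a 4-byte int), and the spinlock fallback assumes the target supports __sync_lock_test_and_set with the value 1.

```cpp
#include <cassert>

#if defined(__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4)

// This translation unit was compiled for a target with a 4-byte CAS:
// use the inlined builtin directly.
#define SKETCH_INLINE_ATOMICS 1
inline int sketch_increment(int* p) {
    return __sync_add_and_fetch(p, 1);
}

#else

// This translation unit was compiled for a target without CAS:
// serialise through one process-wide spinlock instead.
#define SKETCH_INLINE_ATOMICS 0
inline int* sketch_global_lock() {
    static int lock = 0;  // one instance shared by all translation units
    return &lock;
}
inline int sketch_increment(int* p) {
    int* lock = sketch_global_lock();
    while (__sync_lock_test_and_set(lock, 1)) {
        // spin until the previous holder releases the lock
    }
    int result = ++*p;
    __sync_lock_release(lock);
    return result;
}

#endif

// Single-threaded check of the return values only.
inline void sketch_increment_self_check() {
    int counter = 0;
    assert(sketch_increment(&counter) == 1);
    assert(sketch_increment(&counter) == 2);
    assert(counter == 2);
}
```

If foo.o picks the spinlock branch and bar.o picks the builtin branch, the two object files update the same counter under different protocols: the builtin path never takes the lock, so the increment inside the locked region in foo.o can race with it. That is the subtle failure described above, and it is why a header cannot safely make this choice per translation unit unless it knows the choice was deliberate.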