Hello,
I've come across an issue when working on a smart pointer
implementation. GCC does not seem to propagate constants far enough and
misses some optimization opportunities. I don't think this issue is
specific to smart pointers, so there may be other cases where GCC
generates suboptimal code.
I've attached a simple test case. The smart pointer here is a unique pointer:
at any time only a single instance holds a raw pointer to the resource. The
deletion can be customized through a policy class. In main(), I allocate
an int, then pass it through several smart pointers. At the end, the
last smart pointer holds the raw pointer to the allocated memory.
Compiled as:
g++ -g -O3 -o gccoptbug.o -c gccoptbug.cpp
g++ -o gccoptbug gccoptbug.o
The generated code on AMD64 looks like this:
0x00000000004004d0 <+0>: sub $0x8,%rsp
0x00000000004004d4 <+4>: mov $0x4,%edi
0x00000000004004d9 <+9>: callq 0x4004c0 <_Znwm@plt> ; operator new
0x00000000004004de <+14>: mov %rax,%rdi
0x00000000004004e1 <+17>: callq 0x4004a0 <_ZdlPv@plt> ; operator delete
0x00000000004004e6 <+22>: xor %edi,%edi
0x00000000004004e8 <+24>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004ed <+29>: xor %edi,%edi
0x00000000004004ef <+31>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004f4 <+36>: xor %edi,%edi
0x00000000004004f6 <+38>: callq 0x4004a0 <_ZdlPv@plt>
0x00000000004004fb <+43>: xor %edi,%edi
0x00000000004004fd <+45>: callq 0x4004a0 <_ZdlPv@plt>
0x0000000000400502 <+50>: xor %eax,%eax
0x0000000000400504 <+52>: add $0x8,%rsp
0x0000000000400508 <+56>: retq
The allocated memory is freed, then operator delete is called four times
with a null pointer. The dtor and the deleter function it calls were
inlined. So far so good.
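To make that call pattern concrete at the source level, here is a minimal sketch. It is not part of the attached test case: CountingDeleter and chain_matches_listing() are hypothetical names introduced only for illustration, and Ptr is a trimmed copy of the class from the test case. The policy counts how often Delete() receives a null or a non-null pointer when the same five-pointer chain is destroyed.

```cpp
#include <cassert>

// Hypothetical counting policy: same shape as Deleter<T>, but it records
// how often Delete() receives a null or a non-null pointer.
template<typename T>
struct CountingDeleter
{
    static int null_calls;
    static int nonnull_calls;

    static void Delete(T* p_)
    {
        if (p_)
            ++nonnull_calls;
        else
            ++null_calls;
        delete p_;  // deleting a null pointer has no effect
    }
};

template<typename T> int CountingDeleter<T>::null_calls = 0;
template<typename T> int CountingDeleter<T>::nonnull_calls = 0;

// Trimmed copy of Ptr from the test case: copying transfers ownership.
template<typename T, class D>
class Ptr
{
public:
    Ptr(T* p_) : m_ptr(p_) {}
    Ptr(const Ptr& p_) : m_ptr(p_.Forget()) {}
    ~Ptr() { D::Delete(m_ptr); }

    T* Forget() const
    {
        T* s = m_ptr;
        m_ptr = 0;
        return s;
    }

private:
    mutable T* m_ptr;
};

// Hypothetical helper: runs the same chain as main() and reports whether
// the deleter saw one real pointer followed by four null pointers.
bool chain_matches_listing()
{
    typedef CountingDeleter<int> D;
    D::null_calls = 0;
    D::nonnull_calls = 0;
    {
        Ptr<int, D> p0(new int);
        Ptr<int, D> p1 = p0;
        Ptr<int, D> p2 = p1;
        Ptr<int, D> p3 = p2;
        Ptr<int, D> p4 = p3;
    }   // p4 is destroyed first and frees the int; p3..p0 hold null
    assert(D::nonnull_calls + D::null_calls == 5);  // five dtors ran
    return D::nonnull_calls == 1 && D::null_calls == 4;
}
```

Since the four null calls are known at compile time once the copies are inlined, they are the ones I would expect constant propagation to remove.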
If I modify the deleter policy to call op delete only when the pointer
is not zero (#if 1 at line 6), the generated code changes to:
0x00000000004004d0 <+0>: sub $0x58,%rsp
0x00000000004004d4 <+4>: mov $0x4,%edi
0x00000000004004d9 <+9>: callq 0x4004c0 <_Znwm@plt>
0x00000000004004de <+14>: lea 0x40(%rsp),%rdi
0x00000000004004e3 <+19>: mov %rax,0x40(%rsp)
0x00000000004004e8 <+24>: movq $0x0,(%rsp)
0x00000000004004f0 <+32>: movq $0x0,0x10(%rsp)
0x00000000004004f9 <+41>: movq $0x0,0x20(%rsp)
0x0000000000400502 <+50>: movq $0x0,0x30(%rsp)
0x000000000040050b <+59>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400510 <+64>: lea 0x30(%rsp),%rdi
0x0000000000400515 <+69>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x000000000040051a <+74>: lea 0x20(%rsp),%rdi
0x000000000040051f <+79>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400524 <+84>: lea 0x10(%rsp),%rdi
0x0000000000400529 <+89>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x000000000040052e <+94>: mov %rsp,%rdi
0x0000000000400531 <+97>: callq 0x400630 <Ptr<int, Deleter<int> >::~Ptr()>
0x0000000000400536 <+102>: xor %eax,%eax
0x0000000000400538 <+104>: add $0x58,%rsp
0x000000000040053c <+108>: retq
Instead of eliminating the calls to op delete, the actual smart ptr
objects appear on the stack, and the dtor is not inlined anymore.
gcc 4.4 and 4.5 optimize this as expected:
0x0000000000400640 <+0>: sub $0x8,%rsp
0x0000000000400644 <+4>: mov $0x4,%edi
0x0000000000400649 <+9>: callq 0x400540 <_Znwm@plt>
0x000000000040064e <+14>: test %rax,%rax
0x0000000000400651 <+17>: je 0x40065b <main()+27>
0x0000000000400653 <+19>: mov %rax,%rdi
0x0000000000400656 <+22>: callq 0x400510 <_ZdlPv@plt>
0x000000000040065b <+27>: xor %eax,%eax
0x000000000040065d <+29>: add $0x8,%rsp
0x0000000000400661 <+33>: retq
4.6 and 4.7 (r182889) generate the suboptimal code shown above.
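For reference, the 4.4/4.5 listing corresponds roughly to the following hand-written function. This is only a sketch of what the optimizer reduced main() to; the name main_as_optimized() is hypothetical, and the sketch assumes a 4-byte int as on AMD64.

```cpp
#include <cassert>
#include <new>

// Hypothetical name; mirrors the 4.4/4.5 listing step by step.
int main_as_optimized()
{
    assert(sizeof(int) == 4);               // the listing passes 4 in %edi
    void* p = ::operator new(sizeof(int));  // callq _Znwm
    if (p)                                  // test %rax,%rax ; je
        ::operator delete(p);               // callq _ZdlPv
    return 0;                               // xor %eax,%eax
}
```

All five Ptr objects and the four null-pointer deletions are gone; only the allocation, the null check from the deleter policy, and a single deallocation remain.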
I've checked bugzilla, and #46076
(http://gcc.gnu.org/bugzilla/show_bug.cgi?id=46076) is related, I guess.
There, Jan Hubicka 2010-10-19 03:20:48 UTC writes that main() is
optimized for size. To check this, I've added foo() to the test case,
and it is optimized correctly with 4.6 and 4.7. Moreover, -Os produces the
same foo() and main() functions. However, the size-optimized version is
more than 3 times as large as the other one. Is this normal?
Regards, Peter
template<typename T>
struct Deleter
{
    static void Delete(T* p_)
    {
#if 0 // if enabled, Delete() is not inlined
        if (p_)
#endif
            delete p_;
    }
};
template<typename T, class D = Deleter<T> >
class Ptr
{
public:
    Ptr() : m_ptr(0)
    {
    }
    Ptr(T* p_) : m_ptr(p_)
    {
    }
    Ptr(const Ptr& p_) : m_ptr(p_.Forget())
    {
    }
    ~Ptr()
    {
        D::Delete(m_ptr);
    }
    T* Forget() const
    {
        T* s = m_ptr;
        m_ptr = 0;
        return s;
    }
private:
    mutable T* m_ptr;
};
int main()
{
    typedef Ptr<int> MyPtr;
    MyPtr p0 = new int;
    MyPtr p1 = p0;
    MyPtr p2 = p1;
    MyPtr p3 = p2;
    MyPtr p4 = p3;
}