https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96942
--- Comment #11 from Jonathan Wakely <redi at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #9)
> The most pronounced difference for depth=18 seems to be caused by m_b_r
> over-allocating by 2x: internally it mallocs 2x of the size given to the
> constructor, and then Linux pre-faults those extra pages, penalizing the
> benchmark.
It adds 11 bytes to the size given to the constructor (for its internal
bookkeeping) and then rounds up to a power of two.
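
To make the arithmetic concrete, here is a minimal sketch of that rounding for depth=18. It is a model of the behaviour described above, not the libstdc++ source: the helper names (round_up_pow2, allocated_for) are hypothetical, and sizeof(Node) == 16 is an assumption (two 8-byte pointers on a 64-bit target).

```cpp
#include <cstddef>

// Bookkeeping overhead, taken from the description above.
constexpr std::size_t bookkeeping = 11;

// Round n up to the next power of two (returns n if it already is one).
constexpr std::size_t round_up_pow2(std::size_t n)
{
  std::size_t p = 1;
  while (p < n)
    p <<= 1;
  return p;
}

// Hypothetical model of the size that ends up being requested from malloc.
constexpr std::size_t allocated_for(std::size_t requested)
{
  return round_up_pow2(requested + bookkeeping);
}

// Assumption: sizeof(Node) == 16.
constexpr std::size_t node_size = 16;

// depth=18: a pool sized for 2^19 nodes versus 2^19 - 1 nodes.
constexpr std::size_t pow2_pool = (std::size_t(1) << 19) * node_size;
constexpr std::size_t exact_pool = ((std::size_t(1) << 19) - 1) * node_size;

// A power-of-two request plus 11 bytes spills over into the next power of two.
static_assert(allocated_for(pow2_pool) == 2 * pow2_pool, "doubles");

// One node less leaves room for the 11 bytes below the same power of two.
static_assert(allocated_for(exact_pool) == pow2_pool, "no doubling");
```

Under those assumptions, a request that is already a power of two is the worst case: the 11 extra bytes push it just past the boundary, so the allocation doubles.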
Since the poolSize function actually returns sizeof(Node) more than it needs,
and sizeof(Node) > 11, the overallocation should be avoidable by simply fixing
poolSize to return the right value:

int poolSize(int depth)
{
  return ((1 << (depth + 1)) - 1) * sizeof(Node);
}

The original function returns a power of two, but the code actually creates an
odd number of nodes (there is only one node at depth zero, not two).
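
The node count can be checked with a short sketch. It assumes the benchmark builds a full binary tree in which every node above depth zero has exactly two children; nodeCount is a hypothetical helper, not part of the benchmark.

```cpp
// Number of nodes in a full binary tree of the given depth:
// a single node at depth zero, otherwise a node plus two subtrees.
constexpr long nodeCount(int depth)
{
  return depth == 0 ? 1 : 1 + 2 * nodeCount(depth - 1);
}

// Depth zero is one node, not two.
static_assert(nodeCount(0) == 1, "single node at depth zero");

// The count is always odd and matches (1 << (depth + 1)) - 1.
static_assert(nodeCount(18) == (1L << 19) - 1, "matches closed form");
static_assert(nodeCount(18) % 2 == 1, "odd");
```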