On Wednesday, 31 July 2013 at 11:15:31 UTC, Joseph Rushton Wakeling wrote:
Hi all,
When playing with the graph library code, I noticed something odd.
Here's a function to calculate the neighbours of a vertex:
auto neighbours(immutable size_t v)
{
    // Bounds of v's slice within the _neighbours cache
    immutable size_t start = _sumTail[v] + _sumHead[v];
    immutable size_t end = _sumTail[v + 1] + _sumHead[v + 1];
    if (!_cacheNeighbours[v])
    {
        // First request for v: fill its slice of the cache
        size_t j = start;
        foreach (i; _sumTail[v] .. _sumTail[v + 1])
        {
            _neighbours[j] = _head[_indexTail[i]];
            ++j;
        }
        foreach (i; _sumHead[v] .. _sumHead[v + 1])
        {
            _neighbours[j] = _tail[_indexHead[i]];
            ++j;
        }
        assert(j == end);
        _cacheNeighbours[v] = true;
    }
    return _neighbours[start .. end];
}
Now, I noticed that if, instead of declaring the variables start and end, I manually write out these expressions in the code, I get a small but consistent speedup in the program.
So, I'm curious: (i) why? I'd have assumed the compiler could optimize away unnecessary variables like this. And (ii) is there a way of declaring start/end in the code such that the correct expression will be put in place at compile time wherever it's needed?

I'm guessing some kind of template solution, but I didn't get it working (I probably need to study templates more :-).
(... knocks head against wall to try and dislodge current micro-optimization obsession ...)
There is no telling; as bear said, compare the disassemblies (you can try it on dpaste and compare).
From what I have seen, I do not believe D's code generation is optimal (at least for DMD). It could be cache misses, pipelining issues (the order of instructions matters), or other weird stuff.
It could be that when you store start and end in named variables, the compiler ends up taking a different path that somehow changes the generated code.
I don't think D's code generation is even close to being as mature as that of many of the common C++ compilers.
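On question (ii), here is a minimal sketch, assuming a textual substitution is acceptable: a D string mixin pastes the expression text in at compile time, exactly as if it had been written out by hand, while a nested function is the non-textual alternative that relies on the optimizer to inline it. The arrays `sumTail`/`sumHead` and the functions `boundsMixin`/`boundsNested` are hypothetical stand-ins for the graph's `_sumTail`/`_sumHead`, not part of the library. Whether either form is actually faster than named locals is, again, something only the disassembly can tell.

```d
// Hypothetical sketch: write each bound expression once without storing it
// in a local variable. sumTail/sumHead stand in for _sumTail/_sumHead.
import std.stdio : writeln;

immutable size_t[] sumTail = [0, 2, 3, 5];
immutable size_t[] sumHead = [0, 1, 3, 4];

// Token strings holding the expression text (compile-time constants).
enum startExpr = q{ sumTail[v] + sumHead[v] };
enum endExpr = q{ sumTail[v + 1] + sumHead[v + 1] };

// (a) String mixin: each mixin(...) is replaced at compile time by the
// expression text, so `v` refers to this function's parameter.
size_t[2] boundsMixin(size_t v)
{
    return [mixin(startExpr), mixin(endExpr)];
}

// (b) Nested functions: no text pasting; the optimizer may or may not
// inline these calls, so check the generated code.
size_t[2] boundsNested(size_t v)
{
    size_t start() { return sumTail[v] + sumHead[v]; }
    size_t end() { return sumTail[v + 1] + sumHead[v + 1]; }
    return [start(), end()];
}

void main()
{
    // Print the [start, end] bounds computed both ways for each vertex.
    foreach (v; 0 .. sumTail.length - 1)
        writeln(v, ": ", boundsMixin(v), " ", boundsNested(v));
}
```

Option (a) is the closest D gets to "put the correct expression in place at compile time", at the cost of passing code around as strings; option (b) reads more cleanly but leaves the final code generation up to the compiler.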