On 11/5/2021 8:13 AM, Andrew Appel wrote:
Dear GCC developers:
TL;DR Can I talk to a maintainer of gcc's tail-call optimizer to learn more
about the very impressive technology behind it? Or is there a paper, or
documentation, that explains it?
No paper or documentation I'm aware of.
I am sure you know that gcc does a much better job of tail-call optimizations
than its competitors. Recently I noticed this little gem:
void f( void *p);
int h( int n, void *p)
{
{
void *a[ 10 ];
a[ 0 ]= p;
f(( void *)a);
}
return h(n, p);
}
With the inner curly braces, gcc turns the call to h into a jump. Without the
inner curly braces, gcc does not do tail-call optimization. This is exactly
correct, because we cannot know whether f() stores its argument in a place
where later calls to h() might access it. The inner braces ensure that a[] is
officially deallocated before the call to h.
Right. The lifetime of the object is critical to knowing if its safe to
tail call.
The easiest way to see what GCC does is via its debugging dumps. In
this particular case we're dealing with tail recursion which is
happening entirely in the gimple optimizers. So
-fdump-tree-all-blocks-details will give you the relevant debugging dumps.
The dumps are numbered based on the pass ordering in the compiler.
Looking at the pass before "tailr1" we have:
;; Function h (h, funcdef_no=0, decl_uid=1938, cgraph_uid=1,
symbol_order=0)
h (int n, void * p)
{
void * a[10];
int _8;
;; basic block 2, loop depth 0, maybe hot
;; prev block 0, next block 1, flags: (NEW, VISITED)
;; pred: ENTRY (FALLTHRU,EXECUTABLE)
a[0] = p_2(D);
f (&a);
a ={v} {CLOBBER};
_8 = h (n_6(D), p_2(D));
return _8;
;; succ: EXIT (EXECUTABLE)
}
Note the a = {v}{CLOBBER}. That's the end of the lifetime of "a" and
it's how we know tail recursion optimization is safe in this particular
case. In the .tailr1 dump we have:
;; Function h (h, funcdef_no=0, decl_uid=1938, cgraph_uid=1, symbol_order=0)
Eliminated tail recursion in bb 2 : _8 = h (n_6(D), p_2(D));
Updating SSA:
creating PHI node in block #2 for .MEM
Registering new PHI nodes in block #0
Registering new PHI nodes in block #2
Updating SSA information for statement a[0] = p_2(D);
Updating SSA information for statement f (&a);
Updating SSA information for statement a ={v} {CLOBBER};
Symbols to be put in SSA form
{ D.1944 }
Incremental SSA update started at block: 0
Number of blocks in CFG: 3
Number of blocks to update: 2 ( 67%)
Affected blocks: 0 2
fix_loop_structure: fixing up loops for function
flow_loops_find: discovered new loop 1 with header 2
h (int n, void * p)
{
void * a[10];
;; basic block 2, loop depth 1, maybe hot
;; prev block 0, next block 1, flags: (NEW, VISITED)
;; pred: ENTRY (FALLTHRU,EXECUTABLE)
;; 2 (FALLTHRU,DFS_BACK,EXECUTABLE)
a[0] = p_2(D);
f (&a);
a ={v} {CLOBBER};
goto <bb 2>; [INV]
;; succ: 2 (FALLTHRU,DFS_BACK,EXECUTABLE)
}
And you can see we've turned the tail recursive call into a simple jump
which creates a loop.
Clang/llvm does not tail-call-optimize this program.
So, congratulations.
I would enjoy speaking to one of you who is knowledgeable about how gcc does
this.
A few years ago I wrote some compiler textbooks,
Modern Compiler Implementation in C, in ML, in Java (Cambridge University Press)
and I am still interested in these things. More to the point, I use C as the
target for a functional-language compiler, for which tail-call optimization is
very important.
Yup. Got it on my bookshelf somewhere.
So please let me know,
* is there a paper somewhere that describes the approach that gcc takes for TCO?
* I'm particularly interested in how gcc's intermediate language represents the
fact that a[] is deallocated at the first close-curly-brace
* Would someone like to Zoom sometime to tell me about this?
The dump files are going to be the best window into this stuff. There
are cases where we just mark potential tail call sites, but have to
defer the final decision until later in the optimizer pipeline, but this
case is simple enough that we don't need to query the target's
capabilities, look at stack usage, worry about argument mappings, etc.
jeff