//===----------------------------------------------------------------------===//
//             Guaranteed Efficient Tail call support for LLVM
//===----------------------------------------------------------------------===//

Sep 5, 2004 - Initial revision

See Also: Custom Calling Conventions in LLVM


Efficient tail call support is the ability for a function call to free the stack
space of a caller before the callee begins executing.  Because the stack space
for the caller is freed before the callee runs, many problems which would
otherwise require a linear amount of stack space will run in a constant amount
of space.

Efficient support for aritrary tail calls is absolutely essential for a broad
class of languages, particularly functional languages.  In particular, the
Scheme language requires, as part of its language definition, that all
implementations implement arbitrary tail calls efficiently (even indirect
calls!).

Efficient tail calls are also an important optimization that LLVM should support
for traditional languages.  Consider the following C fragment:

void foo() {
  if (cond)
    bar();
  else
    baz();
}

Both of the calls to the subfunctions are eligible for tail call elimination
through the most trivial analysis.  By knowing that these calls are tail calls,
LLVM could generate direct branches to the prolog of baz and bar instead of
calls and a ret from foo.

LLVM currently supports an optimization named 'tail recursion elimination'.
This optimization converts self recursive functions who have simple tail calls
into explicit loops.  There is a large amount of overlap between tail recursion
elimination and proper tail call support, but proper tail call support is more
general: it applies to arbitrary tail calls to arbitrary (even unknown)
functions.

One of the problems with tail call optimization is knowing when it is *safe* to
perform the optimization.  This information can either come through analysis
(e.g. DSA), or through intrinsic information known by the LLVM front-end (e.g.,
that no code is produced that is not eligible for a tail call, as is in the 
case of scheme).  Note that the analysis is non-trivial, as must prove that the
called function cannot access anything on the stack of the caller.

//===----------------------------------------------------------------------===//
// Proposed implementation
//===----------------------------------------------------------------------===//

Because the analysis to prove a call safe to be a tail call can be non-trivial,
but the information about what calls are legal tail calls is trivial to
maintain, this information should be kept as a bit in on the LLVM call
instruction.  This bit indicates that the called function is known to not access
the stack of the caller, thus any alloca instructions in the caller can be
deallocated before the call.

Note that 'invoke' instructions (aka calls that can throw exceptions, and which
have local handlers) are never safe to be marked for tail calls: if the callee
function unwinds, the discarded stack would be needed.

At the LLVM level, the information about which calls are suitable to be tail
calls is simply maintained and updated as appropriate.  For example, the inliner
would need to clear tailcall flags when inlining functions into callers with
stack allocations.  Aside from preserving this information, the LLVM level would
have little use for it.  Suggested .ll syntax:

  %X = tail call int %foo()
  ret int %X

At the code generator level, the instruction selector should "do its best" to
render any calls marked 'tail' as correct and efficient tail calls.  Note that,
in general, this is impossible with the C call conventions and ABIs required on
most platforms.  The primary reason for this is that the C language allows
variable argument functions (which are incompatible with correct tail calls) and
does not require agreement on the prototype of the callee and caller of a
function.  Clearly this contradicts our goal of supporting correct tail calls
100% of the time (as required by the Scheme language, for example).

The solution to this is to implement the "Custom Calling Conventions" notes. In
particular, note that CC#0 is carefully defined to be the most efficient calling
convention possible, and it requires that both the caller and callee function
prototypes match (casting a function pointer to a differing function pointer
type and calling through it is undefined).  Because of this, for non-varargs
functions, the target should pass parameters in "pascal order", permitting
efficient tail calls.

CC#0 will be, by far, the most common calling convention that is found in C
programs that have been link-time optimized, but there is no guarantee, in
general, that the optimizer will be able to prove the CC#1 -> CC#0 conversion
safe.  Because of this, languages that do not require all functions to be
compatible with C calling conventions should default to marking their functions
as CC#0 explicitly: this will permit guaranteed predictable tail calls in all
non-varargs cases (which are all of the cases possible).