//===----------------------------------------------------------------------===// // Guaranteed Efficient Tail call support for LLVM //===----------------------------------------------------------------------===// Sep 5, 2004 - Initial revision See Also: Custom Calling Conventions in LLVM Efficient tail call support is the ability for a function call to free the stack space of a caller before the callee begins executing. Because the stack space for the caller is freed before the callee runs, many problems which would otherwise require a linear amount of stack space will run in a constant amount of space. Efficient support for aritrary tail calls is absolutely essential for a broad class of languages, particularly functional languages. In particular, the Scheme language requires, as part of its language definition, that all implementations implement arbitrary tail calls efficiently (even indirect calls!). Efficient tail calls are also an important optimization that LLVM should support for traditional languages. Consider the following C fragment: void foo() { if (cond) bar(); else baz(); } Both of the calls to the subfunctions are eligible for tail call elimination through the most trivial analysis. By knowing that these calls are tail calls, LLVM could generate direct branches to the prolog of baz and bar instead of calls and a ret from foo. LLVM currently supports an optimization named 'tail recursion elimination'. This optimization converts self recursive functions who have simple tail calls into explicit loops. There is a large amount of overlap between tail recursion elimination and proper tail call support, but proper tail call support is more general: it applies to arbitrary tail calls to arbitrary (even unknown) functions. One of the problems with tail call optimization is knowing when it is *safe* to perform the optimization. This information can either come through analysis (e.g. DSA), or through intrinsic information known by the LLVM front-end (e.g., that no code is produced that is not eligible for a tail call, as is in the case of scheme). Note that the analysis is non-trivial, as must prove that the called function cannot access anything on the stack of the caller. //===----------------------------------------------------------------------===// // Proposed implementation //===----------------------------------------------------------------------===// Because the analysis to prove a call safe to be a tail call can be non-trivial, but the information about what calls are legal tail calls is trivial to maintain, this information should be kept as a bit in on the LLVM call instruction. This bit indicates that the called function is known to not access the stack of the caller, thus any alloca instructions in the caller can be deallocated before the call. Note that 'invoke' instructions (aka calls that can throw exceptions, and which have local handlers) are never safe to be marked for tail calls: if the callee function unwinds, the discarded stack would be needed. At the LLVM level, the information about which calls are suitable to be tail calls is simply maintained and updated as appropriate. For example, the inliner would need to clear tailcall flags when inlining functions into callers with stack allocations. Aside from preserving this information, the LLVM level would have little use for it. Suggested .ll syntax: %X = tail call int %foo() ret int %X At the code generator level, the instruction selector should "do its best" to render any calls marked 'tail' as correct and efficient tail calls. Note that, in general, this is impossible with the C call conventions and ABIs required on most platforms. The primary reason for this is that the C language allows variable argument functions (which are incompatible with correct tail calls) and does not require agreement on the prototype of the callee and caller of a function. Clearly this contradicts our goal of supporting correct tail calls 100% of the time (as required by the Scheme language, for example). The solution to this is to implement the "Custom Calling Conventions" notes. In particular, note that CC#0 is carefully defined to be the most efficient calling convention possible, and it requires that both the caller and callee function prototypes match (casting a function pointer to a differing function pointer type and calling through it is undefined). Because of this, for non-varargs functions, the target should pass parameters in "pascal order", permitting efficient tail calls. CC#0 will be, by far, the most common calling convention that is found in C programs that have been link-time optimized, but there is no guarantee, in general, that the optimizer will be able to prove the CC#1 -> CC#0 conversion safe. Because of this, languages that do not require all functions to be compatible with C calling conventions should default to marking their functions as CC#0 explicitly: this will permit guaranteed predictable tail calls in all non-varargs cases (which are all of the cases possible).