//===----------------------------------------------------------------------===//
// LLVM and Multiple Return Value Support
//===----------------------------------------------------------------------===//

8/24/2004

[Note, this is dependent on CustomCallingConventions.txt]

LLVM functions currently support a subset of the power of C functions: in
particular, they can return zero or one values of first class type.  C
functions, on the other hand, can return multiple values in registers with some
system ABIs (e.g. return complex numbers with the real and imaginary components
in two different registers).

In LLVM, if a function returns multiple values, we transform the function to
take a pointer to a memory location, and store the return values through the
pointer.  This is inefficient for multiple reasons (temporary memory must be
stack allocated, memory stores and the corresponding loads must be emitted,
etc), but it also means that we cannot properly support those platforms that
return multiple values in registers.

Another annoying factor is that some ABIs have really weird rules about what is
returned in registers.  In particular, an ABI might require a C99 complex
number to be returned in registers, but an otherwise identical function that
returns a struct of two doubles by value might be required to return it by
reference.

The real problem with returning multiple values from functions is how to bind
them on the caller side.  In particular, we currently have the situation where
the call instruction "is" the return value (when used as an operand to another
instruction).  When a call can define multiple values, there is no suitable
single value to use.

Multiple return values are also critically important for one other future
instruction: the future LLVM inline assembly instruction.  In particular,
inline assembly can define multiple result values that need to be bound to
multiple SSA values (e.g.
an fsincos instruction that computes the sine and cosine of the operand value,
and many others).

Finally, one annoying, but trivial, nuance of the LLVM type-system is the
'void' type, which is currently only allowed as the return type of a function.
Special cases like this should be eliminated.

//===----------------------------------------------------------------------===//
// Suggested approach: callee side
//

On the callee side, the return instruction will be enhanced to be an n-ary
instruction.  In particular, it will be perfectly valid to have a function
like:

  {int, float} %foo() {
    ret int 4, float 17.0
  }

which returns both values "in registers".  The containing function's type would
use a structure type as its return type (this is currently illegal in LLVM).

Finally, the LLVM 'void' type will be eliminated and become a synonym for {}.
This means that these functions are equivalent:

  void %bar1() { ret void }
  {} %bar2() { ret void }

... The asmwriter should pretty print the '{}' type as void when in a call or
function return type.

Note that the aggregate returned by a function is extremely limited.  In
particular, a function may only return a first-class value or a structure type.
If it returns a structure type, all elements of that structure are required to
be first-class values (no nested structures or arrays are allowed).

//===----------------------------------------------------------------------===//
// Suggested approach: caller side
//

The description above is straight-forward, but does not address how multiple
return values are bound to LLVM Value*'s.  I suggest that this happens in two
steps.  First, the aggregate return value is bound to the Call instruction's
Value* as an aggregate.  Second, the individual components of the aggregate are
accessed with simple LLVM instructions.
Consider a call to %foo above:

  %Agg = call {int, float} %foo()
  %Agg.0 = getaggregatevalue {int, float} %Agg, uint 0
  %Agg.1 = getaggregatevalue {int, float} %Agg, uint 1

In the example above, the 'getaggregatevalue' instructions bind the individual
returned values from the aggregate LLVM value to individual SSA values.
Because they may only be used on one-level structs, the first operand is always
a struct, and the second operand is always a constant uint.

Note that we limit the LLVM IR so that the only instruction that is allowed to
use this aggregate as an operand is the 'getaggregatevalue' instruction.  In
particular, these aggregates are *not* allowed as operands to PHI nodes, cast
instructions, or calls.  Because of this, native code generation is completely
trivial.

//===----------------------------------------------------------------------===//
// Native code generation
//

No targets support returning an arbitrary number of values in registers.  In
particular, the C ABI will specify when and if the target is required to return
multiple values in registers for a particular C-level function call.  Given
this information, a C front-end (which is attempting to interoperate with the
native ABI and calling conventions) would generate the function and its calls
using "Calling Convention #1" (see CustomCallingConventions.txt for a
description of how multiple calling conventions will be handled in the future).

CC#1 is allowed to be restricted in arbitrary ways by the target-specific ABI.
As such, any front-ends generating CC#1 calls must know about these
restrictions and meet them.

Consider the annoying example above, dealing with complex numbers.  If the ABI
specifies that a complex number return value is returned in two registers, this
would result in an LLVM function that returns a '{double,double}' type.
If the target requires that an equivalent function returning a C struct of two
doubles return its value in memory, the function would be compiled to the same
LLVM code as it is today: a pointer is passed in and the return values are
stored through it.

In the instruction selector, supporting this is also straight-forward.  If a
function uses CC#0, it may return an arbitrary number of values "in registers",
and could easily be returning more values than the target has physical
registers.  Because CC#0 requires the callee and callers to agree on function
prototypes, though, it is perfectly safe for the instruction selector to decide
on the number and types of values that it allows to be returned in registers
(for example, the X86 target might support only returning 3 integer and 3 FP
values in registers).  If a particular function is prototyped to return more
values than the target allows in registers, the target can code gen the
function to take a shadow argument with a pointer to a memory area to store the
return values into.

The important part about this is that it is trivial to statically determine on
the caller and callee sides how a call site will be code generated, so the
target can use any calling conventions that will lead to high performance (this
is the whole point of CC#0 to begin with).

If a function is using CC#1, we know that the number of registers returned must
be supported by the target, otherwise it is a compile-time error.

Due to the limitations on the 'getaggregatevalue' instruction we imposed above,
the instruction selector for targets is very simple.  In particular, there are
two cases: either the return values are actually in registers, or the return
values need to be passed through memory anyway.
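The shadow-argument fallback mentioned above can be pictured at the C level.
This is only a sketch of the idea, not what any LLVM target emits: struct quad,
make_quad_lowered, sum_quad and self_check are hypothetical names, and the
choice of four values as "too many for registers" is an assumption made for
illustration.

```c
#include <assert.h>

/* Hypothetical aggregate: four results the front-end would like to return
   "in registers", assumed here to exceed what the target allows. */
struct quad { int a, b, c, d; };

/* Lowered callee: instead of returning by value, it receives a hidden
   ("shadow") pointer and stores each result through it. */
void make_quad_lowered(struct quad *shadow, int base) {
    shadow->a = base;
    shadow->b = base + 1;
    shadow->c = base + 2;
    shadow->d = base + 3;
}

/* Lowered caller: reserves stack space for the results, passes its address
   as the shadow argument, and loads the values back after the call. */
int sum_quad(int base) {
    struct quad tmp;
    make_quad_lowered(&tmp, base);
    return tmp.a + tmp.b + tmp.c + tmp.d;
}

/* Usage: 10 + 11 + 12 + 13. */
void self_check(void) {
    assert(sum_quad(10) == 46);
}
```

Because caller and callee both see the same prototype, each side can decide
independently, and identically, that the shadow pointer is needed.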
If return values are passed through registers, the call site will be code
generated like this (assuming X86):

  call foo
  mov %reg1024, EAX
  mov %reg1025, EDX

When the call instruction is code generated, all of the getaggregatevalue
instructions (which are the only possible users of the call) are associated
with the appropriate virtual registers (e.g. Agg.0 -> %reg1024).

If the return value is in memory, the caller passes in the shadow argument (as
specified by the target), and loads the values from memory after the call.  In
this example, we would get:

  push ESP+1234                ;; Ret val location
  call foo
  mov %reg1024, [ESP+1234]
  mov %reg1025, [ESP+1238]

... and code generation proceeds as normal.

//===----------------------------------------------------------------------===//
// Enhancements to the -argpromotion pass
//

The -argpromotion pass converts arguments passed by reference to be passed by
value instead.  For example, given a function like:

  int test(int *P) { return *P; }

It would transform the function to pass *P by value instead of passing in the
pointer P (this allows for many subsequent optimizations in the callers).  The
result of this transformation is the following function:

  int test(int PV) { return PV; }

The argpromotion pass does not currently handle pointer arguments that are
stored through, though it could do so today if the function originally returned
void.  If functions could return multiple values, however, we could do this
generically, transforming functions like:

  int test2(int *P) { *P = 42; return 1; }

into:

  {int,int} test2() { return 1,42; }

This optimization is particularly valuable because it would turn our simple
struct pair returning function into a function that returns multiple values in
registers.  In particular, given a C function like:

  struct pair { int X; int Y; };
  struct pair foo() {
    struct pair P; P.X = 1; P.Y = 2; return P;
  }

...
and an ABI that says that the struct must be returned in memory, argpromotion
would turn it into the substantially more efficient LLVM function:

  {int,int} %foo() { ret int 1, int 2 }

//===----------------------------------------------------------------------===//
// Other enhancements:

We could introduce llvm.sincos.f32/f64 with prototypes of the form:

  {double,double} %llvm.sincos(double %x);

Transforming sincos into llvm.sincos would eliminate the need for its output
arguments to be addressable (e.g. exposing them to SROA/mem2reg).  It can be
lowered in the code generator to sincos if needed.
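
Both the argpromotion enhancement and the llvm.sincos proposal replace an
output pointer with a second returned value.  A minimal C sketch of that
rewrite, using the test2 function from the argpromotion section: struct
int_pair, test2_promoted, caller_before, caller_after and self_check are
hypothetical names introduced here, and the struct merely stands in for the
proposed {int,int} aggregate.

```c
#include <assert.h>

/* Before: the second result is stored through a pointer, so the caller
   must provide an addressable memory slot for it. */
int test2(int *P) {
    *P = 42;
    return 1;
}

/* Hypothetical stand-in for the {int,int} aggregate return type. */
struct int_pair { int ret; int stored; };

/* After: both results are returned by value and the pointer argument
   disappears. */
struct int_pair test2_promoted(void) {
    struct int_pair r = { 1, 42 };
    return r;
}

/* Caller of the original form: needs a stack slot and a reload. */
int caller_before(void) {
    int slot;
    int r = test2(&slot);
    return r + slot;
}

/* Caller of the promoted form: reads the two components directly. */
int caller_after(void) {
    struct int_pair p = test2_promoted();
    return p.ret + p.stored;
}

/* Usage: the two forms compute the same value. */
void self_check(void) {
    assert(caller_before() == caller_after());
}
```

In the promoted form nothing takes the address of a local, which is what
would let later passes keep both results in SSA values.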