//===----------------------------------------------------------------------===//
//                  LLVM and Multiple Return Value Support
//===----------------------------------------------------------------------===//

8/24/2004     [Note, this is dependent on CustomCallingConventions.txt]

LLVM functions currently support a subset of the power of C functions: in
particular, they can return zero or one values of first class type.  C
functions, on the other hand, can return multiple values in registers with some
system ABIs (e.g. return complex numbers with the real and imaginary components
in two different registers).

In LLVM, if a function returns multiple values, we transform the function to
take a pointer to a memory location, and store the return values through the
pointer.  This is inefficient for multiple reasons (temporary memory must be
stack allocated, memory stores and the corresponding loads must be emitted,
etc.), and it also means that we cannot properly support platforms that return
multiple values in registers.
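
As a rough C-level sketch of the lowering described above (the divmod names
and the struct are hypothetical, chosen only for illustration), the by-value
form is what the program means and the pointer form is what we emit today:

```c
/* Hypothetical two-value result, used only for this illustration. */
struct divmod_result {
    int quotient;
    int remainder;
};

/* What the source program means: two values come back by value. */
struct divmod_result divmod_by_value(int numerator, int denominator) {
    struct divmod_result result;
    result.quotient = numerator / denominator;
    result.remainder = numerator % denominator;
    return result;
}

/* What the current lowering produces: the caller supplies the memory and
   the callee stores both results through the pointer. */
void divmod_via_pointer(struct divmod_result *out,
                        int numerator, int denominator) {
    out->quotient = numerator / denominator;
    out->remainder = numerator % denominator;
}

/* Caller side of the lowered form: a stack temporary, two stores in the
   callee, and two loads here. */
int sum_of_parts(int numerator, int denominator) {
    struct divmod_result tmp;
    divmod_via_pointer(&tmp, numerator, denominator);
    return tmp.quotient + tmp.remainder;
}
```

The temporary in sum_of_parts and the loads after the call are exactly the
overhead that returning both values in registers would avoid.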

Another annoying factor is that some ABIs have really weird rules about what
is returned in registers.  In particular, an ABI might require a C99 complex
number to be returned in registers, but an identical function that returns a
pair of two doubles by-value might be required to return them by reference.
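
To make the distinction concrete, here is a small C99 sketch of the two
functions in question (the function and struct names are made up for this
example).  Both carry the same payload of two doubles, yet an ABI may
return the first in registers and the second through memory:

```c
#include <complex.h>

/* A C99 complex number: an ABI might return this in two FP registers. */
double complex make_complex(double re, double im) {
    return re + im * I;
}

/* A plain pair of doubles: the same ABI might require this to be returned
   by reference, even though it holds the same two values. */
struct double_pair {
    double first;
    double second;
};

struct double_pair make_pair(double first, double second) {
    struct double_pair p = { first, second };
    return p;
}
```

Which convention applies to each function is decided by the target ABI, not
by anything visible in this source.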

The real problem with returning multiple values from functions is how to bind
them on the caller side.  In particular, we currently have the situation where
the call instruction "is" the return value (when used as an operand to another
instruction).  When we have the situation where a call can define multiple
values, there is no suitable value to use.

Multiple return values are also critically important for one future
instruction: the LLVM inline assembly instruction.  In particular, inline
assembly can define multiple result values that need to be bound to multiple
SSA values (e.g. an fsincos instruction that computes the sine and cosine of
the operand value, and many others).

Finally, one annoying, but trivial, nuance of the LLVM type-system is the
'void' type, which is currently allowed only as the return value of a function.
Special cases like this should be eliminated.


//===----------------------------------------------------------------------===//
// Suggested approach: callee side
//

On the callee side, the return instruction will be enhanced to be an n-ary
instruction.  In particular, it will be perfectly valid to have a function
like:

{int, float} %foo() {
   ret int 4, float 17.0
}

which returns both values "in registers".  The containing function's type would 
use a structure type as its return type (this is currently illegal in LLVM).
Finally, the LLVM 'void' type will be eliminated and become a synonym for {}.
This means that these functions are equivalent:

void %bar1() { ret void }
{} %bar2()   { ret void }

... The asmwriter should pretty print the '{}' type as void when in a call or
function return type.

Note that the aggregate returned by a function is extremely limited.  In
particular, a function may only return a first-class value or a structure type.
If it returns a structure type, all elements of that structure are required to
be first-class values (no nested structures or arrays are allowed).
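
The restriction is simple enough to check mechanically.  The following C
sketch is not LLVM's actual type representation; the type_kind enum, the
type struct, and the sample types are all assumptions made up for this
example.  It accepts first-class types and one-level structs of first-class
types (including the empty struct that replaces void), and rejects
everything else:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a type; not the real LLVM classes. */
enum type_kind { TYPE_INT, TYPE_FLOAT, TYPE_POINTER, TYPE_STRUCT, TYPE_ARRAY };

struct type {
    enum type_kind kind;
    size_t num_elements;                 /* used only for TYPE_STRUCT */
    const struct type *const *elements;  /* used only for TYPE_STRUCT */
};

static bool is_first_class(const struct type *t) {
    return t->kind == TYPE_INT || t->kind == TYPE_FLOAT ||
           t->kind == TYPE_POINTER;
}

/* A return type is legal if it is first-class, or a struct whose elements
   are all first-class (so no nested structs and no arrays). */
bool is_legal_return_type(const struct type *t) {
    if (is_first_class(t))
        return true;
    if (t->kind != TYPE_STRUCT)
        return false;
    for (size_t i = 0; i < t->num_elements; ++i)
        if (!is_first_class(t->elements[i]))
            return false;
    return true;
}

/* Sample types: int, float, {int, float}, {}, and {int, {int, float}}. */
static const struct type int_type = { TYPE_INT, 0, NULL };
static const struct type float_type = { TYPE_FLOAT, 0, NULL };
static const struct type *const pair_elements[] = { &int_type, &float_type };
static const struct type int_float_struct = { TYPE_STRUCT, 2, pair_elements };
static const struct type empty_struct = { TYPE_STRUCT, 0, NULL };
static const struct type *const nested_elements[] =
    { &int_type, &int_float_struct };
static const struct type nested_struct = { TYPE_STRUCT, 2, nested_elements };
```

Here int_float_struct and empty_struct pass the check, while nested_struct
is rejected because its second element is itself a struct.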


//===----------------------------------------------------------------------===//
// Suggested approach: caller side
//

The description above is straightforward, but does not address how multiple
return values are bound to LLVM Value*'s.  I suggest that this happens in two
steps.  First, the aggregate return value is bound to the Call instruction's
Value* as an aggregate.  Second, the individual components of the aggregate
are accessed with simple LLVM instructions.

Consider a call to %foo above:

   %Agg = call {int, float} %foo()
   %Agg.0 = getaggregatevalue {int, float} %Agg, uint 0
   %Agg.1 = getaggregatevalue {int, float} %Agg, uint 1

In the example above, the 'getaggregatevalue' instructions bind the individual
returned values from the aggregate LLVM value to individual ones.  Because they
may only be used on one-level structs, the first operand is always a struct,
and the second operand is always a constant uint.

Note that we limit the LLVM IR so that the only instruction that is allowed to
use this aggregate as an operand is the 'getaggregatevalue' instruction.  In
particular, these aggregates are *not* allowed as operands to PHI nodes, cast
instructions, or calls.  Because of this, native code generation is completely
trivial.
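
For readers who think in C, the call to %foo and its two getaggregatevalue
instructions correspond roughly to the sketch below.  This is only an
analogy: the struct and function names are invented here, and C places no
restriction on how the aggregate is used, whereas the proposed IR does.

```c
/* C stand-in for the LLVM type {int, float}. */
struct int_float {
    int first;
    float second;
};

/* C stand-in for %foo, which returns int 4 and float 17.0. */
struct int_float foo(void) {
    struct int_float result = { 4, 17.0f };
    return result;
}

float use_foo(void) {
    struct int_float agg = foo();  /* %Agg   = call {int, float} %foo()    */
    int agg0 = agg.first;          /* %Agg.0 = getaggregatevalue ..., uint 0 */
    float agg1 = agg.second;       /* %Agg.1 = getaggregatevalue ..., uint 1 */
    /* Only the extracted scalars are used from here on, never agg itself. */
    return (float)agg0 + agg1;
}
```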

//===----------------------------------------------------------------------===//
// Native code generation
//

No targets support returning an arbitrary number of values in registers.  In
particular, the C ABI will specify when and if the target is required to return
multiple values in registers for a particular C-level function call.  Given
this information, a C front-end (which is attempting to interoperate with the
native ABI and calling conventions) would generate a function and call using
"Calling Convention #1" (see CustomCallingConventions.txt for a description
of how multiple calling conventions will be handled in the future).  CC#1 is
allowed to be restricted in arbitrary ways by the target-specific ABI.  As such,
any front-ends generating CC#1 calls must know about these restrictions and meet
them.

Consider the annoying example above, dealing with complex numbers.  If the ABI
specifies that a complex number return value is returned in two registers, this
would result in an LLVM function that returns a '{double,double}' type.  If the
target requires that an equivalent function that returns a C struct containing
two doubles be returned in memory, the function would be compiled to the same
LLVM code as it is today: a pointer is passed in and the return values are
stored through it.

In the instruction selector, supporting this is also straightforward.  If a
function uses CC#0, it may return an arbitrary number of values "in registers",
and could easily be returning more values than the target has physical
registers.  Because CC#0 requires the callee and callers to agree on function
prototypes, though, it is perfectly safe for the instruction selector to decide
on the number and types of values that it allows to be returned in registers
(for example, the X86 target might support only returning 3 integer and 3 FP
values in registers).  If a particular function is prototyped to return more
values than that, the target can code gen the function to take a shadow
argument with a pointer to a memory area to store the return values into.  The
important part about this is that it is trivial to statically determine on the
caller and callee sides how a call site will be code generated, so the target
can use any calling conventions that will lead to high performance (this is
the whole point of CC#0 to begin with).  If a function is using CC#1, we know
that the number of registers returned must be supported by the target,
otherwise it is a compile-time error.
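
The decision described above can be sketched as a small static classifier.
Everything named below is hypothetical, and the limits of 3 integer and 3 FP
values are simply the X86 example from the text, not a real target
description:

```c
#include <stdbool.h>

enum calling_convention { CC0 = 0, CC1 = 1 };

enum return_location {
    RETURN_IN_REGISTERS,  /* every value fits in return registers          */
    RETURN_IN_MEMORY,     /* CC#0 only: use a shadow pointer argument      */
    RETURN_IS_ERROR       /* CC#1 only: front-end violated the target ABI  */
};

/* Assumed per-target limits, taken from the X86 example in the text. */
#define MAX_INT_RETURN_REGS 3u
#define MAX_FP_RETURN_REGS 3u

/* Decide how a function returning num_int integer values and num_fp
   floating point values is code generated.  The answer depends only on
   the prototype and calling convention, so caller and callee agree. */
enum return_location classify_return(enum calling_convention cc,
                                     unsigned num_int, unsigned num_fp) {
    bool fits = num_int <= MAX_INT_RETURN_REGS &&
                num_fp <= MAX_FP_RETURN_REGS;
    if (fits)
        return RETURN_IN_REGISTERS;
    return cc == CC0 ? RETURN_IN_MEMORY : RETURN_IS_ERROR;
}
```

Because the function is pure and takes only the prototype and calling
convention as input, both sides of a call site reach the same answer
without any communication between them.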

Due to the limitations on the 'getaggregatevalue' instruction we imposed above,
the instruction selector for targets is very simple.  In particular, there are
two cases: either the return values are actually in registers, or the return
values need to be passed through memory anyway.  If return values are passed
through registers, the caller will be code generated like this (assuming X86):

  call foo
  mov %reg1024, EAX
  mov %reg1025, EDX

When the call instruction is code generated, all of the getaggregatevalue
instructions (which are the only possible users of the call) are associated
with the appropriate virtual registers (e.g. Agg.0 -> %reg1024).

If the return value is in memory, the caller passes in the shadow argument (as
specified by the target), and loads the values from memory after the call.  In
this example, we would get:

  push ESP+1234    ;; Ret val location
  call foo
  mov %reg1024, [ESP+1234]
  mov %reg1025, [ESP+1238]

... and code generation proceeds as normal.


//===----------------------------------------------------------------------===//
// Enhancements to the -argpromotion pass
//

The -argpromotion pass converts arguments passed by reference to be passed by 
value instead.  For example, given a function like:

int test(int*P) { return *P; }

It would transform the function to pass *P by value instead of passing in the
address of P (this allows for many subsequent optimizations in the callers).
The result of this transformation is the following function:

int test(int PV) { return PV; }

The argpromotion pass does not currently handle pointers that are stored
through, though it could already do so if the function initially returns void.
If functions could return multiple values, however, we could do this
generically, transforming functions like:

int test2(int *P) { *P = 42; return 1; }

into:

{int,int} test2() { return 1,42; }
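
Since C itself cannot write "return 1,42" as two values, the same
transformation can be sketched in valid C with a struct standing in for the
{int,int} return type.  The _before/_after names and the test2_result struct
are invented for this illustration, and the callers show how each call site
would be rewritten:

```c
/* Before: the second result is stored through a pointer argument. */
int test2_before(int *P) {
    *P = 42;
    return 1;
}

/* After: both results are returned by value; the struct models the
   {int,int} return type. */
struct test2_result {
    int ret;     /* the original return value    */
    int stored;  /* the value formerly stored to *P */
};

struct test2_result test2_after(void) {
    struct test2_result result = { 1, 42 };
    return result;
}

/* A caller before the transformation: x must live in memory. */
int caller_before(void) {
    int x;
    int r = test2_before(&x);
    return r + x;
}

/* The same caller afterwards: x is an ordinary value whose address is
   never taken, so it no longer needs a stack slot. */
int caller_after(void) {
    struct test2_result result = test2_after();
    int x = result.stored;
    return result.ret + x;
}
```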

This optimization is particularly valuable because it would turn our simple
struct pair returning function into a function that returns multiple values in
registers.  In particular, given a C function like:

struct pair { int X; int Y; };

struct pair foo() { struct pair P; P.X = 1; P.Y = 2; return P; }

... and an ABI that says that the struct must be returned in memory,
argpromotion would turn it into the substantially more efficient LLVM function:

{int,int} %foo() {
  ret int 1, int 2
}

//===----------------------------------------------------------------------===//
// Other enhancements
//

We could introduce llvm.sincos.f32/f64 with prototypes of:

{double,double} %llvm.sincos(double %x);

Transforming sincos into llvm.sincos would eliminate the need for the argument
to be addressable (e.g. exposing it to SROA/mem2reg).  It can be lowered in
the code generator to sincos if needed.