//===----------------------------------------------------------------------===//
//           Reworking and revitalizing the LLVM Code Generator
//===----------------------------------------------------------------------===//

10/2/2004     Initial revision

This note describes changes to the code generator that we need to make in order
for it to be able to adapt and work well for our current and future targets.
The current MachineInstr/MachineOperand classes are both a mess, and they are
neither expressive enough or simple enough.

The following is a specific list of changes that should be made to each class
or portion of the code generator.


//===----------------------------------------------------------------------===//
//                    An aside about the SparcV9 backend
//===----------------------------------------------------------------------===//

One thing to note is that the SparcV9 backend is currently the piece that is
holding back the progress of the rest of the code generator.  The primary
reason for this is that it uses a completely different representation for
machine instructions (based on Value*'s) than the target independent code
generator does.  In order to address this, the current SparcV9 code generator
can be either converted completely over to the target-independent structure
(this seems unlikely) or we can fork the machine instruction/operand
representation so that the V9 backend uses one set of classes and everything
else uses another.  I think this is the best short term solution until V9 can
be properly addressed.

To split the V9 and target-independent instruction representation, I propose
renaming MachineInstr to SparcMachineInstr and MachineOperand to
SparcMachineOperand.  To support the target independent code generator, a new
MachineInst and MachineInst::Operand class 

The MachineBasicBlock class would then contain two lists of instructions: one
for the target independent code generator and one for the sparc backend.


//===----------------------------------------------------------------------===//
//                     Two-address instruction lowering
//===----------------------------------------------------------------------===//

The code generator currently uses a form of "lowering" to reduce instructions
from three address form to two address form.  This is used on targets that have
operands that they both use and define, permitting the code generator to
represent them in three address form before register allocation.  In particular
consider the X86 addition operation.  Before register allocation, we might
have:

    %vreg1024 = add %vreg1025, %vreg1026

After register allocation, we require:

    EAX = add EAX, ECX

which gets represented as:

    add EAX, ECX

(in other words, one operand is dropped).


There are two problems with this: the first is that deleting the operand from
the machine instruction is problematic, and the second is that this is not
expressive enough.

  1. Any information about the instruction needs to know whether it is being
     used before or after register allocation (because operand numbers shift
     around).  This is a huge problem moving towards an autogenerated target
     description.
  2. This is not expressive enough.  In particular, it is only sufficient for
     targets that need two operands register allocated to the same register.
     Some targets need more, and supporting inline assembly will eventually
     require support for this.

In order to address these problems, we should make the following changes:

  1. The register allocator/two address elimination passes should not delete
     the operand.  After register allocation, the two operands are required
     to have identical registers.
  2. The TargetInstrInfo "isTwoAddress" flag needs to be replaced.  In
     particular, we need more powerful way of describing this.  One way of
     doing so is to have a per-operand reload constraint, which would indicate
     that the second operand must contain the same physical register as the
     first operand.  This change can and should be made independently of the
     first.

See the 'TargetInstrDescriptor class' section below for more information.


//===----------------------------------------------------------------------===//
//                              ValueType enum
//===----------------------------------------------------------------------===//

The ValueType enum indicates what types of value an operand may have (somewhat
equivalent to the GCC concept of a 'mode').  It currently allows for the integer
types i1, i8, i16, i32, i64, and i128, the floating point types 'f32, f64, f80',
and the special types Other and isVoid.  We need arbitrary sized integers to
be able to accurately model targets that have (for example) 13-bit immediate
fields.

This enum should be extended to allow any integer size from 1 to 64 bits, as
well keeping i128.  The floating point types may be extended or changed in the
future (to support non-IEEE FP), but for now they can remain the same.


//===----------------------------------------------------------------------===//
//                        TargetInstrDescriptor class
//===----------------------------------------------------------------------===//

The TargetInstrDescriptor class needs several important changes.  First and
easiest, in needs a field to indicate the opcode of the instruction the
descriptor corresponds to.  Thus, this expression should always be true:

  assert(TII->get(opcode).Opcode == opcode);

The reason for this will become clear later.

Second, the TargetInstrDescriptor needs to indicate the number of operands the
machine instr has.

Third, TargetInstrDescriptor needs to contain information about all of the
operands of a machine instruction.  In particular, any information about
operands of a instruction that is always true (for instructions of that opcode)
should be stored here.  Basically we want to store as much as we can once, in
static memory, instead of once for each instance of a machine instruction.

In particular, we should keep use/def information for each operand, a ValueType
for each operand, operand constraint information for the register allocator,
etc.  None of this should be stored in the MachineInstr object, it should all
be stored once for each opcode.


//===----------------------------------------------------------------------===//
//                              MachineInst class
//===----------------------------------------------------------------------===//

The MachineInst class, which is the target-independent replacement for the
current MachineInstr class, should stay somewhat similar to the current class.
In particular, it should look like the following:

1. The Parent, Next, and Previous pointers should be retained.
2. numImplicitRefs and all users should be eliminated, as it's V9 specific.
3. Instead of containing an opcode, it should contain a direct pointer to the
   TargetInstrDescriptor object for the instruction.  If the opcode is needed,
   clients can use MI->getInstrInfo()->Opcode, or a helper method in
   MachineInstr can be used.
4. The Opcode/MID pointer should be const upon creation of the machine
   instruction.  Once created, this should never, ever, change.  As such, the
   MID pointer should probably be an MID reference, and the 'replace' and
   'setOpcode' methods should go away.
5. The number of operands an instruction contains should never change once
   created (see note about fixing the 2-addr thing above).  As such, the
   operands list should be allocated at the same time the MachineInstr object
   is.  Because of this, the MachineInstr ctor should be made private, and a
   new MachineInstr::create method should be used instead.  MachineInstr::create
   would take a MID pointer (which indicates the number of operands the MI has),
   allocate memory, then call the in-place ctor for MachineInst.  Given that
   all clients call (or should call) BuildMI, this change should be easy.
6. Many of the methods in this class should get zapped.  In particular, all the
   "addOperandOfThisFlavor" methods should go away.  Imagine a MachineInst class
   where mere mortals could understand what all of the methods do!!


//===----------------------------------------------------------------------===//
//                         MachineInst::Operand class
//===----------------------------------------------------------------------===//

The MachineOperand replacement is the hardest part of this whole puzzle to get
right.  In particular, we want to be able to have the following capabilities in
our MI::Operand class:

1. We need the following basic operand types:
    A. Register
    B. Subregister
    C. Immediate
    D. Memory reference

2. For register references, what we have now is basically sufficient: we need to
   be able to represent virtual regs before allocation and change them to
   physregs after.  This isn't hard.

3. Subregisters are similar, but also a bit different.  Basically, a subregister
   reference is a pair containing a virtual register and a small constant
   subregister ID.  At register allocation time, this is replaced with a specific
   physical register.  Details of this mechanism are not described in detail
   here.

4. Immediates need to be generalized substantially.  In particular, we need the
   following capabilities:
     A. The continued ability to *efficiently* represent immediate integers <= 32
        bits in size.
     B. The ability to represent the addresses of global variables, including
        constant offsets from their start.  This should be done without referring
        to the LLVM Value*'s for the globals.
     C. The ability to hold the address of a machine basic block.
     D. The ability to represent that this is a subreference to a portion of the
        immediate.  In particular, RISC machines often load addresses of globals
        in multiple parts, and need relocations to indicate which slab of the
        global address the instruction refers to.
     E. An abstract FrameIndex or ConstantPoolIndex address.

5. Memory references (loaded and stored addresses) need to be very general.  In
   particular, we need to be able to represent the following forms of memory
   reference:

     A. Any immediate.
     B. Target-specific patterns of almost arbitrary complexity.  In particular,
        we should be able to say that the X86 has "CST + REG + [1,2,4,8] * REG"
        addressing.

   In addition, we need to keep an alias set number for the memory reference,
   or some sort of way to relate to the LLVM code.   This alias information 
   should come from the instruction selector.  Finally, we need to keep 
   information about alignment.


//===--------------------- Implementation

The above constraints are fairly onerous to provide, at least in an efficient
way.  The two obvious implementation strategies would be to either:

  1. make the Operand class a union of all of the possible fields with a
     discriminator indicating the current form (this is how MachineOperand
     currently works).
  2. Make the Operand class a base class, which has subclasses for all of the
     different forms and things.

The problem with the first approach is that it wastes memory for operands that 
are simple (e.g. register references and small integer immediates, which are
very very common) by forcing them to be the size of a union of operands.  Also,
it is not clear how to support arbitrary target-specific addressing modes.

The problem with the second approach is that it is very costly.  Instead of
being a single memory allocation for the machine instruction, it is an
allocation for the instruction plus an allocation for each operand.  In
addition, traversing the operands would require trapsing around memory in
random ways, which is bad bad bad.


I propose that we use a hybrid approach.  In particular, the Operand class
would be a base class that contains a union of SOME of the state for the
operand, and, further, derived classes are not allowed to contain extra state.
Consider the following sketch:

struct Operand {
  enum OperandType {
    isReg, isSubReg, isIntImmed, isArbitraryImmediate, isMem
  } OpDiscriminator;  

protected:
  union {
    struct RegStuff {
       unsigned RegNo;
    };
    struct SubRegStuff {
       unsigned VirtReg;
       unsigned SubRegNo;
    };
    struct IntImmediateStuff {
       unsigned IntVal;
    };
    struct ArbitraryImmediateStuff {
       MachineImmediate *MImm;
    };
    struct MemStuff {
      AddressingMode *Addr; 
      unsigned AliasSet;
    };
  } Contents;
public:
  Operand(OperandType OT);
}

struct RegOperand : public Operand {
  unsigned getReg() const { return Contents.RegNo; }
  void setReg(unsigned R) { ... }

  ... classof stuff for dyn_cast ...
};

struct IntImmOperand : public Operand {
  unsigned getValue() { return ... }

  ... classof stuff for dyn_cast ...
};

...


In particular, because the derived classes do not add any data members to the
base class, we can safely construct an array of Operand's (as part of the 
MachineInst memory allocation) and cast the elements to the derived classes.

All of the variable sized and arbitrary complex stuff in the operands (in the 
arbitrary immediate and the memory mode), which are (small) class hierarchies of
simple classes.

One thing that is important to note is that the AddressingMode class is
extensible by the target, allowing them to have their arbitrarily wierd and
complex addressing mode stuff.  The addressing mode class would provide virtual
methods to print and emit machine code for an addressing mode.

A final important aspect of the AddressingMode and MachineImmediate classes is
that all instances of them should be aquired through a ::get method (e.g. 
MachineGlobalAddress::get("malloc")), and all of them are immutable on
construction.  These two requirements allow the objects to all be shared and
reference counted, allowing us to avoid tons of dynamic memory allocations (e.g.
all "[ESP + 12]" objects will be shared by instructions that reference that
stack slot.


//===----------------------------------------------------------------------===//
//                          A path from here to there
//===----------------------------------------------------------------------===//

This document describes a lot of ideas and important things that need to be
done.  When we start making these changes, it is important that we make them
incrementally.  In particular, there are three things that are independent:

* The two-address change can be done at any time, and blocks most of the other
  things here (how can you have info about operands if the operand indexes are
  not stable?).
* The V9 and target-independent Instruction classes can be split up.  Initially
  the bodies of the classes would be the same, then they would be pruned to
  eliminate unneeded facilities on both sides, then the t-i one would be
  enhanced.
* ValueType should be enhanced.

Once these are under way, it would be reasonable to start adding information
about operands to the TargetInstrDescriptor class, then finally the
MachineOperand/MachineInst changes can be made.

It's a long road, but each little piece is relatively straight-forward!