//===----------------------------------------------------------------------===//
//                   A New Target Specifier Abstraction
//===----------------------------------------------------------------------===//

2/22/2011 - initial revision

The [sub]target configuration and selection logic currently in LLVM is a
somewhat confused and bewildering mess.  Various parts of the system have GNU
target triples, "canonicalized" triples, CPU names, target features (SSE4.1?),
MachO CPU subtypes, and the llvm::Triple class.

This leads to a number of problems in LLVM:
 - we have a bunch of duplication
 - we have confusion about what a triple is (normalized or not)
 - no good way to tell if a triple is normalized
 - no good, centralized way to reason about which triples are allowed and valid
 - the MC assembler has to link in the entire X86 backend to get subtarget info
 - we don't have a good way to implement things like .code32 in the MC assembler
 - LLDB replicates a lot of this code and heuristics
 - we don't have good interfaces to inquire about the host
 - we do std::string manipulation in llvm::Triple
 - linux triples are actually quadruples!
 - darwin tools that take -arch have to map them onto something internally.

This proposal describes a replacement for all of this, a new llvm::TargetSpec
class.  It subsumes llvm::Triple, a byte ordering, and a CPU specifier.  This
API is intended to be a central meeting point for this functionality, but we
will still take GNU triples as inputs, etc.  Once world domination has been
completed, the llvm::Triple class should be removed.

//===----------------------------------------------------------------------===//
// The llvm::TargetSpec class
//

Some high level design points:
 - The new class will live in libsupport and thus cannot use anything from
   codegen or other higher level libraries.
 - Unlike llvm::Triple, the class contains a bunch of enums as instance
   variables, instead of containing a string + enums.
 - This doesn't try to capture every detail of codegen, such as pic vs nonpic.
 - A TargetSpec needs to be convertible to string, but doesn't have to look like
   a triple.
 - Need to be able to convert from a GNU triple to target spec, but not the
   other way around.
 - Any field of the TargetSpec can be "unknown"; all fields are represented by
   enums.
 - Not all combinations of fields are valid; for example, some ABI settings are
   only valid for some architectures.
 - There will be lots of enums, but fear not, most people should use helper
   methods on the class rather than switching on the enums.  For example, call
   isOSDarwin() instead of playing with enums.
 - If we want to add new fields in the future, we can.

The proposed fields in TargetSpec are:

#1: Arch - This is the major ISA mode for the target, e.g. x86, x86_64, arm
    (which should include all variants), thumb (which includes thumb2), ppc,
    ppc64, etc.  GNU triples like "armv6-..." and "armv5-..." both map onto
    "arm".  Using "x86" instead of "i386" avoids confusion about whether we are
    talking about architectures or specific CPUs.

#2: Byte Order - BE, LE, Unknown.

#3: OS - This corresponds directly to Triple::OSType, except that version
    numbers are included in the enum list (so we have darwin10, darwin9, etc.
    as enumerators).  Sub-version numbers aren't kept for darwin (there is no
    darwin10.4).  OSes that support multiple ABIs are listed as separate
    enumerators (e.g. "linux-eabi"), since the entire stack has to be built for
    that ABI.

#4: Object File - PECOFF, ELF, MachO, Unknown.

#5: CPU name - We'll have a massive enum containing every CPU type possible in
    LLVM, with an arch prefix on each name (e.g. x86_nehalem).  This field is
    only meaningful when the Arch field is specified.  The enumerators for
    different architectures overlap numerically (e.g. x86_nehalem may equal
    arm_cortex_a8).

#6: Feature delta - This is printed along with the CPU name, but it is an
    optional bitfield of arch-specific subfeatures that are toggled relative to
    that CPU's defaults.  For example, "core2,sse41,nocmov".  For unknown CPUs,
    this can also be things like "unknown,sse41,64bit".  This field is only
    meaningful if the arch is specified, and the actual bit values overlap
    across architectures (e.g. it might be true that TargetSpec::arm_vfp2 ==
    TargetSpec::x86_cmov).  We should use nofoo instead of -foo to remove
    features; this avoids confusion with CPU names that contain a hyphen (e.g.
    cortex-a8).
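To make the field list concrete, here is a minimal C++ sketch of the layout, assuming one small enum per field.  All enumerator names below (X86_Nehalem, ARM_VFP2, LinuxEABI, etc.) are hypothetical placeholders rather than a proposed final list; the real tables would come from TableGen.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the proposed field layout; every enumerator list
// here is illustrative and would really be generated by TableGen.
class TargetSpec {
public:
  enum ArchType : std::uint8_t { UnknownArch, X86, X86_64, ARM, Thumb,
                                 PowerPC, PowerPC64 };
  enum ByteOrderType : std::uint8_t { UnknownByteOrder, BE, LE };
  // OS versions live in the enum; multi-ABI OSes get one entry per ABI.
  enum OSType : std::uint8_t { UnknownOS, Darwin9, Darwin10, Linux, LinuxEABI };
  enum ObjectFormatType : std::uint8_t { UnknownObjFmt, PECOFF, ELF, MachO };
  // CPU names carry an arch prefix, and values overlap across arches.
  enum CPUType : std::uint8_t { UnknownCPU = 0, X86_Core2 = 1,
                                X86_Nehalem = 2, ARM_CortexA8 = 2 };
  // Feature-delta bits are arch specific too, and may overlap.
  enum FeatureBit : std::uint32_t { X86_CMOV = 1u << 0, X86_SSE41 = 1u << 1,
                                    ARM_VFP2 = 1u << 0 };

  ArchType Arch = UnknownArch;
  ByteOrderType ByteOrder = UnknownByteOrder;
  OSType OS = UnknownOS;
  ObjectFormatType ObjFmt = UnknownObjFmt;
  CPUType CPU = UnknownCPU;
  std::uint32_t FeatureDelta = 0; // FeatureBits toggled relative to CPU

  // Clients call predicates like this instead of switching on the enums.
  bool isOSDarwin() const { return OS == Darwin9 || OS == Darwin10; }
};

// Five one-byte enums plus a 32-bit bitfield: cheap to pass by value.
static_assert(sizeof(TargetSpec) <= 12, "TargetSpec should stay small");
```

Because the CPU and feature enumerators overlap numerically across architectures, they only mean something together with Arch, which is why those fields are only valid when Arch is set.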

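When converted to string form, each enum is spelled in lower case with
underscores turned into hyphens (so x86_64 prints as x86-64), the CPU name
drops its arch prefix, and the fields are separated by dots; any feature delta
follows the CPU name, separated by commas.  The dots make it really obvious
whether we have a TargetSpec or a GNU triple.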

Here are some examples:

  x86.le.darwin10.macho.nehalem,avx
  x86-64.le.linux.elf.unknown
  arm.le.linux-eabi.elf.cortex-a8
  unknown.be.unknown.unknown.unknown
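A minimal sketch of the string conversion, assuming the spelling rules implied by the examples above (lower case, '_' printed as '-', features appended with commas).  To stay self-contained it works on enumerator spellings passed in as strings; the helper names spellEnum and formatTargetSpec are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: lower-cases an enumerator spelling and turns '_' into
// '-', so "x86_64" prints as "x86-64" and "cortex_a8" as "cortex-a8".
std::string spellEnum(const std::string &Name) {
  std::string Out;
  for (char C : Name)
    Out += (C == '_') ? '-'
                      : static_cast<char>(
                            std::tolower(static_cast<unsigned char>(C)));
  return Out;
}

// Joins the five fields with dots; feature-delta names follow the CPU name,
// separated by commas.
std::string formatTargetSpec(const std::string &Arch, const std::string &Order,
                             const std::string &OS, const std::string &ObjFmt,
                             const std::string &CPU,
                             const std::vector<std::string> &Features) {
  std::string S = spellEnum(Arch) + "." + spellEnum(Order) + "." +
                  spellEnum(OS) + "." + spellEnum(ObjFmt) + "." +
                  spellEnum(CPU);
  for (const std::string &F : Features)
    S += "," + spellEnum(F);
  return S;
}
```

For instance, formatTargetSpec("arm", "LE", "linux_eabi", "ELF", "cortex_a8", {}) produces the third example above.  A real implementation would map enums to names through TableGen-generated tables instead of taking strings.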

//===----------------------------------------------------------------------===//
// Implementation
//

The biggest source of ugliness here is that we need a giant table of target
features (SSE4!) and arch-specific CPU names.  These tables are already
generated by TableGen and should continue to be.  We can even extend TableGen
to reason about OSes, etc.

Since the TargetSpec class just contains a bunch of enums (and a 32-bit
bitfield), it should be very small and can be passed around by value.

TargetSpec should have a static isValid function that takes a set of field
values and returns a bool saying whether that combination is valid.  The
TargetSpec constructor asserts that its inputs are valid, so we never have an
invalid TargetSpec (though missing pieces are fine; they are set to unknown).
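The validity rule can be sketched as follows, assuming only two fields and a single made-up constraint (x86 cannot be big endian).  The enum names, getters, and the rule itself are hypothetical; the real check would be driven by TableGen-generated tables.

```cpp
#include <cassert>

enum class Arch { Unknown, X86, X86_64, ARM, PPC };
enum class ByteOrder { Unknown, BE, LE };

class TargetSpec {
public:
  // Hypothetical rule for illustration: x86 variants are never big endian.
  // Unknown is accepted for any field.
  static bool isValid(Arch A, ByteOrder B) {
    if ((A == Arch::X86 || A == Arch::X86_64) && B == ByteOrder::BE)
      return false;
    return true;
  }

  // Callers check isValid() first, so an invalid TargetSpec never exists.
  TargetSpec(Arch A, ByteOrder B) : TheArch(A), Order(B) {
    assert(isValid(A, B) && "invalid TargetSpec field combination");
  }

  Arch getArch() const { return TheArch; }
  ByteOrder getByteOrder() const { return Order; }

private:
  Arch TheArch;
  ByteOrder Order;
};
```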

When you construct a TargetSpec and some unknown piece is forced by the others
(e.g. all x86s are little endian), the constructor fills in that information
automatically.  For example, to find out whether some Arch is only big or only
little endian, you can make a TargetSpec with just that Arch specified and see
whether the constructor sets the byte order to big, to little, or leaves it
unknown.
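A sketch of the auto-fill behavior, under the assumption that x86 and x86_64 force little endian while arm is left open.  The forcedByteOrder helper and the enum names are hypothetical.

```cpp
#include <cassert>

enum class Arch { Unknown, X86, X86_64, ARM };
enum class ByteOrder { Unknown, BE, LE };

class TargetSpec {
public:
  explicit TargetSpec(Arch A, ByteOrder B = ByteOrder::Unknown)
      : TheArch(A), Order(B) {
    ByteOrder Forced = forcedByteOrder(A);
    // An explicit byte order must not contradict one forced by the arch.
    assert((B == ByteOrder::Unknown || Forced == ByteOrder::Unknown ||
            B == Forced) &&
           "byte order contradicts arch");
    // Fill in an unknown byte order when the arch leaves no choice.
    if (Order == ByteOrder::Unknown)
      Order = Forced;
  }

  Arch getArch() const { return TheArch; }
  ByteOrder getByteOrder() const { return Order; }

private:
  // Hypothetical table: x86 and x86_64 are always little endian; anything
  // else (including arm) is left open in this sketch.
  static ByteOrder forcedByteOrder(Arch A) {
    switch (A) {
    case Arch::X86:
    case Arch::X86_64:
      return ByteOrder::LE;
    default:
      return ByteOrder::Unknown;
    }
  }

  Arch TheArch;
  ByteOrder Order;
};
```

With this, TargetSpec(Arch::X86).getByteOrder() yields ByteOrder::LE while TargetSpec(Arch::ARM) leaves it unknown, which is the query described above.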

TargetSpec should also have "setters" that can fail, so you can check whether
it is valid to turn an existing TargetSpec into a darwin10 one.  These just
wrap the isValid predicate.

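A fallible setter can be sketched as below.  The rule that darwin implies MachO is a hypothetical constraint chosen only for illustration, as are the names setOS and getOS and the two-argument isValid signature.

```cpp
#include <cassert>

enum class OSType { Unknown, Darwin9, Darwin10, Linux };
enum class ObjectFormat { Unknown, ELF, MachO, PECOFF };

class TargetSpec {
public:
  explicit TargetSpec(ObjectFormat F) : ObjFmt(F) {}

  // Hypothetical rule: a darwin OS requires MachO (or an unknown format).
  static bool isValid(OSType OS, ObjectFormat F) {
    bool IsDarwin = OS == OSType::Darwin9 || OS == OSType::Darwin10;
    return !IsDarwin || F == ObjectFormat::Unknown || F == ObjectFormat::MachO;
  }

  // Fallible setter: returns false and leaves *this unchanged if the new OS
  // would produce an invalid combination.
  bool setOS(OSType NewOS) {
    if (!isValid(NewOS, ObjFmt))
      return false;
    OS = NewOS;
    return true;
  }

  OSType getOS() const { return OS; }

private:
  OSType OS = OSType::Unknown;
  ObjectFormat ObjFmt;
};
```

Because a failed set leaves the spec untouched, callers can probe a change directly instead of building a candidate TargetSpec first.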
//===--------------------
// TargetSpec and Strings
//

TargetSpec internally stores enums, but can be converted to and from the string
form above (and has a .dump() method, etc).  This string is what gets stored in
LLVM bitcode files and wherever else a TargetSpec has to be persisted, because
the numeric enum values are not stable (just like with llvm::Triple).
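Reading the string form back can be sketched as a split on dots followed by a split of the last field on commas.  This assumes every spec has exactly five dot-separated fields (which works because OS names carry no sub-versions) and keeps each field as a string; ParsedTargetSpec, splitOn, and parseTargetSpec are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical parse result. Fields stay as strings here; a real version
// would map each spelling back to its enumerator and reject unknown names.
struct ParsedTargetSpec {
  std::string Arch, Order, OS, ObjFmt, CPU;
  std::vector<std::string> Features;
};

// Splits S at every Sep, keeping empty pieces so malformed input is visible.
std::vector<std::string> splitOn(const std::string &S, char Sep) {
  std::vector<std::string> Parts(1);
  for (char C : S) {
    if (C == Sep)
      Parts.emplace_back();
    else
      Parts.back() += C;
  }
  return Parts;
}

// Accepts "arch.order.os.objfmt.cpu[,feature...]"; returns nullopt otherwise.
std::optional<ParsedTargetSpec> parseTargetSpec(const std::string &S) {
  std::vector<std::string> Fields = splitOn(S, '.');
  if (Fields.size() != 5)
    return std::nullopt;
  std::vector<std::string> CPUAndFeatures = splitOn(Fields[4], ',');
  for (const std::string &F : Fields)
    if (F.empty())
      return std::nullopt;
  for (const std::string &F : CPUAndFeatures)
    if (F.empty())
      return std::nullopt;
  ParsedTargetSpec P;
  P.Arch = Fields[0];
  P.Order = Fields[1];
  P.OS = Fields[2];
  P.ObjFmt = Fields[3];
  P.CPU = CPUAndFeatures[0];
  P.Features.assign(CPUAndFeatures.begin() + 1, CPUAndFeatures.end());
  return P;
}
```

A GNU triple such as i386-apple-darwin10 contains no dots, so it is rejected rather than misparsed; converting triples would go through a separate entry point.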

//===-----------------------
// Detecting Host Properties
//

The JIT and the Clang driver (for -march=native) want to detect what the host
is.  They can do that by calling TargetSpec::getHostInfo().  If the exact host
CPU cannot be detected, the CPU field ends up as unknown,feature1,feature2,
feature3; all of the other properties should be reliably detectable.
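The fallback rule for the host CPU field can be sketched as below, assuming a hypothetical hostCPUField helper that receives the probe results (an empty DetectedCPU means the model could not be identified).  Actual CPU detection is platform specific and is not shown.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the CPU-and-features field for the host.
// An empty DetectedCPU means the exact model was not identified, so the
// field falls back to "unknown" followed by the detected feature names.
std::string hostCPUField(const std::string &DetectedCPU,
                         const std::vector<std::string> &DetectedFeatures) {
  std::string Field = DetectedCPU.empty() ? "unknown" : DetectedCPU;
  for (const std::string &F : DetectedFeatures)
    Field += "," + F;
  return Field;
}
```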

//===----------------
// MachO CPU Subtypes
//

We should have a static method that takes MachO architecture+subtype enums and
returns a TargetSpec.  For example,
  TargetSpec::getFromDarwinMachO(CPU_TYPE_POWERPC, CPU_SUBTYPE_POWERPC_620)
would return the obvious darwin TargetSpec.
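A sketch of getFromDarwinMachO covering only a handful of type/subtype pairs, returning the string form for brevity.  The constants mirror values from Darwin's <mach/machine.h> under renamed identifiers; the unversioned darwin OS and the CPU spellings are assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Numeric values match CPU_ARCH_ABI64, CPU_TYPE_X86, CPU_TYPE_POWERPC,
// CPU_SUBTYPE_POWERPC_620 and CPU_SUBTYPE_POWERPC_970 in Darwin's
// <mach/machine.h>; they are renamed here so the sketch builds anywhere.
constexpr int MachOArchABI64 = 0x01000000;
constexpr int MachOCPUTypeX86 = 7;
constexpr int MachOCPUTypeX86_64 = MachOCPUTypeX86 | MachOArchABI64;
constexpr int MachOCPUTypePowerPC = 18;
constexpr int MachOSubtypePowerPC620 = 8;
constexpr int MachOSubtypePowerPC970 = 100;

// Hypothetical mapping covering only a few pairs. The CPU type and subtype
// alone do not say which Darwin release is meant, so the OS is "darwin".
std::optional<std::string> getFromDarwinMachO(int CPUType, int CPUSubtype) {
  switch (CPUType) {
  case MachOCPUTypePowerPC:
    if (CPUSubtype == MachOSubtypePowerPC620)
      return std::string("ppc.be.darwin.macho.620");
    if (CPUSubtype == MachOSubtypePowerPC970)
      return std::string("ppc.be.darwin.macho.970");
    return std::string("ppc.be.darwin.macho.unknown");
  case MachOCPUTypeX86:
    return std::string("x86.le.darwin.macho.unknown");
  case MachOCPUTypeX86_64:
    return std::string("x86-64.le.darwin.macho.unknown");
  default:
    return std::nullopt;
  }
}
```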

TargetSpec should also have methods that return the CPU_TYPE and CPU_SUBTYPE in
cases where they make sense.  LLDB and the code generators can use these.

Similarly, we should have methods to convert a TargetSpec to and from the
-arch specifier used by various Darwin command line tools.  -arch armv6 should
convert to something like "arm.le.darwin.macho.generic-v6".
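The -arch conversion in both directions could be driven by a single lookup table, as in this sketch.  The table contents (in particular the generic-v6 CPU name and the unversioned darwin OS) are illustrative assumptions, and the function names are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical -arch table: {-arch name, TargetSpec string}.
const std::vector<std::pair<std::string, std::string>> &archFlagTable() {
  static const std::vector<std::pair<std::string, std::string>> Table = {
      {"i386", "x86.le.darwin.macho.unknown"},
      {"x86_64", "x86-64.le.darwin.macho.unknown"},
      {"ppc", "ppc.be.darwin.macho.unknown"},
      {"armv6", "arm.le.darwin.macho.generic-v6"},
  };
  return Table;
}

// -arch name -> TargetSpec string, or nullopt if the name is not known.
std::optional<std::string> targetSpecFromArchFlag(const std::string &Flag) {
  for (const auto &Entry : archFlagTable())
    if (Entry.first == Flag)
      return Entry.second;
  return std::nullopt;
}

// TargetSpec string -> -arch name, or nullopt if no -arch name matches.
std::optional<std::string> archFlagFromTargetSpec(const std::string &Spec) {
  for (const auto &Entry : archFlagTable())
    if (Entry.second == Spec)
      return Entry.first;
  return std::nullopt;
}
```

Keeping one table for both directions guarantees that the two conversions cannot drift apart.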

//===---------
// Other Stuff
//

Much other information is derived from the target.  It should be added as
(well documented!) predicates and accessors on TargetSpec, to centralize the
logic used to reason about targets.

The MC assembler should switch to using this for its CPU feature handling,
which removes its dependency on the X86/ARM backends.