//===----------------------------------------------------------------------===//
//                Debug Info for Local Variables When Optimizing
//===----------------------------------------------------------------------===//

9/24/2009 - Initial revision.
1/11/2010 - Improvements to codegen section

At the time of this writing, LLVM's DWARF debug info generation has improved
to the point where line numbers and debug information for global variables are
preserved when optimization is turned on, but where mem2reg/sroa completely
destroy debug information for local variables.  This means that debugging
optimized code is very difficult in practice.

This document describes an implementation approach that lets us represent
variable information for parameters and other automatic variables without
interfering with optimization, while retaining many of the important values
(giving a decent experience when debugging optimized code).  Once the framework
is in place, we can incrementally improve the optimizations that hurt debug
information (such as loop strength reduction) so that they update it better.

//===----------------------------------------------------------------------===//
// The state of debuginfo for variables, and the first step.
//

Consider an example function like this:

void use(int);

int Y;
int test() {
  int X = 4;
  use(X);
  X = Y+2;
  use(X);
  return X;
}

If you ignore line number information, the frontend generates IR that looks
something like this:

define i32 @test() nounwind {
entry:
  %X = alloca i32, align 4
  %0 = bitcast i32* %X to { }*
  call void @llvm.dbg.declare({ }* %0, metadata !4)
  store i32 4, i32* %X
  %tmp = load i32* %X
  call void @use(i32 %tmp)
  %tmp1 = load i32* @Y
  %add = add nsw i32 %tmp1, 2
  store i32 %add, i32* %X
  %tmp2 = load i32* %X
  call void @use(i32 %tmp2)
  %tmp3 = load i32* %X
  ret i32 %tmp3
}

One thing that we need to improve is to get rid of the %0 bitcast and make the
llvm.dbg.declare intrinsic use the alloca through a metadata use.  This will
give us code that looks like this:

define i32 @test() nounwind {
entry:
  %X = alloca i32, align 4                        
  call void @llvm.dbg.declare(metadata !{i32* %X }, metadata !4)
  store i32 4, i32* %X
  %tmp = load i32* %X                             
  call void @use(i32 %tmp)
  %tmp1 = load i32* @Y                            
  %add = add nsw i32 %tmp1, 2                     
  store i32 %add, i32* %X
  %tmp2 = load i32* %X                            
  call void @use(i32 %tmp2)
  %tmp3 = load i32* %X                            
  ret i32 %tmp3
}

Making this change will clean up a bunch of random places in the optimizer,
which currently have to handle a sole bitcast of an alloca used by the debug
intrinsic specially.  Since the metadata does not show up as a "use" of the
alloca, it will automatically drop to null if something (like mem2reg) hacks on
the alloca, transparently losing debug information, but not impacting
optimization.

It is important to note that this is the behavior we have right now; it just
takes a lot of scattered code to get it.  The first step is to get this
behavior with less code.  Once we have this, the next step is for mem2reg
and SROA to be able to preserve debug information when they transform the IR,
by transforming the debug information along with the alloca they promote.

//===----------------------------------------------------------------------===//
// Preserving debug info in mem2reg and SROA.
//

The reason that we currently have to delete debug info when mem2reg runs is
that we have no way to represent debug information for a variable once it has
been promoted to SSA values; we can only represent that the address (which is
pinned to memory) has debug info.  To fix this, we should add a new debug
intrinsic named "@llvm.dbg.value".  The semantics of llvm.dbg.value are that,
from the point where the llvm.dbg.value is "executed", a (piece of a) specified
user source variable takes on a new value.  llvm.dbg.value's operands would be
something like:

  call void @llvm.dbg.value(metadata !{ i32 4 }, i64 0, metadata !4)

This call indicates that at that point, the user variable "!4" (which we know
is X above) gets the value of "i32 4".  Because we want to support SROA, we
actually model this as an update of the 4 bytes (because it is an i32)
starting at byte #0 (the second argument).  The offset argument is always
required to be an i64 constant.  Given this intrinsic, mem2reg would transform
each store into an llvm.dbg.value, transforming the example IR above into:

define i32 @test() nounwind {
entry:
      ;;%X = alloca i32, align 4
      ;;call void @llvm.dbg.declare(metadata !{i32* %X }, metadata !4)
      ;;store i32 4, i32* %X
  call void @llvm.dbg.value(metadata !{ i32 4 }, i64 0, metadata !4)
      ;;%tmp = load i32* %X
  call void @use(i32 4)
  %tmp1 = load i32* @Y
  %add = add nsw i32 %tmp1, 2
      ;;store i32 %add, i32* %X
  call void @llvm.dbg.value(metadata !{ i32 %add }, i64 0, metadata !4)
      ;;%tmp2 = load i32* %X
  call void @use(i32 %add)
      ;;%tmp3 = load i32* %X
  ret i32 %add
}

I kept the instructions deleted by mem2reg as comments to make it more obvious
what is going on: basically the loads are nuked as usual and the stores are
turned into llvm.dbg.value intrinsics.
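
The rewrite just described can be sketched as a toy pass over one straight-line
block that touches a single alloca.  This is only an illustration of the idea,
not mem2reg itself: ToyInst, Promoted, and promote are hypothetical names, the
"instructions" are plain strings, and the sketch records which value replaces
each deleted load instead of rewriting the users.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy instruction model (hypothetical, not an LLVM API).
//   Store: `a` is the typed value stored to the alloca, e.g. "i32 4"
//   Load:  `a` is the name of the load result, e.g. "%tmp"
//   Other: `a` is the instruction text, kept as-is
enum class Op { Store, Load, Other };
struct ToyInst {
  Op op;
  std::string a;
};

struct Promoted {
  std::vector<std::string> body;                   // surviving instructions
  std::map<std::string, std::string> replacements; // load result -> value
};

// Promote the single alloca described by metadata node `var` (e.g. "!4").
Promoted promote(const std::vector<ToyInst> &block, const std::string &var) {
  Promoted out;
  std::string current = "undef"; // a load before any store sees undef
  for (const ToyInst &inst : block) {
    switch (inst.op) {
    case Op::Store:
      // The store disappears; a dbg.value records the new variable value.
      current = inst.a;
      out.body.push_back("call void @llvm.dbg.value(metadata !{ " + current +
                         " }, i64 0, metadata " + var + ")");
      break;
    case Op::Load:
      // The load disappears; its users would use the current value instead.
      out.replacements[inst.a] = current;
      break;
    case Op::Other:
      out.body.push_back(inst.a);
      break;
    }
  }
  return out;
}
```

Feeding it the stores and loads of the example yields one llvm.dbg.value per
store, in the same positions as in the listing above.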

At this point, we can see how a human would interpret the debug information: if
stopped at a breakpoint on the first call to 'use', you look at the most recent
"live" instance of llvm.dbg.value and find that 'i32 4' is the current live
value.  If stopped on the second call to 'use', the live value is i32 %add.
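
That lookup can be sketched as a backward scan over a straight-line
instruction list.  This is a toy model under stated assumptions: Inst,
liveValueAt, and exampleBlock are hypothetical names invented for the
illustration, values are plain strings, and control flow is ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of one straight-line block; none of these names are LLVM APIs.
struct Inst {
  bool isDbgValue;      // true for an llvm.dbg.value marker
  std::string variable; // user variable described (markers only)
  std::string value;    // value it takes; "" if the metadata use went null
  std::string text;     // printable form of an ordinary instruction
};

// Value of `variable` when stopped just before instruction index `stop`:
// the nearest preceding dbg.value for that variable wins.
std::string liveValueAt(const std::vector<Inst> &block, std::size_t stop,
                        const std::string &variable) {
  assert(stop <= block.size());
  for (std::size_t i = stop; i-- > 0;) {
    const Inst &inst = block[i];
    if (inst.isDbgValue && inst.variable == variable)
      return inst.value.empty() ? std::string("<unavailable>") : inst.value;
  }
  return "<unavailable>"; // no marker seen before this point
}

// The mem2reg'd example from the text, as a toy instruction list.
std::vector<Inst> exampleBlock() {
  return {
      {true, "X", "i32 4", ""},
      {false, "", "", "call void @use(i32 4)"},
      {false, "", "", "%tmp1 = load i32* @Y"},
      {false, "", "", "%add = add nsw i32 %tmp1, 2"},
      {true, "X", "i32 %add", ""},
      {false, "", "", "call void @use(i32 %add)"},
      {false, "", "", "ret i32 %add"},
  };
}
```

Stopping at index 1 (the first call to use) reports "i32 4", and stopping at
index 5 (the second call) reports "i32 %add".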

This approach has a couple of advantages over the alternatives.  We don't want
computations to have metadata attached to them like "this value updates
variable !4", because the location of the computation doesn't necessarily have
anything to do with when the update happens (in the extreme case, constants
don't have locations).  Using metadata for the uses of instructions works well
because if the optimizer decides to delete %add for some reason, the use will
just transparently drop to null and the debug info generator can render the
variable as "unavailable" in the region containing the second call to use.
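
The "drop to null" behavior can be pictured with a non-owning reference.  The
sketch below uses std::weak_ptr purely as an analogy; LLVM's metadata uses are
implemented differently, and Value, DbgValue, and render are hypothetical names
for this illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Stand-in for an IR value such as %add (hypothetical, not llvm::Value).
struct Value {
  std::string name;
};

// A debug marker observes its value without keeping it alive, so the
// optimizer is free to delete the value at any time.
struct DbgValue {
  std::string variable;
  std::weak_ptr<Value> value;
};

// What a debug info generator could print for this marker.
std::string render(const DbgValue &dv) {
  if (std::shared_ptr<Value> v = dv.value.lock())
    return dv.variable + " = " + v->name;
  return dv.variable + " = <unavailable>"; // the use dropped to null
}
```

Deleting the value needs no cooperation from the marker: once the last owning
reference goes away, render reports the variable as unavailable.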

One other future improvement is to add the ability to represent arbitrary
expressions as metadata.  This would allow us to say that "in this region, the
value of the user variable X is (i32 %A + 42) / 2", which can be useful for
certain optimization scenarios.

//===----------------------------------------------------------------------===//
// Variable debug info at the MachineInstr Level.
//

For now, I will ignore selection dags and suggest that we start by focusing on
making "fastisel + mem2reg/sroa" generate great variable information.  FastISel
is much easier to reason about in this respect and solving the issue with
fastisel and the rest of the optimizer will put us in a good place for solving
the selectiondag issues.  That said, I think we should introduce a very simple
new target-independent opcode (like PHI) to represent this: DEBUG_VALUE.  For
the example, we'd get this code out of -fast-isel (in pseudo-machineinstr
syntax to make it easier to read):

entry:
-->     DEBUG_VALUE 4, 0, !4
	ADJCALLSTACKDOWN32 4
	MOV32mi [ESP], 4
	CALL _use
	ADJCALLSTACKUP32 4
	%reg1025 = MOV32rm [Y]
	%reg1026 = ADD32ri %reg1025, 2
-->     DEBUG_VALUE %reg1026, 0, !4
	ADJCALLSTACKDOWN32 4
	MOV32mr [ESP], %reg1026
	CALL _use
	ADJCALLSTACKUP32 4
	EAX = MOV32rr %reg1026
	RET

The DEBUG_VALUE instruction has operands in the same order as the IR intrinsic,
and they mean the same thing.  The most interesting operand is the first one,
which indicates the expression to evaluate to get the value of the user
variable starting at the machineinstr.  There are a couple of important cases
that we need to handle specially because they are so common:

1. Unknown.  If a user variable is clobbered at some point, we need a way to
   specify that the value is unknown.  Using an MO_Register operand with a
   register number of 0 is a reasonable way to do this.
2. Immediate. If a user variable has a constant value, we want to have the
   immediate value.  MO_Immediate and MO_FPImmediate handle the most important
   cases of this.
3. Register.  Often variables are in a simple register.  We should use
   MO_Register (potentially with a special bit to say it's a "debug use") to
   represent this.  Codegen optimizations should generally ignore or special
   case these special uses: we don't want their presence to affect the
   generated code.
4. Stack slot.  Often variables are in one stack slot (this is the only
   case that dbg.declare handles).  An MO_FrameIndex operand says that the
   user variable is in the specified stack slot.
5. Arbitrary expression.  Eventually, we want the ability to say that user
   variable I is "EAX/4+14", for example, to handle debug info updates after
   LSR.  This is not a priority and should be deferred from the initial
   implementation.

With this representation, the location expression of the value is always a
single MachineOperand, and it always has a target-independent representation.
This ensures, for example, that DEBUG_VALUE doesn't have target-specific
addressing modes in it.

Continuing the example, when the register allocator runs, it would rewrite the
'debug register uses' to a frameindex reference or register reference as 
appropriate.  This means that we'd end up getting something like this after
regalloc:

entry:
-->     DEBUG_VALUE 4, 0, !4
        ADJCALLSTACKDOWN32 4
        MOV32mi [ESP], 4
        CALL _use
        ADJCALLSTACKUP32 4
        ESI = MOV32rm [Y]
        ESI = ADD32ri ESI, 2
-->     DEBUG_VALUE ESI, 0, !4
        ADJCALLSTACKDOWN32 4
        MOV32mr [ESP], ESI
        CALL _use
        ADJCALLSTACKUP32 4
        EAX = MOV32rr ESI
        RET

One note: PEI would NOT lower the frame indexes to be stack references for any
spilled values.  They should persist through to the asmprinter as FrameIndex
operands.  When the DEBUG_VALUE gets to the asmprinter, the initial 
implementation phase of this (which is also useful for long-term -asm-verbose 
mode) should print out these instructions as comments, producing something
like this:

_test:
LBB1_0:
	pushl	%esi
	# DEBUG_VALUE "X" <- 4
	subl	$8, %esp
	movl	$4, (%esp)
	call	_use
	movl	_Y, %esi
	addl	$2, %esi
	# DEBUG_VALUE "X" <- %esi
	movl	%esi, (%esp)
	call	_use
	movl	%esi, %eax
	addl	$8, %esp
	popl	%esi
	ret
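
The operand cases listed earlier and the comment form shown above can be
sketched together as a small tagged struct plus a printer.  This is a
hypothetical model, not LLVM's MachineOperand API: DbgLocation, LocKind, and
asmComment are invented names, and the spellings used for the unknown and
frame-index cases are assumptions, since the text only shows the immediate and
register forms.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical model of the single location operand carried by DEBUG_VALUE.
enum class LocKind { Unknown, Immediate, Register, FrameIndex };

struct DbgLocation {
  LocKind kind;
  std::int64_t imm; // used when kind == Immediate
  std::string reg;  // used when kind == Register, e.g. "esi"
  int frameIndex;   // used when kind == FrameIndex (abstract slot number)
};

DbgLocation unknownLoc() { return {LocKind::Unknown, 0, "", 0}; }
DbgLocation immLoc(std::int64_t v) { return {LocKind::Immediate, v, "", 0}; }
DbgLocation regLoc(const std::string &r) { return {LocKind::Register, 0, r, 0}; }
DbgLocation frameLoc(int fi) { return {LocKind::FrameIndex, 0, "", fi}; }

// Print one DEBUG_VALUE as an assembly comment for user variable `var`.
std::string asmComment(const std::string &var, const DbgLocation &loc) {
  std::string out = "# DEBUG_VALUE \"" + var + "\" <- ";
  switch (loc.kind) {
  case LocKind::Unknown:
    out += "<unavailable>"; // assumed spelling
    break;
  case LocKind::Immediate:
    out += std::to_string(loc.imm);
    break;
  case LocKind::Register:
    out += "%" + loc.reg;
    break;
  case LocKind::FrameIndex:
    out += "<fi#" + std::to_string(loc.frameIndex) + ">"; // assumed spelling
    break;
  }
  return out;
}
```

Because every case is a single tagged operand, the printer needs no
target-specific knowledge beyond the register name.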

When we have this, we can start doing an evaluation to see how much debug info
we're losing.  In the case of fastisel+mem2reg, we should lose no variable
info at all at the IR level, and only lose variable information in codegen when
the register is not live across the DEBUG_VALUE instructions.  The final
implementation step would be to hook these up as the appropriate DWARF ranges.
This requires emitting labels at the DEBUG_VALUE instructions and emitting the
right DWARF directives to the appropriate debug section.  This analysis is a
bit nontrivial (requiring some dataflow analysis) and can be done in parallel
with the other work (e.g. bringing up selectiondag stuff).
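
For a single straight-line block, turning DEBUG_VALUE positions into ranges is
simple, and a sketch makes the goal concrete.  The names Marker, Range, and
computeRanges are hypothetical, instruction indexes stand in for labels, and
markers are assumed to be sorted by index.  The real analysis has to merge
information across basic blocks, which is where the dataflow comes in; this
sketch does not attempt that.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One DEBUG_VALUE: position in the block, user variable, and location.
// An empty location means "unknown".
struct Marker {
  std::size_t index;
  std::string variable;
  std::string location;
};

// Half-open range [begin, end) over which `variable` lives in `location`.
struct Range {
  std::size_t begin, end;
  std::string variable, location;
};

std::vector<Range> computeRanges(const std::vector<Marker> &markers,
                                 std::size_t numInsts) {
  std::vector<Range> ranges;
  std::map<std::string, std::size_t> open; // variable -> index into `ranges`
  for (const Marker &m : markers) {
    // A new marker for a variable ends that variable's previous range.
    auto it = open.find(m.variable);
    if (it != open.end()) {
      ranges[it->second].end = m.index;
      open.erase(it);
    }
    // A known location opens a new range that runs to the end of the
    // block unless a later marker closes it.
    if (!m.location.empty()) {
      open[m.variable] = ranges.size();
      ranges.push_back({m.index, numInsts, m.variable, m.location});
    }
  }
  return ranges;
}
```

For the post-regalloc example (14 instructions, with markers at indexes 0 and
7), this gives X the value 4 over [0, 7) and ESI over [7, 14).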

//===----------------------------------------------------------------------===//
// Moving forward
//

Once the basics start working, we should be getting a lot of variable
information, and we should have simple ways to tell when we are losing variable
information: we can grep for null MD uses at the IR level, and we can grep the
final .s file for DEBUG_VALUE comments to see whether any of them are
unavailable.  At this point we can work on incrementally improving the quality
of the debug information by enhancing specific optimizers that are breaking it
frequently.  One known example from other compilers is the family of induction
variable optimizations.  Making LSR preserve debug info by rewriting
debug expressions when it is rewriting other code should not be difficult in
theory.  The same can be done for many other mid-level optimizations as well as
codegen passes.