//===----------------------------------------------------------------------===//
// Debug Info for Local Variables When Optimizing
//===----------------------------------------------------------------------===//

9/24/2009 - Initial revision.
1/11/2010 - Improvements to codegen section

At the time of this writing, LLVM's DWARF debug info generation has improved to
the point where line numbers and debug information for global variables are
preserved when optimization is turned on, but where mem2reg/sroa completely
destroy debug information for local variables. This means that debugging
optimized code is very difficult in practice.

This document describes an implementation approach that allows us to represent
variable information for parameters and other automatic variables in a way that
does not interfere with optimization, but retains a lot of the important values
(producing a decent experience when debugging optimized code). When the
framework is in place, we can incrementally improve the various optimizations
that hurt debug information (such as loop strength reduction) so that they
update it better.

//===----------------------------------------------------------------------===//
// The state of debuginfo for variables, and the first step.
//

Consider an example function like this:

void use(int);
int Y;
int test() {
  int X = 4;
  use(X);
  X = Y+2;
  use(X);
  return X;
}

If you ignore line number information, the frontend generates IR that looks
something like this:

define i32 @test() nounwind {
entry:
  %X = alloca i32, align 4
  %0 = bitcast i32* %X to { }*
  call void @llvm.dbg.declare({ }* %0, metadata !4)
  store i32 4, i32* %X
  %tmp = load i32* %X
  call void @use(i32 %tmp)
  %tmp1 = load i32* @Y
  %add = add nsw i32 %tmp1, 2
  store i32 %add, i32* %X
  %tmp2 = load i32* %X
  call void @use(i32 %tmp2)
  %tmp3 = load i32* %X
  ret i32 %tmp3
}

One thing that we need to improve is to get rid of the %0 bitcast and make the
llvm.dbg.declare intrinsic use the alloca through a metadata use.
This will give us code that looks like this:

define i32 @test() nounwind {
entry:
  %X = alloca i32, align 4
  call void @llvm.dbg.declare(metadata !{i32* %X }, metadata !4)
  store i32 4, i32* %X
  %tmp = load i32* %X
  call void @use(i32 %tmp)
  %tmp1 = load i32* @Y
  %add = add nsw i32 %tmp1, 2
  store i32 %add, i32* %X
  %tmp2 = load i32* %X
  call void @use(i32 %tmp2)
  %tmp3 = load i32* %X
  ret i32 %tmp3
}

Making this change will clean up a bunch of random places in the optimizer,
which currently have to specially handle a sole bitcast of an alloca used by the
debug intrinsic. Since the metadata does not show up as a "use" of the alloca,
it will automatically drop to null if something (like mem2reg) hacks on the
alloca, transparently losing debug information, but not impacting optimization.

It is important to note that this is the behavior we have right now; it is just
that we have a lot of scattered code to get it. The first step is to get this
behavior with less code. Once we have this, the next step is for mem2reg and
SROA to be able to preserve debug information when they transform the IR, by
transforming the debug information along with the alloca they promote.

//===----------------------------------------------------------------------===//
// Preserving debug info in mem2reg and SRoA.
//

The reason that we currently have to delete debug info when mem2reg is performed
is that we have no way to represent debug information for a variable once it has
been promoted to SSA values; we can only represent that the address (which is
pinned to memory) has debug info.

To fix this, we should add a new debug intrinsic named "@llvm.dbg.value". The
semantics of llvm.dbg.value are that, from the point where the llvm.dbg.value is
"executed", a (piece of a) specified user source variable gets a new value.
llvm.dbg.value's operands would be something like:

  call void @llvm.dbg.value(metadata !{ i32 4 }, i64 0, metadata !4)

This call indicates that at that point, the user variable "!4" (which we know is
X above) gets the value of "i32 4". Because we want to support SRoA, we
actually model this as an update of the 4 bytes (because it is an i32) starting
at byte #0 (the second argument). The offset argument is always required to be
an i64 constant.

Given this intrinsic, mem2reg would transform each store into an llvm.dbg.value,
transforming the example IR above into:

define i32 @test() nounwind {
entry:
  ;;%X = alloca i32, align 4
  ;;call void @llvm.dbg.declare(metadata !{i32* %X }, metadata !4)
  ;;store i32 4, i32* %X
  call void @llvm.dbg.value(metadata !{ i32 4 }, i64 0, metadata !4)
  ;;%tmp = load i32* %X
  call void @use(i32 4)
  %tmp1 = load i32* @Y
  %add = add nsw i32 %tmp1, 2
  ;;store i32 %add, i32* %X
  call void @llvm.dbg.value(metadata !{ i32 %add }, i64 0, metadata !4)
  ;;%tmp2 = load i32* %X
  call void @use(i32 %add)
  ;;%tmp3 = load i32* %X
  ret i32 %add
}

I kept the instructions deleted by mem2reg as comments to make it more obvious
what is going on: basically the loads are nuked as usual and the stores are
turned into llvm.dbg.value intrinsics.

At this point, we can see how a human would interpret the debug information: if
stopped at a breakpoint on the first call to 'use', you look at the current
"live" instance of llvm.dbg.value, and find that 'i32 4' is the current live
value. If stopped on the second call to 'use', the live value is 'i32 %add'.

This approach has a couple of advantages over other approaches: we don't want
computations to have metadata attached to them like "this value updates variable
!4", because the location of the computation doesn't necessarily have anything
to do with when the update happens (to an extreme, constants don't have
locations).
Using metadata for the uses of instructions works well because if
the optimizer decides to delete %add for some reason, the use will just
transparently drop to null and the debug info generator can render the variable
as "unavailable" in the region containing the second call to use.

One other future improvement that we can make is to add the ability to represent
arbitrary expressions as metadata. This would allow us to say that, in a given
region, the value of the user variable X is "(i32 %A + 42) / 2", which can be
useful for certain optimization scenarios.

//===----------------------------------------------------------------------===//
// Variable debug info at the MachineInstr Level.
//

For now, I will ignore selection dags and suggest that we start by focusing on
making "fastisel + mem2reg/sroa" generate great variable information. FastISel
is much easier to reason about in this respect, and solving the issue with
fastisel and the rest of the optimizer will put us in a good place for solving
the selectiondag issues.

That said, I think we should introduce a very simple new target-independent
opcode (like PHI) to represent this: DEBUG_VALUE. On the example, we'd get this
code out of -fast-isel (pseudo machineinstrs syntax to make it easier to read):

entry:
-->  DEBUG_VALUE 4, 0, !4
     ADJCALLSTACKDOWN32 4
     MOV32mi [ESP], 4
     CALL _use
     ADJCALLSTACKUP32 4
     %reg1025 = MOV32rm [Y]
     %reg1026 = ADD32ri %reg1025, 2
-->  DEBUG_VALUE %reg1026, 0, !4
     ADJCALLSTACKDOWN32 4
     MOV32mr [ESP], %reg1026
     CALL _use
     ADJCALLSTACKUP32 4
     EAX = MOV32rr %reg1026
     RET

The DEBUG_VALUE instruction has operands in the same order as the IR intrinsic,
and they mean the same thing. The most interesting operand is the first one,
which indicates the expression to evaluate to get the value of the user variable
starting at the machineinstr. There are a couple of important cases that we
need to handle specially because they are so common:

1. Unknown.
   If a user variable is clobbered at some point, we need a way to specify that
   the value is unknown. Using an MO_Register operand with a register number
   of 0 is a reasonable way to do this.

2. Immediate. If a user variable has a constant value, we want to have the
   immediate value. MO_Immediate and MO_FPImmediate handle the most important
   cases of this.

3. Register. Often variables are in a simple register. We should use
   MO_Register (potentially with a special bit to say it's a "debug use") to
   represent this. Codegen optimizations should generally ignore or special
   case these special uses: we don't want their presence to affect the
   generated code.

4. Stack slot. Often variables are in one stack slot (this is the only case
   that dbg.declare handles). An MO_FrameIndex operand says that the user
   variable is in the specified stack slot.

5. Arbitrary expression. Eventually, we want the ability to say, for example,
   that user variable I is "EAX/4+14", in order to handle debug info updates
   after LSR. This is not a priority and should be deferred from the initial
   implementation.

With this representation, the location expression of the value is always a
single MachineOperand, and it always has a target-independent representation.
This ensures that DEBUG_VALUE doesn't have target-specific addressing modes in
it, for example.

Continuing the example, when the register allocator runs, it would rewrite the
'debug register uses' to a frameindex reference or register reference as
appropriate. This means that we'd end up getting something like this after
regalloc:

entry:
-->  DEBUG_VALUE 4, 0, !4
     ADJCALLSTACKDOWN32 4
     MOV32mi [ESP], 4
     CALL _use
     ADJCALLSTACKUP32 4
     ESI = MOV32rm [Y]
     ESI = ADD32ri ESI, 2
-->  DEBUG_VALUE ESI, 0, !4
     ADJCALLSTACKDOWN32 4
     MOV32mr [ESP], ESI
     CALL _use
     ADJCALLSTACKUP32 4
     EAX = MOV32rr ESI
     RET

One note: PEI would NOT lower the frame indexes to be stack references for any
spilled values. They should persist through to the asmprinter as FrameIndex
operands.
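
To make the operand cases and the regalloc rewrite above a bit more concrete,
here is a minimal C++ sketch. It is a toy model under stated assumptions, not
LLVM code: LocKind, VarLoc, DebugValue, Assignment, rewriteAfterRegAlloc and
rewriteAll are all hypothetical names invented for illustration, and the
"arbitrary expression" case is left out because it is deferred above.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Illustrative model only: these types are hypothetical stand-ins, not
// LLVM's MachineOperand or MachineInstr classes.
enum class LocKind { Unknown, Immediate, VirtReg, PhysReg, FrameIndex };

struct VarLoc {
  LocKind kind;
  int64_t value;  // immediate, register number, or frame index
};

// One DEBUG_VALUE: location, byte offset into the variable, variable name.
struct DebugValue {
  VarLoc loc;
  uint64_t offset;
  std::string var;
};

// Where the (hypothetical) register allocator placed a virtual register.
struct Assignment {
  bool spilled;   // true: lives in a stack slot; false: in a physical reg
  int64_t where;  // frame index if spilled, physical register number if not
};

// Rewrite a "debug register use" after allocation. The debug use must not
// influence allocation, so a vreg with no assignment becomes Unknown.
VarLoc rewriteAfterRegAlloc(const VarLoc &loc,
                            const std::map<int64_t, Assignment> &assignments) {
  if (loc.kind != LocKind::VirtReg)
    return loc;  // immediates, frame indexes, and unknowns pass through
  auto it = assignments.find(loc.value);
  if (it == assignments.end())
    return VarLoc{LocKind::Unknown, 0};
  if (it->second.spilled)
    return VarLoc{LocKind::FrameIndex, it->second.where};
  return VarLoc{LocKind::PhysReg, it->second.where};
}

// Apply the rewrite to every DEBUG_VALUE in a function.
void rewriteAll(std::vector<DebugValue> &values,
                const std::map<int64_t, Assignment> &assignments) {
  for (DebugValue &dv : values)
    dv.loc = rewriteAfterRegAlloc(dv.loc, assignments);
}
```

In this model, the second DEBUG_VALUE of the example (on %reg1026) would become
a physical register location if the allocator kept the value in a register, a
frame index if it spilled it, and "unknown" if the value did not survive.
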

When the DEBUG_VALUE gets to the asmprinter, the initial implementation phase of
this (which is also useful long-term for -asm-verbose mode) should print out
these instructions as comments, producing something like this:

_test:
LBB1_0:
        pushl   %esi
        # DEBUG_VALUE "X" <- 4
        subl    $8, %esp
        movl    $4, (%esp)
        call    _use
        movl    _Y, %esi
        addl    $2, %esi
        # DEBUG_VALUE "X" <- %esi
        movl    %esi, (%esp)
        call    _use
        movl    %esi, %eax
        addl    $8, %esp
        popl    %esi
        ret

When we have this, we can start doing an evaluation to see how much debug info
we're losing. In the case of fastisel+mem2reg, we should lose no variable info
at all at the IR level, and only lose variable information in codegen when the
register is not live across the DEBUG_VALUE calls.

The final implementation step would be to hook these up as the appropriate DWARF
ranges. This requires emitting the instructions as labels and emitting the
right DWARF directives to the appropriate debug section. This analysis is a bit
nontrivial (requiring some dataflow analysis) and can be done in parallel with
the other work (e.g. bringing up selectiondag stuff).

//===----------------------------------------------------------------------===//
// Moving forward
//

Once the basics start working, we should be getting a lot of variable
information, and we should have simple ways to tell when we are losing variable
information: we can grep for null MD uses at the IR level, and we can grep the
final .s file for the DEBUG_VALUE comments to see whether any of them are
unavailable.

At this point we can work on incrementally improving the quality of the debug
information by enhancing specific optimizers that are breaking it frequently.
One known example from other compilers is the family of induction variable
optimizations. Making LSR preserve debug info by rewriting debug expressions
when it is rewriting other code should not be difficult in theory. The same can
be done for many other mid-level optimizations as well as codegen passes.
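
As a rough illustration of how DEBUG_VALUE markers could be turned into ranges
(both for the DWARF step mentioned in the codegen section and for measuring how
much variable information survives), here is a small C++ sketch. It assumes
straight-line code with markers already sorted by position, so it sidesteps the
dataflow analysis that real control flow would need. Marker, Range and
computeRanges are hypothetical names, and an empty location string is used here
to stand for "unknown".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record of one DEBUG_VALUE in final instruction order.
struct Marker {
  std::size_t index;  // position of the marker in the instruction stream
  std::string var;    // user variable name
  std::string loc;    // printed location; empty means "unknown"
};

// Half-open range [begin, end) over which 'var' is known to live in 'loc'.
struct Range {
  std::string var;
  std::size_t begin;
  std::size_t end;
  std::string loc;
};

// Assumes straight-line code and markers sorted by index. Each marker ends
// the previous range of the same variable and starts a new one; ranges with
// an unknown location are dropped, leaving a gap in the coverage.
std::vector<Range> computeRanges(const std::vector<Marker> &markers,
                                 std::size_t numInstrs) {
  std::map<std::string, Range> open;  // currently live range per variable
  std::vector<Range> result;
  auto close = [&result](Range r, std::size_t end) {
    r.end = end;
    if (!r.loc.empty() && r.begin < r.end)
      result.push_back(r);
  };
  for (const Marker &m : markers) {
    auto it = open.find(m.var);
    if (it != open.end()) {
      close(it->second, m.index);
      open.erase(it);
    }
    open.emplace(m.var, Range{m.var, m.index, numInstrs, m.loc});
  }
  // Whatever is still open extends to the end of the function.
  for (const auto &entry : open)
    close(entry.second, numInstrs);
  return result;
}
```

Any stretch of instructions not covered by a returned range for a variable is a
region where a debugger would have to report that variable as unavailable.
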