Fixed Point Lines

Y-Major

Table of Contents:
Faster than Bresenham?
What is "Fixed Point"?
The basic routine
A FASTER line!
Going left vs going right...
Squeezing out the last drop of speed
A example program for lines


A faster line than Bresenham???

Sure it is! The main drawback of the Bresenham line drawing routine is that for each pixel that it draws, it MUST make a jump. This destroys the cache on 80x86 processors, and costs three cycles on top of that. This also makes the inner loop of a Bresenham line routine very unoptimizable. For this reason, a number of routines have been invented to fill the gap. The topic of today's discussion is fixed point lines.

Fixed point lines are superior to Bresenham lines in many ways, but can be slower in one case. If you are not clipping your lines, then short lines will actually be faster with the Bresenham routine. This is because of the initial integer division to determine the slope of the line. If clipping is being performed, then this is irrevelant because the slope will already have been calculated by the clipping routine.


What IS fixed point?
And how does it make lines go faster?

Although I won't describe the complete detail of fixed point here, I will cover the basics...

Fixed Point numbers are simply a way to store floating point information in an integer number. They only store a set number of decimal places, and are therefore slightly inaccurate. The first example I give are in base 10. This is an easy base for humans to deal with, but is very cumbersome with computers. Therefore, in real applications we use base 2... (Note: I will use the "^" character to indicate raising to a power)

Suppose you have the following number: 180.4527

Now because you are working with performance intensive applications, you need good performance. Naturally, you try to us integer math. Unfortunately, you find that the ".4527" is an important part of the information. So you do the following:

IntX := 1804527;
If you look closely, you see that the number above is that same number as the one farther above, it is just "shifted" four places to the left. This leaves you with a (bigger) integer number. Wow, this is great you say! But how do I print it out?

WRITELN('This number was stored in an integer: ', IntX / (10^4)); 
                                        { "Shifted" four places }
What is great about this system is that you can add numbers very easily:

IntX := 1804527;        {180.4527 }
IntY := 0005473;        {  0.5473 }
IntZ := IntX+IntY;      { IntZ now = 1810000 } 
RealZ := IntZ / (10^4); { RealZ = 181.0 }
You can multiply two numbers with integer routines. You can also divide two numbers with integer instructions. This is what makes fixed point so fast. Examples:

IntZ := (IntX*(10^4)) DIV IntY;
IntZ := (IntX*IntY) DIV (10^4);
Now in reality, we will be using base 2 fixed point numbers. This means that we will use this for divide and multiply:

IntZ := (IntX*(2^4)) DIV IntY;
IntZ := (IntX*IntY) DIV (2^4);
Except that instead of shifting four places to the left, we will shift 16 places to the left (and use 32 bit numbers). This is commonly called 16.16 fixed point numbers...

IntZ := (IntX*(2^16)) DIV IntY;
IntZ := (IntX*IntY) DIV (2^16);
Another thing that makes FP numbers go so fast is that the math above simplifies to the following:

IntZ := (IntX SHL 16) DIV IntY;
IntZ := (IntX*IntY) SHR 16;
Conveniant, isn't it! Well this is a huge improvement over using floating point mul's and div's. It also lets people without a coprocessor (like me!) create and use high performance graphics routines.


The routine...

Fixed point lines work by picking a major axis and stepping along that axis. The other coordinate is maintained with fixed point numbers. The slope is calculated, and specifies the intra-pixel spacing in the non-major axis. The following code is a complete Y-Major fixed point line drawer...

PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE);
VAR
  DeltaX, DeltaY,     { Change in X and Y   }
  Y     : INTEGER;    { Current position    }
  X, Slope : LONGINT; 
BEGIN
  DeltaX := X2-X1+1; DeltaY := Y2-Y1+1;
  Slope := (LONGINT(DeltaX) SHL 16) DIV DeltaY;

  X := LONGINT(X1) SHL 16;
  FOR Y := Y1 TO Y2 DO
  BEGIN
    SetPixel(X SHR 16, Y, Color);
    X := X + Slope;
  END;
END;
I think that you can see why it can go so fast. It is a very simple and a very powerful routine. It can (in the next section) be programmed to take advantage of the processor's (386+) most convienent features. We will get it down to two instructions (3 cycles) per pixel...


A FASTER line!

How can we make this faster? Why, the inline assembler of course! But converting it straight into assembler misses a few optimizations. Here is a direct conversion of the previous Pascal routine:

PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
VAR
  X        : INTEGER;    { Current position    }
  Y, Slope : LONGINT;
ASM
  mov DX, X2
  sub DX, X1
  inc DX
  xor AX, AX
  mov CX, Y2
  sub CX, Y1
  inc CX
  div CX
  mov WORD [Slope], AX
  mov WORD [Slope+2], 0 { Slope := (LONGINT(DeltaY) SHL 16) DIV DeltaX; }

  db $66; mov DX, WORD [Slope]
  db $66; xor BX, BX
  mov BX, X1
  db $66; shl BX, 16  { BX is the X Variable}

  mov CX, Y1          { CX is the Y variable }  
  mov AL, Color
  les DI, Screen.Buffer
@YLoop:
  db $66; rol BX, 16

  mov DI, BX { X }
  mov SI, CX { Y } { Y is an index into the ScreenY array        }
  add SI, SI       { Multipy by two because each entry is a word }
  add DI, DS:[SI+OFFSET Screen.YTable]
  mov ES:[DI], AL  { Set the pixel }

  db $66; rol BX, 16
  db $66; add BX, DX
  inc CX
  cmp CX, Y2
  jbe @YLoop
END;
I skipped some obvious optimizations in that, but I think that you get the idea. That is a far cry from two instructions per pixel... For this, we use the carry flag on the 386. Instead of keeping the integer portion of the fixed point numbers in the top 16 bits of a register, we will now keep the integer portion of the fixed point number in the BOTTOM 16 bits of the register. When we add two fixed point numbers together and the fractional portions overflow, they will set the "carry" flag in the 386. We then use the "adc" instruction to add this onto the integer portion... Here is the more finished routine: (Remember: This is only for Y-Major lines!!)

PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  mov DX, X2; sub DX, X1 { Calculate DeltaX }
  xor AX, AX
  mov CX, Y2; sub CX, Y1 { Calculate DeltaY }
  div CX                 { Slope := (LONGINT(DeltaX) SHL 16) DIV DeltaY }

  mov DX, AX
  db $66; shl DX, 16     { DX is the Increment variable }
  mov DX, 320            { Have it add 320 into it to move a line down }

  { Start out half a pixel in... }
  les DI, Screen.Buffer
  db $66; mov DI, 0; dw 8000h { mov EDI, 80000000h }

  { CX is still set to Y-Delta for counter }
  mov AL, Color
  add DI, X1           { This clears the Carry flag }
@YLoop:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle }

  { Very unrollable! }

  dec CX      { Does not affect Carry flag }
  jns @YLoop  { Loop YDelta Times! }
END;
This is a great fast line, but we still have a problem... This line can only go SouthEast (down an right). We cannot call it like this:
  FixLine(319, 0, 0, 199, 15);
Or else it will do bad things. What are we to do? Well, we could write a dedicated routine for lines that go through each quadrant of a graph, but I feel that that is too ineffiecient. (Tradeoff between speed and size, remember?)


SE vs SW

In order to make this a more general line routine, I'm going to do what is considered by many to be a coding sin. I'm going to have this routine do some self modification on itself to compensate for the differant X-direction... I feel this is a good compromise of speed vs size. All it has to change is the adc into an sbb. This just subtracts the fraction instead of adds it. But if we do this, we will start going up instead of down. (Minus the 320 in the whole number part of the slope goes up) Because of this, we also have to set the integer portion of the slope to be equal to -320 if we are going left. This is all accomplished by the code below:
PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  mov AX, X2; sub AX, X1   { Calculate DeltaX }
  cwd
  xor AX, DX
  sub AX, DX
  mov BX, DX               { Save this to tell that we swaped... }
  mov DX, AX               { AX = ABS(X2-X1) }
  xor AX, AX

  mov CX, Y2; sub CX, Y1   { Calculate DeltaY }
  div CX                   { Slope := (LONGINT(DeltaY) SHL 16) DIV DeltaX; }

  mov DX, AX
  db $66; shl DX, 16       { DX is the Increment variable    }

  or BX, BX                { If BX = 0 then we are going SE  }
  jz @XPos                 { If BX = -1 then we are going SW }
@XNeg:  { X steps in negative direction }
  mov DX, -320             { Have it add 320 (minus a minus) }
  mov BYTE PTR [CS:@YLoop+1], 19h { Change adc into sbb }
  jmp @FlushCache
@XPos:  { X steps in positive direction }
  mov DX, 320        { Have it add 320 into it to move a line down }
  mov BYTE PTR [CS:@YLoop+1], 11h { Change sbb into adc }
  jmp @FlushCache
@FlushCache:               { Flush the CPU cache }

  { Start out half a pixel in... }
  les DI, Screen.Buffer
  db $66; mov DI, -320; dw 8000h { mov EDI, 80000000h }

{ CX is still set to Y-Delta for counter }
  mov AL, Color
  add DI, X1      { This clears the Carry flag }
@YLoop:
  db $66; adc DI, DX  { 1 cycle + 1 cycle Address Generation dependancy lock }
  mov ES:[DI], AL     { 1 cycle }

  dec CX      { Does not affect Carry flag }
  jns @YLoop  { Loop YDelta Times! }
END;
Notice this chunk of code: (Taken from the beginning of the routine)
  cwd
  xor AX, DX
  sub AX, DX
This useful little snippet takes the absolute value of AX without using a conditional jump... It can be used with AL or EAX very easily (just replace the cwd and convert all of the word registers to either byte or DWORD registers).


Unrolling the loop

Below is the finished Y-Major line routine. As you can see below, I have unrolled the loop four times. This spreads the overhead of the looping code over the four pixels. It can easily be unrolled more, or less depending on how time critical your loop is. Included is a rough estimate of the clock cycles required for a 486 (Taken from TASM 4.0's quick reference). It shows how this procedure makes a 3.75 cycle per pixel line possible.

As a note, you will find that fixed-point lines are not the fastest line for every possible case. A good bresenham line will be faster on extremely short lines, because of all of the initialization involved with fixed lines. These are all tradeoffs we must make...

PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  mov AX, X2; sub AX, X1   { Calculate DeltaX }
  cwd
  xor AX, DX
  sub AX, DX
  mov BX, DX               { Save this to tell that we swaped... }
  mov DX, AX              { AX = ABS(X2-X1) }
  xor AX, AX

  mov CX, Y2; sub CX, Y1   { Calculate DeltaY }
  div CX                   { Slope := (LONGINT(DeltaY) SHL 16) DIV DeltaX; }

  mov DX, AX
  inc CX
  db $66; shl DX, 16       { DX is the Increment variable    }

  or BX, BX                { If BX = 0 then we are going SE  }
  jz @XPos                 { If BX = -1 then we are going SW }
@XNeg:  { X steps in negative direction }
  mov DX, -320             { Have it add 320 (minus a minus) }
  mov AL, 19h        { Opcode for sbb}
  jmp @DoneSelfModify
@XPos:  { X steps in positive direction }
  mov DX, 320        { Have it add 320 into it to move a line down }
  mov AL, 11h        { Opcode for adc }
@DoneSelfModify: 
  mov BYTE PTR [CS:@Iteration1+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration2+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration3+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration4+1], AL { Change adc into sbb }

  { Start out half a pixel in... }
  les DI, Screen.Buffer
  mov AL, Color
  db $66; mov DI, -320; dw 8000h { mov EDI, 80000000h }

{ CX is still set to Y-Delta for counter }
  mov BX, CX
  add DI, X1

  and BX, 3       { Get remainder of CX DIV 4               }
  shr CX, 2       { Repeated four times                     }
  add BX, BX      { Index into a word table                 }
  clc             { Make extra sure the carry flag is clear }
  jmp WORD PTR CS:[@JumpTable+BX]        { Clears the cache }

@JumpTable:
  dw OFFSET @Iteration1 { Enter at the right place... }
  dw OFFSET @Iteration4
  dw OFFSET @Iteration3
  dw OFFSET @Iteration2

@Iteration1:
  db $66; adc DI, DX  { 1 cycle }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration2:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration3:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration4:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }

  dec CX      { Does not affect Carry flag } { 1 cycle  }
  jns @Iteration1     { Loop YDelta Times! } { 3 cycles }  

  { 15 cycles total }
  { 15/4 = 3.75 cycles per pixel! }
END;


Example Program

This is just a pretty little routine that demonstrates lines. It generates the patterns using an effect called aliasing. Aliasing is generally considered a bad thing, but here we use it to make pretty patterns...

-----------------] Example Starts here [-----------------
PROGRAM ArbitraryLineExample;
USES
  GraphPro, { THE free graphics unit!  See: unit.htm for more information }
  CRT;      { Include the "KeyPressed" function }

PROCEDURE FixLine(X1, Y1, X2, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  mov AX, X2; sub AX, X1   { Calculate DeltaX }
  cwd
  xor AX, DX
  sub AX, DX
  mov BX, DX               { Save this to tell that we swaped... }
  mov DX, AX              { AX = ABS(X2-X1) }
  xor AX, AX

  mov CX, Y2; sub CX, Y1   { Calculate DeltaY }
  div CX                   { Slope := (LONGINT(DeltaY) SHL 16) DIV DeltaX; }

  mov DX, AX
  inc CX
  db $66; shl DX, 16       { DX is the Increment variable    }

  or BX, BX                { If BX = 0 then we are going SE  }
  jz @XPos                 { If BX = -1 then we are going SW }
@XNeg:  { X steps in negative direction }
  mov DX, -320             { Have it add 320 (minus a minus) }
  mov AL, 19h        { Opcode for sbb}
  jmp @DoneSelfModify
@XPos:  { X steps in positive direction }
  mov DX, 320        { Have it add 320 into it to move a line down }
  mov AL, 11h        { Opcode for adc }
@DoneSelfModify:      
  mov BYTE PTR [CS:@Iteration1+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration2+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration3+1], AL { Change adc into sbb }
  mov BYTE PTR [CS:@Iteration4+1], AL { Change adc into sbb }

  { Start out half a pixel in... }
  les DI, Screen.Buffer
  mov AL, Color
  db $66; mov DI, -320; dw 8000h { mov EDI, 80000000h }

{ CX is still set to Y-Delta for counter }
  mov BX, CX
  add DI, X1

  and BX, 3       { Get remainder of CX DIV 4               }
  shr CX, 2       { Repeated four times                     }
  add BX, BX      { Index into a word table                 }
  clc             { Make extra sure the carry flag is clear }
  jmp WORD PTR CS:[@JumpTable+BX]        { Clears the cache }

@JumpTable:
  dw OFFSET @Iteration5 { Enter at the right place... }
  dw OFFSET @Iteration4
  dw OFFSET @Iteration3
  dw OFFSET @Iteration2

@Iteration1:
  db $66; adc DI, DX  { 1 cycle }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration2:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration3:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration4:
  db $66; adc DI, DX  { 1 cycle + 1 cycle AGI lock }
  mov ES:[DI], AL     { 1 cycle + 1 cycle AGI lock }
@Iteration5:
  dec CX      { Does not affect Carry flag } { 1 cycle  }
  jns @Iteration1     { Loop YDelta Times! } { 3 cycles }  

  { 15 cycles total }
  { 15/4 = 3.75 cycles per pixel! }
END;

VAR
  I : INTEGER;
BEGIN
  InitGraph;
  FillChar(PTR($A000,0)^, 64000, 2); { Clear the screen to green }

  FOR I := 0 TO 198 DO
  BEGIN
    FixLine(0, 0, I, 199, I AND 1);             { Blue Line }
    FixLine(319, 0, 319-I, 199, I AND 1 SHL 2); { Red Line  }
  END;
  ReadKey;
  CloseGraph;
END.
-----------------] Example Ends here [-----------------


  • Created by Chris Lattner