Beyond pixels: Lines

Table of Contents:
Horizontal and Vertical Lines
Why bother?
Optimizing horizontal lines
Optimizing vertical lines
A example program for axial lines


Axial Lines

The easiest flavor of lines are axial aligned lines. This means that they are parrellel to an axis (X or Y). To draw an axial line, simply step along the axis that the line is parrellel to and draw a pixel for each step along the way. Here are two unoptimized versions of HLine and VLine (Horizontal and Vertical line):
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE);
VAR X : INTEGER;
BEGIN
  FOR X := X1 TO X2 DO
    SetPoint(X, Y, Color);
END;

PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE);
VAR Y : INTEGER;
BEGIN
  FOR Y := Y1 TO Y2 DO
    SetPoint(X, Y, Color);
END;
As you can see, this is just an optimization on a special case of line. In this article, we will optimize these special cases, then in the next article, we will move on to arbitrary lines.


Why bother with axial lines?

Q: Why should we even spend any time worrying about this?

A: Sometimes you find yourself drawing mostly axial lines. In GUIs for example, a majority of lines drawn are either horizontal or vertical. Boxes are composed of horizontal and vertical lines. The real question should be "Why not?"!


Optimizing the h*ll out of HLine

First off, we shouldn't just call SetPoint repeatedly. We have the unneccesary overhead of the call, we reload the ES register for every pixel, and we calculate the address of each pixel each time through the loop. One solution would be to just inline the procedure and start breaking it apart. Instead, I will take advantage of the linearity of video memory and start from scratch.

Because video memory is linear, each pixel in a horizontal line is one byte away from the pixel next to it. This means that we can calculate the starting address of the line, and use incremental math to figure out the address of each following point. Here is some Pascal code that implements this:

PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE);
VAR 
  X : INTEGER;
  Address : WORD;
BEGIN
  Address := ScreenY[Y];
  FOR X := X1 TO X2 DO
    Screen^[Address+X1] := Color;
END;
This is a very fast procedure, with most of the cycles go into processing the FOR loop. In order to get rid of them, we must convert to assembly language. Here is that procedure converted into assembler:

PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen  { Only load ES and DI at the start }
  add DI, X1
  mov BX, Y       { Y is an index into the ScreenY array        }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET ScreenY]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (X2-X1+1)... }
  
  mov CX, X2
  sub CX, X1      { Calculate the Delta X  }
  mov AL, Color   { Load the color         }
@XLoop:
  mov ES:[DI], AL { Set a pixel            }
  inc DI          { Increment address      }
  dec CX          { Decrement loop counter }
  jnz @XLoop      { Repeat if neccesary    }
END;
This is a pretty fast line, but it can be signifigantly faster. Of course, we could unroll the loop a couple of times, but with the built in string instructions of the 80x86 processors we can set pixels in a hurry. The STOSB instruction can be used to do all of this in not many cycles... Here is an updated version:
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen  { Only load ES and DI at the start }
  add DI, X1
  mov BX, Y       { Y is an index into the ScreenY array        }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET ScreenY]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (X2-X1+1)... }
  
  mov CX, X2
  sub CX, X1
  inc CX
  mov AL, Color   { Load the color }
  rep stosb     { Nice eh?       }
END;
What? You want to go faster still? Well on a 16 bit VGA card (which everyone has), we can move pixels twice as fast as this! With a 16 bit VGA, using WORD stores instead of byte stores can transfer twice as much data as the last routine. We must be careful though, because if we start on an odd address, the benefits will be canceled out because of bad data alignment. This routine will also work on 8 bit VGA cards (major oldies) because the CPU will automatically split the access...
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen  { Only load ES and DI at the start }
  add DI, X1
  mov BX, Y       { Y is an index into the ScreenY array        }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET ScreenY]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (X2-X1+1)... }

  mov CX, X2
  mov AL, Color   { Load the color }
  sub CX, X1
  inc CX

  test DI, 1      { Are we on an odd address?  }
  jz @Even
  stosb           { Store the first byte       }
  dec CX          { Decrement the counter one  }
@Even:            { Even address               }

  shr CX, 1       { Do half as many transfers  }
  rep stosw       { Copy all of the even words } 
  adc CX, 0       { If there is a pixel left over, set CX to 1 }
  rep stosb       { If there is a pixel left over, fill it!    }
END;
To make this faster, you could rearrange the function to avoid stalls on a 486+. This is a very FAST procedure that we will use later. For now, lets move on to vertical lines...


It's VLine's Turn!

So now that we have a blazing fast horizontal line routine, should we maybe implement a vertical one? Why, yes we say! Although we won't be able to get the VLine routine to go as fast as the HLine routine, it will be quite a bit faster than our general line routine and will go very fast. The main obstacle is that 16 bit transfers are not possible (Because the pixels are not next to each other) and string moves can't be used.

Here is the VLine routine with additive address computation:

PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE);
VAR 
  Y : INTEGER;
  Address : WORD;
BEGIN
  Address := ScreenY[Y1]+X;    { Calculate the first pixels address }
  FOR Y := Y1 TO Y2 DO
  BEGIN
    Screen^[Address] := Color;
    Address := Screen.Width;   { = to 320 in this case... }
  END;
END;
Again, now most of the cycles go into the loop processing. Here is an assembly version that avoids some of it:
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen  { Only load ES and DI at the start }
  add DI, X
  mov BX, Y1      { Y1 is an index into the ScreenY array       }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET ScreenY]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (Y2-Y1+1)... }
  
  mov CX, Y2
  sub CX, Y1      { Calculate the Delta X  }
  mov BX, Screen.Width { 320 }
  mov AL, Color   { Load the color         }
@YLoop:
  mov ES:[DI], AL { Set a pixel            }
  add DI, BX      { Increment address      }
  dec CX          { Decrement loop counter }
  jnz @YLoop      { Repeat if neccesary    }
END;
To get this to go any faster, we must unroll the loop. I will unroll it four times, and add initialization code that enters in the correct place...
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen  { Only load ES and DI at the start }
  add DI, X
  mov BX, Y1      { Y1 is an index into the ScreenY array       }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET ScreenY]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (Y2-Y1+1)... }
  
  mov CX, Y2
  sub CX, Y1      { Calculate the Delta X  }
  inc CX

  mov DX, Screen.Width { 320 }
  mov AL, Color   { Load the color         }

  mov BX, CX
  and BX, 3       { Get remainder of CX DIV 4 }
  shr CX, 2       { Repeated four times    }
  add BX, BX      { Index into a word table}
  jmp WORD PTR CS:[@JumpTable+BX]

@JumpTable:
  dw OFFSET @Iteration5
  dw OFFSET @Iteration4
  dw OFFSET @Iteration3
  dw OFFSET @Iteration2

@Iteration1:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration2:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration3:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration4:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration5:
  dec CX          { Decrement loop counter }
  jns @Iteration1 { Repeat if neccesary    }
END;
There are some things to be aware of when using these routines... For one, these routines do not check the order of the coordinates passed to them. If you don't know the order, just call the "Line" routine presented in one of the next sections. These routines will be called with sorted coordinates. Another thing to be aware of is that these routines do no clipping. Wait till LineC...

Although VLine is fast, you will see that the HLine routine is twice as fast as this...


Line timing example

The following program tests these lines for speed and reports to you the number of pixels drawn in millions of pixels per second. This example shows very plainly that the HLine routine with 16 bit access is twice as fast as HLine.

As a note, I have done some slight reorganization on the global variables. I stuck all of the global variables related to these functions in one record. It just makes more sense, and is easier to keep track of.

The ReadTimer function presented below just reads from the BIOS the number of clock ticks that have occured since midnight. To get this into seconds, we must divide by 18.2 since the clock ticks at 18.2 times a second.

Without any further ado (is that a word?)....

-----------------] Example Starts here [-----------------
PROGRAM LineExample;
USES CRT;   { Include the "KeyPressed" function }
TYPE
  ScreenBufferType = ARRAY[0..63999] OF BYTE;
  ScreenBufferPtr = ^ScreenBufferType;
  ScreenType = RECORD
    Buffer : ScreenBufferPtr;
    YTable : ARRAY[0..199] OF WORD;   { Y can range from 0 - 199 }
    Width  : WORD;
  END;

VAR
  Screen : ScreenType;

PROCEDURE CalcScreenY(Width : WORD);  { Allow for future expansion }
VAR I : INTEGER;
BEGIN
  FOR I := 0 TO 199 DO
    Screen.YTable[I] := I*Width;
END;

PROCEDURE InitGraph;
BEGIN
  ASM
    mov AX, 0013h    { Function 0, mode 13h }
    int 10h
  END;
  Screen.Buffer := PTR($A000, 0);   { Set up the screen buffer pointer }
  Screen.Width := 320;
  CalcScreenY(320);
END;

PROCEDURE CloseGraph; ASSEMBLER;
ASM
  mov AX, 0003h      { Function 0, Mode 3}
  int 10h
END;

PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen.Buffer  { Only load ES and DI at the start }
  add DI, X1
  mov BX, Y       { Y is an index into the ScreenY array        }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET Screen.YTable]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (X2-X1+1)... }

  mov CX, X2
  mov AL, Color   { Load the color }
  mov AH, AL
  sub CX, X1
  inc CX

  test DI, 1      { Are we on an odd address?  }
  jz @Even
  stosb           { Store the first byte       }
  dec CX          { Decrement the counter one  }
@Even:            { Even address               }
  shr CX, 1       { Do half as many transfers  }
  rep stosw       { Copy all of the even words }
  adc CX, 0       { If there is a pixel left over, set CX to 1 }
  rep stosb       { If there is a pixel left over, fill it!    }
END;

PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
  les DI, Screen.Buffer  { Only load ES and DI at the start }
  add DI, X
  mov BX, Y1      { Y1 is an index into the ScreenY array       }
  add BX, BX      { Multipy by two because each entry is a word }
  add DI, DS:[BX+OFFSET Screen.YTable]

  { At this point, DI points to the first pixel in the line.  }
  { Now we will calculate how many times to loop (Y2-Y1+1)... }

  mov CX, Y2
  sub CX, Y1      { Calculate the Delta X  }
  inc CX

  mov DX, Screen.Width { 320 }
  mov AL, Color   { Load the color         }

  mov BX, CX
  and BX, 3       { Get remainder of CX DIV 4 }
  shr CX, 2       { Repeated four times    }
  add BX, BX      { Index into a word table}
  jmp WORD PTR CS:[@JumpTable+BX]

@JumpTable:
  dw OFFSET @Iteration5
  dw OFFSET @Iteration4
  dw OFFSET @Iteration3
  dw OFFSET @Iteration2

@Iteration1:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration2:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration3:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration4:
  mov ES:[DI], AL { Set a pixel            }
  add DI, DX      { Increment address      }

@Iteration5:
  dec CX          { Decrement loop counter }
  jns @Iteration1 { Repeat if neccesary    }
END;

FUNCTION ReadTimer : LONGINT; ASSEMBLER;
ASM mov AX, 40h; mov ES, AX; mov AX, ES:[6Ch]; mov DX, ES:[6Eh] END;

VAR
  I, J : INTEGER;
  HLineTimer,
  VLineTimer : LONGINT;
  HLineTime,
  VLineTime : REAL;
BEGIN
  InitGraph;

  HLineTimer := ReadTimer;
  FOR J := 0 TO 99 DO
  FOR I := 0 TO 199 DO
    HLine(0, I, 319, I);
  HLineTimer := ReadTimer - HLineTimer;
  HLineTime := HLineTimer / 18.2;

  VLineTimer := ReadTimer;
  FOR J := 0 TO 99 DO
  FOR I := 0 TO 319 DO
    VLine(I, 0, 199, I);
  VLineTimer := ReadTimer - VLineTimer;
  VLineTime := VLineTimer / 18.2;

{  WHILE NOT KeyPressed DO; {}
  CloseGraph;
  WRITELN('HLine time: ', HLineTime :4:4, ' seconds');
  WRITELN('Number of pixels in HLine test: ', 100*200*320);
  WRITELN('Millions of pixels per second: ', (100*200*320) / HLineTime / 1000000:4:6);
  WRITELN;

  WRITELN('VLine time: ', VLineTime :4:4, ' seconds');
  WRITELN('Number of pixels in VLine test: ', 100*320*200);
  WRITELN('Millions of pixels per second: ', (100*320*200) / VLineTime / 1000000:4:6);
  WRITELN;

  WRITELN('Press any key to continue or any other key to quit');
  WHILE NOT KeyPressed DO;
END.
-----------------] Example Ends here [-----------------


  • Created by Chris Lattner