A: Sometimes you find yourself drawing mostly axial lines. In
GUIs for example, a majority of lines drawn are either horizontal or vertical. Boxes are
composed of horizontal and vertical lines. The real question should be "Why not?"!
Because video memory is linear, each pixel in a horizontal line is one byte away from
the pixel next to it. This means that we can calculate the starting address of the
line, and use incremental math to figure out the address of each following point. Here
is some Pascal code that implements this:
Here is the VLine routine with additive address computation:
Although VLine is fast, you will see that the HLine routine is twice as fast as this...
As a note, I have done some slight reorganization on the global variables. I stuck
all of the global variables related to these functions in one record. It just makes
more sense, and is easier to keep track of.
The ReadTimer function presented below just reads from the BIOS the number of clock ticks
that have occured since midnight. To get this into seconds, we must divide by 18.2 since
the clock ticks at 18.2 times a second.
Without any further ado (is that a word?)....
The easiest flavor of lines are axial aligned lines. This means that they are
parrellel to an axis (X or Y). To draw an axial line, simply step along the
axis that the line is parrellel to and draw a pixel for each step along the
way. Here are two unoptimized versions of HLine and VLine (Horizontal and
Vertical line):
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE);
VAR X : INTEGER;
BEGIN
FOR X := X1 TO X2 DO
SetPoint(X, Y, Color);
END;
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE);
VAR Y : INTEGER;
BEGIN
FOR Y := Y1 TO Y2 DO
SetPoint(X, Y, Color);
END;
As you can see, this is just an optimization on a special case of line. In this article,
we will optimize these special cases, then in the next article, we will move on to arbitrary lines.
Q: Why should we even spend any time worrying about this?
First off, we shouldn't just call SetPoint repeatedly. We have the unneccesary overhead
of the call, we reload the ES register for every pixel, and we calculate
the address of each pixel each time through the loop. One solution would be to just
inline the procedure and start breaking it apart. Instead, I will take advantage of
the linearity of video memory and start from scratch.
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE);
VAR
X : INTEGER;
Address : WORD;
BEGIN
Address := ScreenY[Y];
FOR X := X1 TO X2 DO
Screen^[Address+X1] := Color;
END;
This is a very fast procedure, with most of the cycles go into processing the FOR loop.
In order to get rid of them, we must convert to assembly language. Here is that procedure
converted into assembler:
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen { Only load ES and DI at the start }
add DI, X1
mov BX, Y { Y is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET ScreenY]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (X2-X1+1)... }
mov CX, X2
sub CX, X1 { Calculate the Delta X }
mov AL, Color { Load the color }
@XLoop:
mov ES:[DI], AL { Set a pixel }
inc DI { Increment address }
dec CX { Decrement loop counter }
jnz @XLoop { Repeat if neccesary }
END;
This is a pretty fast line, but it can be signifigantly faster. Of course, we could
unroll the loop a couple of times, but with the built in string instructions of the
80x86 processors we can set pixels in a hurry. The STOSB instruction can be used to
do all of this in not many cycles... Here is an updated version:
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen { Only load ES and DI at the start }
add DI, X1
mov BX, Y { Y is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET ScreenY]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (X2-X1+1)... }
mov CX, X2
sub CX, X1
inc CX
mov AL, Color { Load the color }
rep stosb { Nice eh? }
END;
What? You want to go faster still? Well on a 16 bit VGA card (which everyone has), we can move pixels twice
as fast as this! With a 16 bit VGA, using WORD stores instead of byte stores can transfer
twice as much data as the last routine. We must be careful though, because if we start
on an odd address, the benefits will be canceled out because of bad data alignment. This
routine will also work on 8 bit VGA cards (major oldies) because the CPU will automatically
split the access...
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen { Only load ES and DI at the start }
add DI, X1
mov BX, Y { Y is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET ScreenY]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (X2-X1+1)... }
mov CX, X2
mov AL, Color { Load the color }
sub CX, X1
inc CX
test DI, 1 { Are we on an odd address? }
jz @Even
stosb { Store the first byte }
dec CX { Decrement the counter one }
@Even: { Even address }
shr CX, 1 { Do half as many transfers }
rep stosw { Copy all of the even words }
adc CX, 0 { If there is a pixel left over, set CX to 1 }
rep stosb { If there is a pixel left over, fill it! }
END;
To make this faster, you could rearrange the function to avoid stalls on a 486+. This
is a very FAST procedure that we will use later. For now, lets move on
to vertical lines...
So now that we have a blazing fast horizontal line routine, should we maybe implement a
vertical one? Why, yes we say! Although we won't be able to get the VLine routine to
go as fast as the HLine routine, it will be quite a bit faster than our general line
routine and will go very fast. The main obstacle is that 16 bit transfers are not
possible (Because the pixels are not next to each other) and string moves can't be used.
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE);
VAR
Y : INTEGER;
Address : WORD;
BEGIN
Address := ScreenY[Y1]+X; { Calculate the first pixels address }
FOR Y := Y1 TO Y2 DO
BEGIN
Screen^[Address] := Color;
Address := Screen.Width; { = to 320 in this case... }
END;
END;
Again, now most of the cycles go into the loop processing. Here is an assembly version
that avoids some of it:
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen { Only load ES and DI at the start }
add DI, X
mov BX, Y1 { Y1 is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET ScreenY]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (Y2-Y1+1)... }
mov CX, Y2
sub CX, Y1 { Calculate the Delta X }
mov BX, Screen.Width { 320 }
mov AL, Color { Load the color }
@YLoop:
mov ES:[DI], AL { Set a pixel }
add DI, BX { Increment address }
dec CX { Decrement loop counter }
jnz @YLoop { Repeat if neccesary }
END;
To get this to go any faster, we must unroll the loop. I will unroll it four times, and
add initialization code that enters in the correct place...
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen { Only load ES and DI at the start }
add DI, X
mov BX, Y1 { Y1 is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET ScreenY]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (Y2-Y1+1)... }
mov CX, Y2
sub CX, Y1 { Calculate the Delta X }
inc CX
mov DX, Screen.Width { 320 }
mov AL, Color { Load the color }
mov BX, CX
and BX, 3 { Get remainder of CX DIV 4 }
shr CX, 2 { Repeated four times }
add BX, BX { Index into a word table}
jmp WORD PTR CS:[@JumpTable+BX]
@JumpTable:
dw OFFSET @Iteration5
dw OFFSET @Iteration4
dw OFFSET @Iteration3
dw OFFSET @Iteration2
@Iteration1:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration2:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration3:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration4:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration5:
dec CX { Decrement loop counter }
jns @Iteration1 { Repeat if neccesary }
END;
There are some things to be aware of when using these routines... For one, these routines
do not check the order of the coordinates passed to them. If you don't know the order,
just call the "Line" routine presented in one of the next sections. These routines will be
called with sorted coordinates. Another thing to be aware of is that these routines do
no clipping. Wait till LineC...
The following program tests these lines for speed and reports to you the number of pixels
drawn in millions of pixels per second. This example shows very plainly that the HLine
routine with 16 bit access is twice as fast as HLine.
PROGRAM LineExample;
USES CRT; { Include the "KeyPressed" function }
TYPE
ScreenBufferType = ARRAY[0..63999] OF BYTE;
ScreenBufferPtr = ^ScreenBufferType;
ScreenType = RECORD
Buffer : ScreenBufferPtr;
YTable : ARRAY[0..199] OF WORD; { Y can range from 0 - 199 }
Width : WORD;
END;
VAR
Screen : ScreenType;
PROCEDURE CalcScreenY(Width : WORD); { Allow for future expansion }
VAR I : INTEGER;
BEGIN
FOR I := 0 TO 199 DO
Screen.YTable[I] := I*Width;
END;
PROCEDURE InitGraph;
BEGIN
ASM
mov AX, 0013h { Function 0, mode 13h }
int 10h
END;
Screen.Buffer := PTR($A000, 0); { Set up the screen buffer pointer }
Screen.Width := 320;
CalcScreenY(320);
END;
PROCEDURE CloseGraph; ASSEMBLER;
ASM
mov AX, 0003h { Function 0, Mode 3}
int 10h
END;
PROCEDURE HLine(X1, Y, X2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen.Buffer { Only load ES and DI at the start }
add DI, X1
mov BX, Y { Y is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET Screen.YTable]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (X2-X1+1)... }
mov CX, X2
mov AL, Color { Load the color }
mov AH, AL
sub CX, X1
inc CX
test DI, 1 { Are we on an odd address? }
jz @Even
stosb { Store the first byte }
dec CX { Decrement the counter one }
@Even: { Even address }
shr CX, 1 { Do half as many transfers }
rep stosw { Copy all of the even words }
adc CX, 0 { If there is a pixel left over, set CX to 1 }
rep stosb { If there is a pixel left over, fill it! }
END;
PROCEDURE VLine(X, Y1, Y2 : INTEGER; Color : BYTE); ASSEMBLER;
ASM
les DI, Screen.Buffer { Only load ES and DI at the start }
add DI, X
mov BX, Y1 { Y1 is an index into the ScreenY array }
add BX, BX { Multipy by two because each entry is a word }
add DI, DS:[BX+OFFSET Screen.YTable]
{ At this point, DI points to the first pixel in the line. }
{ Now we will calculate how many times to loop (Y2-Y1+1)... }
mov CX, Y2
sub CX, Y1 { Calculate the Delta X }
inc CX
mov DX, Screen.Width { 320 }
mov AL, Color { Load the color }
mov BX, CX
and BX, 3 { Get remainder of CX DIV 4 }
shr CX, 2 { Repeated four times }
add BX, BX { Index into a word table}
jmp WORD PTR CS:[@JumpTable+BX]
@JumpTable:
dw OFFSET @Iteration5
dw OFFSET @Iteration4
dw OFFSET @Iteration3
dw OFFSET @Iteration2
@Iteration1:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration2:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration3:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration4:
mov ES:[DI], AL { Set a pixel }
add DI, DX { Increment address }
@Iteration5:
dec CX { Decrement loop counter }
jns @Iteration1 { Repeat if neccesary }
END;
FUNCTION ReadTimer : LONGINT; ASSEMBLER;
ASM mov AX, 40h; mov ES, AX; mov AX, ES:[6Ch]; mov DX, ES:[6Eh] END;
VAR
I, J : INTEGER;
HLineTimer,
VLineTimer : LONGINT;
HLineTime,
VLineTime : REAL;
BEGIN
InitGraph;
HLineTimer := ReadTimer;
FOR J := 0 TO 99 DO
FOR I := 0 TO 199 DO
HLine(0, I, 319, I);
HLineTimer := ReadTimer - HLineTimer;
HLineTime := HLineTimer / 18.2;
VLineTimer := ReadTimer;
FOR J := 0 TO 99 DO
FOR I := 0 TO 319 DO
VLine(I, 0, 199, I);
VLineTimer := ReadTimer - VLineTimer;
VLineTime := VLineTimer / 18.2;
{ WHILE NOT KeyPressed DO; {}
CloseGraph;
WRITELN('HLine time: ', HLineTime :4:4, ' seconds');
WRITELN('Number of pixels in HLine test: ', 100*200*320);
WRITELN('Millions of pixels per second: ', (100*200*320) / HLineTime / 1000000:4:6);
WRITELN;
WRITELN('VLine time: ', VLineTime :4:4, ' seconds');
WRITELN('Number of pixels in VLine test: ', 100*320*200);
WRITELN('Millions of pixels per second: ', (100*320*200) / VLineTime / 1000000:4:6);
WRITELN;
WRITELN('Press any key to continue or any other key to quit');
WHILE NOT KeyPressed DO;
END.