Compiled Sprites

Table of Contents:
I understand sprites, but what's this?
What are the pro's and con's?
Okay, I get it, but how do I DO it
The Compiler
An example program for sprites


I understand sprites, but what's this?

Okay, okay, okay. You MUST be kidding. Another kind of sprite?? WHY?

Well, the complete need for speed sees no end. Compiled sprites are one more solution to the speed debate. Although they are very fast and have some other advantages, they have a number of very large drawbacks. Therefore, their use is not absolute.

But what IS a compiled sprite?

Instead of storing picture data in the "Data" portion of the sprite, a compiled sprite stores actual machine code for a subroutine that draws the sprite. Because of this arrangment, a compiled sprite only has to access memory twice when drawing a image instead of three times for a RLE or normal sprite. Here is a summary of how they differ:

Normal / RLE Sprite Compiled Sprite
  • Read instruction from code segment
  • Read pixel from sprite data
  • Write pixel to display memory
  • Read instruction (and sprite data) from code segment
  • Write pixel to display memory
  • As you can see, compiled sprite reduce the amount of overhead by storing all data in the form of mov instructions. This eliminates the read from Sprite.Data for the pixel color.


    What are the pro's and con's?

    Compiled sprites have a number of very good things going for them, but also a number of very bad things. Because of this, they are very controversial. Here is a quick list of the Pro's and con's:

    Compiled sprites
    The Good :) The Bad :(
  • They are very fast
  • They handle transparency very well
  • Special effects are possible
  • They can be very large
  • Clipping is impossible
  • They are not very portable
  • Because of this list, compiled sprites should not be used for general purpose stuff, but should be used in some instances. Because of their great speed and their awesome capability to handle transparencies (Transparent pixels are simply not included), they are great for use in a program where there is an overlay over the actively updated region. For example, in a 3D space battle game, the top portion of the screen is covered with a view of space "through the cockpit". Overlayed on top of that is an irregularly shaped cockpit graphic with all of the controls showing through. Because this is mostly transparent, it will actually be smaller than a "normal" sprite.

    The disadvantages of compiled sprites are limiting. Because they are compiled into machine code, a pixel can end up taking up to three bytes of memory (Typical values are much lower). In the real world, memory is very limited. With games requiring digitized sound and music, memory is something that cannot be wasted. The fact that they cannot be clipped is another limiting factor. Because of this, compiled sprites are used for very specialized purposes. For example, I am using a compiled sprite for the local player in my Senior Project. Because of the nature of the game, it is always pointed straight up and never changes size... The world rotates and scales underneath it. Another problem with compiled sprites is that they are not very portable. You cannot use a sprite designed for protected mode in real mode and the sprite itself cannot be displayed on any non-pc platform (Unless a sprite compiler is designed for each target platform).

    One of the benefits that I didn't explain was the possiblity for special effects. Because this is actual code, not just sprite data, the sprite can actually be designed to modify the screen when it is being drawn. As an example, look at the game "Jazz the Jackrabbit" published by Epic Software. It uses compiled sprites exclusively for the background tiles and the characters in it. If you have played it, you may have noticed a number of lens effects. This is because the sprites are not simple compiled bitmaps, they read the data already on the screen and rearrange it to apear like a lens is sitting on it. Other applications are endless. Instead of having just a transparent section, you can have a section that adds 128 to the pixel on the screen. Then when you set up your palette, have 0-127 be normal colors, and 128-255 be colors that are the same as 0-127, just darker. When this sprite passes over an area, it would darken that section of the screen, like looking through sunglasses.


    Okay, I get it, but how do I DO it?

    What, you don't just want the longest description ever about compiled sprites? You actually want to implement them????

    I guess it would make sense to write a converter that changes standard sprites into their compiled sprite equivelent. This is where it helps to know your machine code! Here are the opcodes we'll be using (In the first implementation):

  • C6h - mov r/m8, imm8 - 1 cycle
  • 01h - add r/m16, r16 - 1 cycle
  • CBh - retf - 13 cycles

    The mov opcode is used to set pixel data, the add opcode is used to skip over transparent areas, and the retf is used to return from the procedure. Here is a (small) example sprite, and how it should be compiled:

      X
     X*X
      X  
    
    This sprite should be compiled into the machine code for the following assembly routine:

    DiamondSprite:      ;The DS opcode isn't included.  It's just for clarity.
      mov DS:[DI+1], 1  ;X is color #1
      add DI, BX        ;BX holds the screen width
      mov DS:[DI], 1
      mov DS:[DI+1], 2  ;* is color #2
      mov DS:[DI+2], 1
      add DI, BX
      mov DS:[DI+1], 1  ;X is color #1
      retf
    
    There are a number of optimizations that could be made, but for the sake of clarity, I'll keep it simple. Here is a list of things that I can think of right now:

    • Storing same colored pixels in a register instead of as an immediate (Smaller)
    • Using word and double-word transfers to and from memory. (Smaller and faster)
    • Have "Subroutines" for commonly repeated patterns. (Saves size at the expense of speed)
    • Something to be aware of: Because these compiled sprites use DS to access the screen memory, they don't need a segment prefix (Saving one cycle and byte per pixel). But this also means that it cannot access variables in the data segment, and we have to make sure to restore it when we're done drawing a sprite.
    The routine that calls these sprites has a number of responsibilities. First, they have to take an X and Y coordinate and calculate the offset in screen memory it represents. It also has to fill BX with the width of the screen. Finally, it must make a far call to the routine, and return. It should look like this:

    PROCEDURE DrawCompiledSprite(VAR Sprite : SpriteType; X, Y : INTEGER); ASSEMBLER;
    ASM
      push DS                                   { Save data segment             }
      mov BX, Y                      
      add BX, BX
      mov DI, WORD [Screen.YTable+BX]           { Calculate DI                  } 
      add DI, X
      mov BX, Screen.Width                      { Initialize BX to Screen.Width }
      lds AX, Screen.Buffer
      add DI, AX                   { Sometimes the offset of Screen.Buffer <> 0 }
    
      les SI, Sprite
      call DWORD [ES:SI+SpriteType.Data]        { Call the Sprite!              }
      pop DS                                    { Unsave :) the data segment    }
    END;
    
    As you can see, this routine simply sets up the variables needed, and calls the compiled sprite!

    We need to upgrade the simple DrawSprite routine to dispatch to the DrawRLESprite and DrawCompiledSprite routines is applicable. This would make compiled and RLE sprites transparent to the programmer (and it is a simple addition). Here is the code that is stuck onto the front of DrawSprite:

    PROCEDURE DrawSprite(VAR Sprite : SpriteType; X, Y : INTEGER); ASSEMBLER;
    ASM
      les DI, Sprite
      cmp ES:[DI+SpriteType.SType], STStandard           { Just a boring one... }
      je @DrawSprite
      cmp ES:[DI+SpriteType.SType], STRLE                { Mmmm... RLE...       }
      jne @NotRLE
      pop BP                                             { Restore stack frame  }
      jmp DrawRLESprite
    @NotRLE:
      cmp ES:[DI+SpriteType.SType], STCompiled           { Yeah!  Compiled!     }
      jne @NotCompiled
      pop BP                                             { Restore stack frame  }
      jmp DrawCompiledSprite
    @NotCompiled:
    @DrawSprite:
    
    I just jump into DrawRLESprite and DrawCompiledSprite because they use the same parameters. If they didn't, then I couldn't do that. Notice that I pop BP before jumping to the procedure. This is because for some strange reason, TP saves BP even when there are no local variables... Anyways, we need to clean it off the stack. For more information, look in the inline help under "Inline assembler":"Entry and Exit".


    The Compiler

    Okay, now all we need is the routine that turns a normal sprite into a compiled sprite... A sprite Compiler. The sprite compiler is actually simpler than the RLE converter. With compiled sprites, we simply ignore transparent pixels. Hold your breath! Here we go!:

    PROCEDURE ConvertSpriteToCompiled(VAR Sprite : SpriteType);
    VAR
      SpriteSrc,                           { Current pointer into source sprite }
      SpriteDest : WORD;                   { Current pointer into dest sprite   }
      NewSprite  : SpriteType;             { Temporary sprite                   }
      Value      : BYTE;                   { Color of current pixel             }
      X, Y       : INTEGER;                { Current pixel being compiled       }
    BEGIN
      NewSprite.Width := Sprite.Width;
      NewSprite.Height := Sprite.Height;
      NewSprite.DataLen := Sprite.Width*Sprite.Height*5;           { Worst Case }
      NewSprite.SType   := STCompiled;
      GETMEM(NewSprite.Data, NewSprite.DataLen);        { Get a chunk of memory } 
    
      SpriteSrc := 0; SpriteDest := 0;
      FOR Y := 0 TO Sprite.Height-1 DO                  { Step over each pixel  }
      BEGIN
        FOR X := 0 TO Sprite.Width-1 DO
        BEGIN
          Value := Sprite.Data^[SpriteSrc];             { Speedup...            }
          IF Value <> 0 THEN                        { Ignore blank pixels!      }
    
          IF X = 0 THEN          { Special case of no offset from start of line }
          BEGIN
            NewSprite.Data^[SpriteDest  ] := $C6;   { Opcode for MOV            }
            NewSprite.Data^[SpriteDest+1] := $05;   { MOD/RM value for 0 offset }
            NewSprite.Data^[SpriteDest+2] := Value; { Value...                  }
            INC(SpriteDest, 3);
          END
          ELSE IF X < 128 THEN                      { Can use a byte offset!    }
          BEGIN
            NewSprite.Data^[SpriteDest  ] := $C6;   { Opcode for MOV            }
            NewSprite.Data^[SpriteDest+1] := $45;   { MOD/RM value for byte ofs }
            NewSprite.Data^[SpriteDest+2] := Lo(X); { Offset                    }
            NewSprite.Data^[SpriteDest+3] := Value; { Value...                  }
            INC(SpriteDest, 4);
          END
          ELSE
          BEGIN                                     { Must use a word offset... }
            NewSprite.Data^[SpriteDest  ] := $C6;   { Opcode for MOV            }
            NewSprite.Data^[SpriteDest+1] := $85;   { MOD/RM value for word ofs }
            NewSprite.Data^[SpriteDest+2] := Lo(X); { Low byte of the offset    }
            NewSprite.Data^[SpriteDest+3] := Hi(X); { High byte of the offset   }
            NewSprite.Data^[SpriteDest+4] := Value; { Value...                  }
            INC(SpriteDest, 5);
          END;
          INC(SpriteSrc);
        END;
        IF Y < (Sprite.Height-1) THEN           { Don't do this on the last line}
        BEGIN                                   { Add DI, BX }
          NewSprite.Data^[SpriteDest  ] := $01; { Opcode for Add r/m16, r16     }
          NewSprite.Data^[SpriteDest+1] := $DF; { MOD/RM byte for WORD BX       }
          INC(SpriteDest, 2);
        END;
      END;
      NewSprite.Data^[SpriteDest] := $CB;       { Add in our RetF               }
      INC(SpriteDest);
    
      KillSprite(Sprite);                       { Don't need this anymore!      }
      Sprite := NewSprite;
      Sprite.DataLen := SpriteDest;             { Resize the sprite data to the }
      GETMEM(Sprite.Data, Sprite.DataLen);      { exact amount needed           }
      Move(NewSprite.Data^, Sprite.Data^, Sprite.DataLen); { Copy the data      }
      KillSprite(NewSprite);
    END;
    
    In this routine, I made one small and simple optimization. The small block (starting with "IF Value <> 0 THEN") that writes the mov instruction uses the smallest form of the instruction it can. This varies depending on whether the offset is zero, less than 128 (a signed byte), or less than 32768 (A signed word). This means that we can still use sprites that are 320 pixels wide, but we don't have a penalty for small sprites.

    There are a number of optimizations that could be made (as discussed earlier) and will probably make up some future articles... Until then, I give you COMPILED SPRITES!!!


    Example Program

    This example program shows the relative speeds and sizes of the different kinds of sprites. It profiles them using three different test sprites, to show off the strengths and weaknesses of each of the three types (Standard, RLE, and compiled). This program uses the
    timer unit to achieve milisecond second accuracy. This ensures that the resolution of the timer does not interfere with the timings. You'll also have to grab the newest Sprite unit which includes the compiler and all of the other nifty stuff we've been creating.

    For an example, here are the values that I recieved on my 486sx33 (just the percents) from the example program:

    Sprite Timings
    Ball Checkerboard Solid box
    Standard Sprite: 100.00%
    RLE Sprite: 53.52%
    Compiled Sprite: 60.55%

    Standard Sprite: 100.00%
    RLE Sprite: 179.90%
    Compiled Sprite: 55.28%

    Standard Sprite: 100.00%
    RLE Sprite: 100.75%
    Compiled Sprite: 122.61%

    It should be pointed out that the Standard sprite does not use transparency at all, so it is just copying the pixels out. If it did take into consideration the transparency, then it would be much slower.

    As you can see from this example data (Run it on your machine... Cache and graphics cards make a big difference), the compiled sprites are faster when they are mostly transparent. Why is this? It is because the compiled sprites transfer data to memory in bite (byte) sized chunks using offsets, while the standard sprite is using a rep movs instruction. It is paying less for the instruction fetch because it only has to read one instruction from CS for each line. Anyways, enjoy compiled sprites, and don't overindulge!

    -----------------] Example Starts here [-----------------
    PROGRAM SpriteExample;
    USES GraphPro, Sprite, Timer;
    VAR
      SprNum : INTEGER;
      S : ARRAY[1..3] OF SpriteType;
      I            : INTEGER;
    
      T1T, T1S,   { Test 1 Time, Test 1 Size }
      T2T, T2S,   { Test 2 Time, Test 2 Size }
      T3T, T3S : ARRAY[1..3] OF REAL; { Sprites types 1..3 }
    
    CONST
      Size = 63;
    BEGIN
      InitGraph;
      StartTimer(1000);  { 1000 ticks per second... Accurate timer! }
    
      { Do the first test...                                      }
      ClrScr(0);
      FOR I := 0 TO Size SHR 1 DO                         { Ball sprite }
      BEGIN
        Line(Size SHR 1, I, I, Size SHR 1, 16+I);
        Line(Size SHR 1+1, I, Size-I, Size SHR 1, 16+I);
        Line(Size SHR 1, Size-I, I, Size SHR 1+1, 16+I);
        Line(Size SHR 1+1, Size-I, Size-I, Size SHR 1+1, 16+I);
      END;
    
      GetSprite(S[1], 0, 0, Size, Size);
      GetSprite(S[2], 0, 0, Size, Size);
      GetSprite(S[3], 0, 0, Size, Size);
    
      T1S[1] := S[1].DataLen;
      ConvertSpriteToRLE(S[2]);
      T1S[2] := S[2].DataLen;
      ConvertSpriteToCompiled(S[3]);
      T1S[3] := S[3].DataLen;
    
      FOR SprNum := 1 TO 3 DO
      BEGIN
        Time := 0;
        FOR I := 0 TO 319-Size DO
          DrawSprite(S[SprNum], I, (SprNum-1)*(Size+2));
    
        T1T[SprNum] := Time;
        T1T[SprNum] := T1T[SprNum] / 1000 / (320-Size); { Do not include division in time }
      END;
    
      { Do the second test...                                      }
      ClrScr(0);
      FOR I := 0 TO Size SHR 1 DO            { Checkerboard sprite }
      BEGIN
        Line(I*2, 0, Size, Size - I*2, 16+I);
        Line(0, I*2, Size - I*2, Size, 16+I);
      END;
    
      GetSprite(S[1], 0, 0, Size, Size);
      GetSprite(S[2], 0, 0, Size, Size);
      GetSprite(S[3], 0, 0, Size, Size);
    
      T2S[1] := S[1].DataLen;
      ConvertSpriteToRLE(S[2]);
      T2S[2] := S[2].DataLen;
      ConvertSpriteToCompiled(S[3]);
      T2S[3] := S[3].DataLen;
    
      FOR SprNum := 1 TO 3 DO
      BEGIN
        Time := 0;
        FOR I := 0 TO 319-Size DO
          DrawSprite(S[SprNum], I, (SprNum-1)*(Size+2));
        T2T[SprNum] := Time;
        T2T[SprNum] := T2T[SprNum] / 1000 / (320-Size); { Do not include division in time }
      END;
    
      { Do the second test...                                      }
      ClrScr(0);
      FOR I := 0 TO Size DO            { Checkerboard sprite }
        Line(I, 0, I, Size, 16+I);
    
      GetSprite(S[1], 0, 0, Size, Size);
      GetSprite(S[2], 0, 0, Size, Size);
      GetSprite(S[3], 0, 0, Size, Size);
    
      T3S[1] := S[1].DataLen;
      ConvertSpriteToRLE(S[2]);
      T3S[2] := S[2].DataLen;
      ConvertSpriteToCompiled(S[3]);
      T3S[3] := S[3].DataLen;
    
      FOR SprNum := 1 TO 3 DO
      BEGIN
        Time := 0;
        FOR I := 0 TO 319-Size DO
          DrawSprite(S[SprNum], I, (SprNum-1)*(Size+2));
        T3T[SprNum] := Time;
        T3T[SprNum] := T3T[SprNum] / 1000 / (320-Size); { Do not include division in time }
      END;
    
      EndTimer;
      CloseGraph;
    
      WRITELN('Summary of gathered data for ', Size+1,'x', Size+1, ' sprite: ');
      WRITELN;
      WRITELN('                                       Time (s)   | Size  | Time %');
    
      WRITELN('Test #1 - Ball, typical application');
      WRITELN('Standard:                             ', T1T[1]:2:9, ' |', T1S[1]:6:0,
        ' | ', T1T[1]/T1T[1]*100:3:2, '%');
      WRITELN('RLE     :                             ', T1T[2]:2:9, ' |', T1S[2]:6:0,
        ' | ', T1T[2]/T1T[1]*100:3:2, '%');
      WRITELN('Compiled:                             ', T1T[3]:2:9, ' |', T1S[3]:6:0,
        ' | ', T1T[3]/T1T[1]*100:3:2, '%');
      WRITELN;
    
      WRITELN('Test #2 - Checkerboard, RLE stress test');
      WRITELN('Standard:                             ', T2T[1]:2:9, ' |', T2S[1]:6:0,
        ' | ', T2T[1]/T2T[1]*100:3:2, '%');
      WRITELN('RLE     :                             ', T2T[2]:2:9, ' |', T2S[2]:6:0,
        ' | ', T2T[2]/T2T[1]*100:3:2, '%');
      WRITELN('Compiled:                             ', T2T[3]:2:9, ' |', T2S[3]:6:0,
        ' | ', T2T[3]/T2T[1]*100:3:2, '%');
      WRITELN;
    
      WRITELN('Test #3 - Solid, Compiled stress test');
      WRITELN('Standard:                             ', T3T[1]:2:9, ' |', T3S[1]:6:0,
        ' | ', T3T[1]/T3T[1]*100:3:2, '%');
      WRITELN('RLE     :                             ', T3T[2]:2:9, ' |', T3S[2]:6:0,
        ' | ', T3T[2]/T3T[1]*100:3:2, '%');
      WRITELN('Compiled:                             ', T3T[3]:2:9, ' |', T3S[3]:6:0,
        ' | ', T3T[3]/T3T[1]*100:3:2, '%');
      WRITELN;
      WRITE('Press Enter to continue!');
      READLN;
    END.
    
    -----------------] Example Ends here [-----------------


  • Created by Chris Lattner