You can find all the source code in this article on my repo.

Defender68

My midterm project for my first semester at the Florida Interactive Entertainment Academy was to create a game in the Motorola 68000 assembly language. Most of these projects from past semesters resulted in very simple games about on the level of Pong, only more buggy.

For some reason, I decided to go ham.

This project took over 2000 lines of (assembly) code (with comments) and way too many hours.

Assembly for n00bz

Just a bit of preface before we get into the nitty gritty of the rest of the project. First off, if you’ve never used assembly, it’s not as scary as you might think. You’ll of course be missing some all of the features you know and love from high-level programming languages, but once you get used to it you’ll find a sort of comfort in the small amounts of data you’re working with at a time.

Don’t think that assembly is some cryptic unreadable language filled with esoteric symbols. When used correctly and written by a human the code produced is quite readable:

lea.l	loadingScreenBMP, a0
move.l	#LOADING_SCREEN_POSITION, d0	; draw x, y
clr.l	d1								; bmp x, y
move.l	#LOADING_SCREEN_DIMENSIONS, d2	; width, height
bsr.w	DrawAlphaBitmap

Even if you’ve never seen assembly before, you can probably guess what that the above code draws a bitmap. What helps is that I’ve created labels for addresses and constants used in the above code. Assembly generated by a compiler usually won’t look as pretty. Luckily, when you’re looking at compiler-generated code, it’s usually because you’re debugging C/C++ and you have the source code right-there as reference.

Finally, if you’ve learned one assembly language, you’ve learned them all. Every assembly language I’ve seen follows similar semantics for instruction names, register names, semantics for using immediates and labels, etc. Of course, if you’re going to have a job where you’re going to be debugging ARM disassembly often, you’ll probably want to have some specific experience in ARM. But if you only know one instruction set intimately, you’ll probably be fine getting past the occasional need to view disassembly.

Goals

Since this was an assignment from my professor, the project of course had some academic goals:

  • To grow familiarity with the 68k assembly language (expertise which should be helpful in viewing any kind of assembly later in our careers)
  • To deeply understand how registers, memory, and the call stack all work together.
  • To learn how to manually draw bitmaps. This experience will translate into reading binary files in general in the industry.
  • To be forced to use and get comfortable with a debugger. There’s no printf in assembly, so the only way to debug our code is with EASy68K’s excellent debugger. Many students coming out of graduate schools have no experience using debuggers, so this is an excellent primer in getting us to use breakpoints, step-forward, step-into, step-out, viewing a hexdump of memory, and more.

With that all out of the way, let’s take a tour of what I consider to be some of the most interesting parts of the project.

Bitmap Efficiency

For those who don’t know, a bitmap is one of the simplest file formats there is. Typically, the pixel-data is laid out sequentially in binary format, pixel-by-pixel. Assuming you’re using 32-bit color depth, then each pixel will be 32-bits in length, 8 bits each for red, green, blue, and alpha. Conceptually, drawing them is very simple: think of the bitmap as a 2D-array and do a double-for loop over it.

It was a requirement of the project that we have a bitmap background that must be re-drawn after a sprite/object moves over it. You’re likely going to be having many different things moving each frame, meaning that the spaces where they were are now “invalidated” (that’s a technical term used in the biz). The naive solution is to re-draw the entire background every frame, then re-draw everything that exists on top of the background too. I think you can see where the problem in that is, especially on 16MHz processors like the 68000.

The solution to this is pretty simple: Only draw the space where the object was last frame. This can be achieved in one of two ways:

  1. Re-draw the bitmap as soon as you move something.
  2. Store where the object was and its size so that the re-draw can be done all at once later in the frame.

I tried out both techniques in my game, storing the information of where the spaceships and projectiles were in a fixed-size stack data structure I’d created. In my application, option 1 turned out to take less CPU cycles, so that’s what I went with. Though option 2 is a bit cleaner and cooler.

Even with this optimization, drawing bitmaps is easily the most time-consuming operation. And in my game it was even worse. Most students made their games run at EASy68K’s default resolution of 640x480. I’d decided to go big with a resolution of 1920x780. That’s almost 5x the pixels, which would translate into approximately 5x more clock cycles just to draw the background.

The solution was essentially to use a loophole in the assignment. If I fill the bitmap with mostly alpha pixels, I can skip over most of the image. This vastly improved the frame rate.

The optimizations didn’t stop there though. Much of the work in drawing a single pixel was preparing the bytes for EASy68K’s SETPEN trap code. This trap code expects the pixel data to be in the format 00GGBBRR. Meaning the most significant byte must be all zeroes, then the next byte is green, the next byte blue, and the least significant byte red. If your bitmap image stores information in the format RRGGBBAA then a lot of time must be wasted just to shuffle the data around and zero out the alpha bits.

I got around this by pre-processing the bitmaps on-load. That way they’re stored in-memory in the exact format EASy68K wants it. Sans the most significant bit, which I left as either 0 or 1 to signify whether or not it is an alpha bit. After loading the pixel into a register, I used the instruction bclr.l which sets a single bit to 0 while also testing whether or not that bit was 0 or 1 to begin with. Knowing this information, I can either skip the pixel or go on to drawing it.

Pseudo-Random Number Generation

Exact techniques in generating random numbers are fascinating. We often take rng for granted in programming. I have a lot of randomness in my game. The terrain is entirely procedural, and the ships spawn randomly too, both their positions and which sprite is used. Knowing what I wanted for the game, I was very paranoid early on in development at how expensive generating random numbers on-demand might be.

My professor hated this dirty trick, but it worked for me: I only generated one random number per frame. The reason this works is because of the properties of modulus and how different the applications for which my random numbers may be used.

Let’s say we’ve got the following all going on in one frame (worst-case scenario):

  • I need randomly pick a new x, y coordinate for the next terrain vertex.
    • The x coord can be in the range 100 to 200
    • The y coord can be in the range 540 to 690
  • I need to randomly pick a new x, y coordinate for the next enemy to spawn.
    • The x coord will always be the same: the right side of the screen.
    • The y coord can be in the range 0 to 520
  • I need to randomly pick a sprite for the enemy ship.
    • The index will be anywhere from 0 to 8

Let’s say for this frame my random number is 22891. We can calculate each needed number with the formula:

min + r % (max - min)

  • Our random terrain vertex is (191, 631)
  • Our random enemy y coord is 11
  • Our random index is 3

Those all look pretty random and unrelated to me! Of course you can spot some similarities, such as all the numbers being odd. The parity of all these uses of the random number will always be the same. Luckily, that won’t be very noticeable when rendered on-screen.

The exact algorithm I used is a form of Linear Congruential Generator. The code is fairly straightforward and fast, but there’s a problem with my implementation: it only generates a 16-bit value. If I want to get a 32-bit value out of it, I’ll need to run the algorithm twice. I decided to use a lazy trick here too: If the number last frame was random, then half of it will be random again this frame. So I keep the lower random 16 bits from last frame and that becomes the upper 16 bits for this frame.

Data Structures

I didn’t think I’d ever write a Queue or Set in Motorola 68000 assembly in 2019, but here I am. I’ll just talking about the Queue here for brevity. The data layout of my Queue looks like the following:

variable width comments
capacity 16-bits max number of elements
size 16-bits current number of elements
head 16-bits where we dequeue
tail 16-bits where we enqueue
bytesPer 16-bits how many bytes per element
array (bytesPer * capacity) bytes where all the data is stored

Every one of my Queue functions take two arguments: a starting address to the Queue and a n-bytes of data to add on the top of the stack. The key to readability within these methods is using labeled offsets:

Q_CAPACITY	equ	0
Q_SIZE		equ	2
Q_HEAD		equ	4
Q_TAIL		equ	6
Q_BYTES_PER	equ	8
Q_ARRAY		equ	10

This way a line of code which gets the tail index:

move.w	Q_TAIL(a0), d0

Looks a bit more readable.

Enemies

One of my personal goals for this project was to have a lot of enemies. I’d seen previous students work on these projects and they often only had one enemy on-screen at a time. Often they didn’t even have an enemy, more like an “adversary” such as an omnipotent pong paddle that follows the ball exactly.

We’re not talking about hundreds or thousands of enemies on-screen at a time, which would be big by today’s standards. I only wanted dozens of enemies on-screen, which was still a pretty mighty task.

strandbeest
Derivatives of Position

I obviously needed to have a very simple “AI” for these enemies. The one I came up with is so simple I don’t even feel comfortable calling it an “AI”: Go Left. That’s all enemies do. They don’t shoot or track the player or anything. They do go faster as the game goes on though in order to add some escalating challenge to the player.

The way their speed works is a little bit interesting. Every enemy has a speed and an acceleration. The longer the enemy lives, the faster they go. So when they’re closer to the player they’re more of a threat. As the game goes on, I increase the acceleration to increase the stakes.

You might remember from physics that speed can otherwise be known as velocity, and acceleration is the first derivative of velocity. Since I’m now manipulating acceleration too, we’re now also dealing with the derivative of acceleration which is jerk. I really wanted to find a way to work in the forth, fifth, and sixth derivatives of position just because I liked their names, but couldn’t come up with any reasonable application.

A very obvious sacrifice I made for performance reasons is hopefully very apparent if you pay close attention to the enemies while playing my game: Only one ever moves at a time. Even with all the optimizations I made for my background bitmap drawing, it was still very expensive to re-draw an enemy every frame, much less several. The expense wasn’t necessarily in drawing the background, but the enemy itself. The enemy bitmaps are mostly non-transparent pixels which means that the optimizations used for the background are less applicable.

I could potentially afford to move 2 or 3 enemies at a time on-screen, but moving 1 enemy was working out and I didn’t have time to experiment with what the magic number of enemies moving was. I could also experiment with using the age-old trick of sprite flickering by only drawing moving enemies every other frame. On old consoles where this technique was actually employed, the flickering effect was actually less noticeable since it was well timed with the refresh rate of CRTs.

Terrain

As soon as I decided I was doing my own spin on Defender, I knew that I needed terrain. Even if it was only aesthetic and didn’t have any impact on gameplay, the homage wouldn’t be complete without it.

I implemented the terrain by storing x, y coordinate pairs in my Queue data structure. Every frame, I iterated over the points and moved them left while at the same time using EASy68K’s DRAWLINETO trap code to draw a line from the previous “pen” position, which should be at the previous terrain vertex, to the current terrain vertex. When a vertex moves left beyond the bounds of the screen by a certain threshold, I dequeue it and enqueue a new random point.

That’s all there is to it. This turned out to be very simple once the Queue was implemented. This was one of the last features I implemented and it only took an hour or two’s worth of work.

High Scores

The final cherry on top of this excellent arcade-style shooter would be a traditional 3-letter high-score screen. In order for scores to persist between play sessions I had to employ EASy68K’s file IO trap codes.

Rendering the high score screen was a bit more difficult than it might appear. EASy68K has handy trap codes for drawing text, but the more user-friendly ones involve moving a cursor along a 255x128 character display (similar to the curses library). Unfortunately, that would only work for me if I were using EASy68K’s original 640x480 resolution. Since my game was 1920x720, all the text would show up in the top left corner, not the center like I wanted.

So I had to build a multi-resolution terminal-emulator within my game within this 68000 emulator. Emulator-ception. I wrote a series of helper subroutines that allowed me to easily render strings to the screen anywhere I chose. The coolest part of this to me was how responsive typing in the player’s name was, with exciting features such as backspace and enter! I feel like I could go further with this and develop a full command-line.

Once a score was retrieved from the user, how do I decide if it’s worthy of residing among the high scores? Performance wasn’t absolutely critical in this section, so I decided to do something fun:

I load the high scores from a binary file into my Set data structure. I then insert the new player’s score into the Set. I then take the Set’s internal array and Bubble-Sort it. Finally, I write the top n-1 scores back to the file. If the user’s score wasn’t worthy, it will be at the end of the array. If it was, then someone’s old score has been pushed out and won’t be saved.

Writing Bubble-Sort in assembly was a pretty strange experience. It also enlightened me as to why it’s the slowest of all the \(O(n^2)\) sorts: it’s all about the memory. A more efficient algorithm such as Insertion-Sort requires substantially less memory accesses than Bubble-Sort. For a cache-less architecture like the original 68000, reducing the number of memory accesses is a big deal.


The final product came out better than I ever could have hoped. I was surprised at how quickly I took to assembly programming. This project made me wish I’d been born in 1960 because I bet I could have made a killing as a 68000 programmer.

You can find all the source code in this article on my repo.