Boid Feeder 9 Billion and Clip Space Hell
I’ve been working on a game for a while on a relatively underpowered game console called the Playdate. When I first learned about the Playdate, my girlfriend and I were in constant competition over a game called Devil Daggers. She’s since beaten my high score soundly, but I made her a bet: that I could make a passable clone of it on the Playdate. The stakes were high: my pride was on the line. I had another motive of course: I thought it would be really cool to be able to aim horizontally using the crank while leaving the left and right buttons on the d-pad for strafing.
In order to pull this off, I would need to write a 3D rendering engine for the Playdate. In addition to the renderer, I’d need to write a boid simulation, a collision engine, and a player capable of aiming, jumping, and shooting. Each of these is an interesting challenge, but none of them were novel to me in the way that a renderer is; I may cover them in the future, however.
Any code I would write would have to be relatively fast because I want a lot of entities pretty much at all times, and I’m up against some pretty tight limitations. The Playdate SDK assumes most games will be relatively simple 2D games written in Lua. The Playdate hardware itself pretty much assumes this too, as there is no transformation hardware built in, not so much as a sprite rotator or scaler. Any kind of transformation on raw sprites is done serially, single-threaded, on a CPU clocked at a pitiful 160 MHz. The Lua SDK actually provides libraries for doing some rudimentary 3D rendering, but it’s all written in Lua and painfully slow. Thankfully, there is a way to run native binaries on the Playdate with the SDK. So after briefly flirting with Zig and failing to get any kind of allocation working, I put on my John Carmack hat and started writing a software rasterizer from scratch in C.
I’ve never made a rendering engine before. Thankfully, I have a history of creating multiplayer FPS games in various engines, which has strengthened my linear algebra skills to the point that I felt confident I could pull this off. After all, what is a camera but a fancy inverse laser gun?
Spaces
Linear algebra is about a lot of things, but for game development most linear algebra is about coordinate spaces and how to move between them. There are many many many coordinate spaces that you can use. Traditionally in school we’re taught about two: polar coordinates \(<r, θ> \in \Bbb R^2\) and Cartesian coordinates \(<x,y>\in \Bbb R^2\). Most of the logic in 3D games happens in the 3D extension of Cartesian coordinates \(<x,y,z> \in \Bbb R^3\), known as world space. For example, the player in my game, every enemy, every bullet, and the points on the floor are all represented by points in world space. This 3D space is unbounded except by the floating point numbers you use to represent x, y, and z.
When you’re drawing stuff on the screen, though, you need your coordinates in screen space. Screen space coordinates are \(<x,y>\) where x and y are non-negative integers constrained to the range
$$x \in [0, SCREEN\_WIDTH]$$ $$y \in [0, SCREEN\_HEIGHT]$$
\(<0,0>\) represents the top left corner, and \(<SCREEN\_WIDTH, SCREEN\_HEIGHT>\) represents the bottom right corner.
Rendering in games is, then, essentially a function that maps points in this unbounded 3D space to points in this bounded 2D space, followed by rasterization. It just so happens that functions on points are exactly what matrices are.
Transformation Matrices
Matrices can be thought of in a couple ways. In one sense they are big grids of numbers, like a 2D array of floats:
$$ M = \begin {bmatrix} 1 & 0 \\ 0 & 1 \\ \end{bmatrix} $$
This matrix has 2 rows and 2 columns so we call it a 2x2 matrix.
Points can be thought of as single-column matrices, also known as vectors, often annotated with an arrow (or a hat if their length = 1):
\[ <x, y ,z> = \begin {bmatrix} x \\ y \\ z \\ \end {bmatrix} \]
$$ \overrightarrow v = \begin {bmatrix} 3.6 \\ -4.2 \\ \end {bmatrix} $$
$$ \hat x = \begin {bmatrix} 1 \\ 0 \\ \end {bmatrix} $$
Many operations work on vectors. For example: vector addition
\[ \begin {bmatrix} x_1 \\ y_1 \end {bmatrix} + \begin {bmatrix} x_2 \\ y_2 \end {bmatrix} = \begin {bmatrix} x_1 + x_2 \\ y_1 + y_2 \end {bmatrix} \]
You can also subtract them, negate them, get their length, and scale them by a constant factor.
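For a taste of what this looks like in code, here are those operations in gb_math, the C math library used later in this post. A minimal sketch, assuming gb_math’s usual out-parameter style for its vector functions:

#include "gb_math.h"

void vector_ops_demo(void) {
    gbVec3 a = gb_vec3(1.0f, 2.0f, 3.0f);
    gbVec3 b = gb_vec3(4.0f, 5.0f, 6.0f);
    gbVec3 sum, scaled;
    gb_vec3_add(&sum, a, b);       // vector addition: <5, 7, 9>
    gb_vec3_mul(&scaled, a, 2.0f); // scale by a constant factor: <2, 4, 6>
    float length = gb_vec3_mag(a); // length: sqrt(1 + 4 + 9)
}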
However, sometimes we want to do more complex operations, like scale their coordinates by different values, or rotate them about an axis. For this purpose, matrices act less like grids of numbers and more like functions. When you view a matrix as a function, it’s called a transformation, and it’s based on matrix * vector multiplication:
\[ T_M(\overrightarrow p) = M \times \overrightarrow p \]
The nitty gritty of how matrix multiplication works is of interest to math students and matrix algebra library implementers, but these are the general properties we’re interested in:
- Juxtaposition implies multiplication \[ A \times B = AB \]
- You can only multiply vectors that have the same dimension as the number of columns in the matrix \[ \checkmark \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \\ \end{bmatrix} \] \[ \unicode{x2718} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix} \] \[ \checkmark \begin{bmatrix} 1 & 0 \\ 0 & 2 \\ 0 & 0 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix} \] (though for the sake of 3D graphics you usually have \(n \times n\) matrices; \(n \times m\) matrices are rare in computer graphics.)
- The vector must be to the right of the matrix \[ \checkmark M \overrightarrow p \] \[ \unicode{x2718} \overrightarrow p M \]
- If you apply multiple transformations, they are done in right-to-left order: \[ A B \overrightarrow p = A (B \overrightarrow p) \]
- Multiplying transformation matrices is the same as function composition \[ (T_A \circ T_B)(\overrightarrow p) = T_A(T_B(\overrightarrow p)) = A B \overrightarrow p \]
- Matrix multiplication is not commutative \[ AB \neq BA \]
- It is, however, associative \[ ABC = (A \times B) \times C = A \times (B \times C) \]
- The inverse of a matrix undoes the matrix \[ M^{-1} M \overrightarrow p = \overrightarrow p \]
- The inverse of a product is the product of the inverses, in reverse order \[ (AB)^{-1} = B^{-1}A^{-1} \]
These functions allow you to modify the coordinate system that the vectors are in. In 3D, here are a few useful matrices:
- Zero - the zero matrix collapses all finite points to the origin $$ \begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\\ \end{bmatrix} \overrightarrow p = \overrightarrow 0 $$
- Identity - does nothing $$ Identity(\overrightarrow p) = \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\\ \end{bmatrix} \overrightarrow p = \overrightarrow p $$
- Scaling - scales a vector by a number $$ Scale(s, \overrightarrow p) = \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & s \\ \end{bmatrix} \overrightarrow p = s \overrightarrow p $$
- Rotations - rotate about an axis $$ Rotate_X(\theta, \overrightarrow p) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos \theta & -\sin \theta \\ 0 & \sin \theta & \cos \theta \\ \end{bmatrix} \overrightarrow p $$ $$ Rotate_Y(\theta, \overrightarrow p) = \begin{bmatrix} \cos \theta & 0 & \sin \theta \\ 0 & 1 & 0 \\ -\sin \theta & 0 & \cos \theta \\ \end{bmatrix} \overrightarrow p $$ $$ Rotate_Z(\theta, \overrightarrow p) = \begin{bmatrix} \cos \theta & -\sin \theta & 0 \\ \sin \theta & \cos \theta & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \overrightarrow p $$
Notice how these are simply the 2D rotation matrix rearranged into 3D. $$ Rotate_{2D}(\theta, \overrightarrow p) = \begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \\ \end{bmatrix} \overrightarrow p $$ (For the record I do NOT have these memorized and in fact look them up every time from the greatest academic source of all time). By composing these you can rotate about any axis you want: $$ Rotate_{ZYX}(\theta _x, \theta _y, \theta _z, \overrightarrow p) = Rotate_Z(\theta_z, Rotate_Y(\theta_y, Rotate_X(\theta_x, \overrightarrow p))) $$
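As a quick sanity check, rotating \(\hat x\) by 90° about the Z axis should give \(\hat y\), and it does (\(\cos 90° = 0\), \(\sin 90° = 1\)): $$ Rotate_Z(90°, \hat x) = \begin{bmatrix} 0 & -1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \hat y $$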
These operations are all pretty powerful, but they’re not enough to define our camera fully. Matrices are quite capable of a lot of things, but they can only represent linear transformations within their coordinate space. A linear transformation cannot move the origin or cause parallel lines to converge; those are non-linear operations. Unfortunately, our 3D camera relies on two non-linear transformations on 3D points in order to place our points properly: translation and perspective. The translation transformation slides all points within a space to another location, and because this includes the origin, the transformation is not linear. You can do it without matrices using vector addition:
\[ Translate(\overrightarrow{slide}, \overrightarrow p) = \overrightarrow{slide} + \overrightarrow p \] Perspective projection is what gives our objects the appearance of being close or far by dividing the x and y coordinates by their depth. This is a non-linear transformation because it takes parallel lines and causes them to converge at a vanishing point: \[ Perspective(\overrightarrow p) = \begin{bmatrix} p_x / p_z \\ p_y /p_z \\ p_z \end{bmatrix} \]
Note that this transformation gets us close to screen coordinates, but not quite: it actually puts us into normalized device coordinates, or NDC space, where the bottom left corner of the screen is \(<-1, -1, depth>\) and the top right corner of the screen is \(<1, 1, depth>\).
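To see the foreshortening at work, take two points with the same x and y offsets but different depths. The farther one lands closer to the center of the screen:

\[ Perspective(\begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix}) = \begin{bmatrix} 0.5 \\ 0.25 \\ 4 \end{bmatrix} \qquad Perspective(\begin{bmatrix} 2 \\ 1 \\ 8 \end{bmatrix}) = \begin{bmatrix} 0.25 \\ 0.125 \\ 8 \end{bmatrix} \]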
The thing is, we really want these to be matrices. The reason is that we want our camera to be a single transformation matrix, since it has to run on every single point in the game every frame, and our CPU is pitiful. If these were matrices, then we could do all of the work for every single point in a single matrix multiplication by utilizing the associative property.
Camera Function
We want our camera function to allow us to premultiply all our transformation matrices. The camera in my game has 3 essential properties: position, crank angle, and sway. In order to map any world coordinate to the screen we need to do the following in order:
- translate all points to be relative to the position of the camera
- rotate all points to be relative to the camera’s facing angle around the up axis (which has been chosen as +Y arbitrarily)
- rotate all points to be relative to the camera’s sway around the forward axis (which has been chosen as +Z arbitrarily)
- apply a Perspective transformation to all points
The order of these is essential. If we do them in the wrong order then bad things will happen™ and we won’t be in the correct coordinate space to draw stuff to the screen.
“Relative” should make you think “inverse”. For simple arithmetic, if I want to get the offset of one number relative to another, it’s simple subtraction:
\[ a\ relative\ to\ b = a - b \]
Subtraction is, of course, the inverse of addition. In this case we want to do the inverse of translation and the inverse of rotation:
- \(relative\_point = Translate^{-1}(camera.position, world\_point)\)
- \(rotated\_point = Rotate_Y^{-1}(camera.crank\_angle, relative\_point)\)
- \(swayed\_point = Rotate_Z^{-1}(camera.sway\_angle, rotated\_point)\)
- \(ndc\_point = Perspective(swayed\_point)\)
We will then take our ndc_point and remap it to the pixels on our screen.
Doing all this math gets expensive in terms of CPU cycles. Matrix multiplication is not cheap, which is usually why GPUs are used for this purpose. However, note how for every world point, the position of the camera, the look angle, and the sway are constant. This means the matrices that make up their transformation functions are constant each frame. If translation and perspective were matrix multiplications rather than bespoke functions, we could premultiply a single matrix for the entire frame, saving tons of CPU cycles, like this:
\[ world\_to\_ndc(\overrightarrow w) = PR_z^{-1}R_y^{-1}T^{-1}\overrightarrow w \]
That’s a lot of inverses, and taking an inverse is kind of expensive. We can do better: \[ world\_to\_ndc(\overrightarrow w) = P(R_zR_yT)^{-1}\overrightarrow w \] Now every frame we just calculate one world_to_ndc matrix, \(P(R_zR_yT)^{-1}\), and use it on every point.
At present we can’t do that, though, because perspective and translation are non-linear operations in 3D coordinates. If only we had a coordinate system where this was not the case…
Homogeneous Coordinates
There is a coordinate system where this is not the case. Points in homogeneous coordinates have an extra coordinate, w. \[ \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} \]
Converting from a point in 3D space is pretty straightforward:
\[ convert(\overrightarrow p) = \begin{bmatrix} p_x \\ p_y \\ p_z \\ 1 \end{bmatrix} \]
To go back to 3D space you simply divide by w: \[convert^{-1}(\overrightarrow h) = \begin{bmatrix} h_x/h_w \\ h_y/h_w \\ h_z/h_w \end{bmatrix}\]
Converting a 3D transformation matrix to a homogeneous transformation matrix is also straightforward:
\[ convert(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \\ \end{bmatrix}) = \begin{bmatrix} a & b & c & 0 \\ d & e & f & 0 \\ g & h & i & 0 \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \]
What are all these extra coordinates about? Well, you can look at homogeneous transformation matrices as an extension of their 3D counterparts, where the 3x3 part still encodes rotation and scaling, but the bottom row encodes perspective, and the right column encodes translation:
\[ \begin {bmatrix} rs_{x0} & rs_{y0} & rs_{z0} & t_x \\ rs_{x1} & rs_{y1} & rs_{z1} & t_y \\ rs_{x2} & rs_{y2} & rs_{z2} & t_z \\ 0 & 0 & p & 1 \end {bmatrix} \]
Those extra 0s are unused in most rendering systems, but they do technically let you do perspective on axes other than Z. This is generally not desired behavior and looks weird.
This is maybe a little clearer if we look at raw translation and perspective matrices:
\[ T(\overrightarrow t) = \begin {bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \\ \end{bmatrix} \]
\[ P = \begin {bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ \end {bmatrix} \]
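Running a point through this raw P shows the mechanics: the fourth row copies z into w, and the divide-by-w gets us back the perspective-divided x and y from before. (Note that this raw matrix sends every z to 1 after the divide, which is part of why the practical version below remaps z using the near and far planes.)

\[ P \begin{bmatrix} 2 \\ 1 \\ 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 4 \\ 4 \end{bmatrix} \quad \xrightarrow{\ \div w\ } \quad \begin{bmatrix} 0.5 \\ 0.25 \\ 1 \end{bmatrix} \]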
The vector and matrix library that I’m using, gb_math, has a somewhat more complex definition for the perspective matrix that takes into account field of view, as well as a near and far plane:
\[ P(fov, aspect, near, far) = \begin{bmatrix} \frac{1}{aspect \times \tan(\frac{1}{2} fov)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\frac{1}{2} fov)} & 0 & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & -2 \frac{far \times near}{far - near} \\ 0 & 0 & -1 & 0 \end{bmatrix} \]
This is similar to the matrix above, but contains some scaling and translation relating to the fov and the near/far planes respectively.
Now that we can do perspective, translation, and rotation in homogeneous coordinates, we can precompute our entire camera projection matrix once per frame. This small amount of work up front saves a ton of work in the long run and will give our poor CPU a rest. Our function to render all the enemies in our game now looks like this:
for every object in the game
get the position of the object
convert the position from world coordinates to homogeneous coordinates by setting w = 1
apply the camera transform to the homogeneous coordinate
go from homogeneous coordinates to normalized device coordinates by dividing by w
go from normalized device coordinates to pixels on the screen
???
have picture
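Here’s roughly what that per-frame setup looks like in C with gb_math. This is a sketch, not my engine verbatim: camera_update_projection and the CAMERA_* constants are names made up for illustration, and I’m assuming camera_t carries the position, crank_angle, and sway_angle described earlier along with the projection matrix used by the drawing code below.

void camera_update_projection(camera_t* camera) {
    gbMat4 t, ry, rz, rzry, view, view_inv, persp;

    // Forward transforms for the camera itself (angles in radians)
    gb_mat4_translate(&t, camera->position);
    gb_mat4_rotate(&ry, gb_vec3(0.0f, 1.0f, 0.0f), camera->crank_angle);
    gb_mat4_rotate(&rz, gb_vec3(0.0f, 0.0f, 1.0f), camera->sway_angle);

    // Compose Rz * Ry * T, then invert the whole thing once
    gb_mat4_mul(&rzry, &rz, &ry);
    gb_mat4_mul(&view, &rzry, &t);
    gb_mat4_inverse(&view_inv, &view);

    // P(fov, aspect, near, far): the gb_math perspective matrix from above
    gb_mat4_perspective(&persp, CAMERA_FOV, CAMERA_ASPECT, CAMERA_NEAR, CAMERA_FAR);

    // world_to_clip = P * (Rz * Ry * T)^-1, reused on every point this frame
    gb_mat4_mul(&camera->projection, &persp, &view_inv);
}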
Drawing to the screen
We now have a bunch of points in normalized device coordinates, or NDC space. What we need now are pixels on a screen. The Playdate is much too slow to render every single enemy using polygons, so all we have is their position information. Thankfully we can use an old technique called billboarding, where we prerender all of our enemies from every possible viewing angle. That’s not good enough though, because the Playdate is also very slow at scaling sprites. So let’s trade RAM for CPU cycles by rendering out every single angle and distance combination for every entity in the game. Given that Playdate images are 1-bit, it’s only a megabyte or two, and we have 16 MB of RAM and 2 GB of storage space, so this is mostly fine.
I want to draw these billboards on the screen so that far billboards are never drawn over near billboards. The Playdate, being a 2D console, lacks something called a Z buffer. On modern 3D graphics pipelines every single pixel has a Z depth in addition to color. We could implement our own sprite drawing functions, but I have my doubts that I could out-optimize the ones already in the SDK. Also, the Playdate struggles enough with setting bits on screen with the ordinary drawing function; adding an entire Z buffer on top of setting every pixel is likely much too slow if we want a reasonable framerate AND hundreds of entities. Luckily there exists an old way of doing this that doesn’t require a Z buffer, called the painter’s algorithm:
projected_points = map camera_projection billboard_points
ndc_points = map divide_by_w projected_points
sorted_points = sort ndc_points from farthest to closest
for each point in sorted_points
  sprite = get_sprite_by_distance_and_angle point
draw sprite at ndc2screen(point)
Note that for the sake of brevity I omitted the angle calculation.
This ndc2screen(point) function isn’t too hard to write. You can do it as a matrix if you desire, but it essentially amounts to this:
ndc2screen (x, y, z) = (
(x + 1) * SCREEN_WIDTH / 2,
(y + 1) * SCREEN_HEIGHT / 2
)
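In C this is a couple of multiply-adds per point. The version below mirrors the expressions in my actual draw code, including the SCREEN_ORIGIN_X/Y offsets for where the viewport sits on the physical screen:

// Map NDC x/y in [-1, 1] to pixel coordinates
static inline void ndc2screen(gbVec3 ndc, float* screen_x, float* screen_y) {
    *screen_x = (ndc.x + 1) * (SCREEN_WIDTH / 2) + SCREEN_ORIGIN_X;
    *screen_y = (ndc.y + 1) * (SCREEN_HEIGHT / 2) + SCREEN_ORIGIN_Y;
}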
This makes it so we draw our closer sprites on top of our farther sprites. It doesn’t work for things that aren’t point-like, though it’s good enough for our purposes. We have one issue though: as written, this tries to draw every single entity, whether it’s outside our field of view or behind us. If an entity is behind us that’s extra bad, because it’ll render as though it’s in front of us, but mirrored across the x and y axes. This brings me to clipping.
Clip Space
Clipping billboards is relatively straightforward. Once we apply our perspective transform and divide by w, our points are in NDC space. NDC space is nice, because it’s bounded by a left plane \(x > -1\), a right plane \(x < 1\), a top plane \(y < 1\), a bottom plane \(y > -1\), a near plane \(z > 0\), and a far plane \(z < 1\). We just need to throw out all the points that are outside these boundaries.
in_bounds p = p.x > -1 && p.x < 1 && p.y > -1 && p.y < 1 && p.z > 0 && p.z < 1
clip_points ndc_points = filter in_bounds ndc_points
However, there is an issue with doing this in NDC space that may not be apparent immediately, and it has to do with dividing by w. The w and z coordinates always have the same sign, which means that by doing this
homogenous2ndc h = (h.x/h.w, h.y/h.w, h.z/h.w)
We lose the sign of z. Now we can’t tell whether the enemies are in front of us or behind us. This means we need to do our clipping in a coordinate space called “clip space” (shocker). Clip space is the space we’re in after applying the camera transform but before dividing by w. We can modify our clipping algorithm to accommodate this:
in_bounds p = p.x > -p.w && p.x < p.w && p.y > -p.w && p.y < p.w && p.z > 0 && p.z < p.w
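To see why this matters, run a point behind the camera, say \(<1, 1, -2>\), through the raw perspective matrix from earlier:

\[ P \begin{bmatrix} 1 \\ 1 \\ -2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ -2 \\ -2 \end{bmatrix} \quad \xrightarrow{\ \div w\ } \quad \begin{bmatrix} -0.5 \\ -0.5 \\ 1 \end{bmatrix} \]

After the divide, x and y have flipped sign and the point looks like a perfectly plausible on-screen position, mirrored in front of us. In clip space the sign survives: \(p.z = -2\) fails the \(p.z > 0\) test and the point is correctly rejected.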
Now we just need to modify our painter’s algorithm a little:
clip_space_points = map camera_projection billboard_points
clipped_points = filter in_bounds clip_space_points
ndc_points = map divide_by_w clipped_points
sorted_points = sort ndc_points from farthest to closest
for each point in sorted_points do
sprite = get_sprite_by_distance_and_angle point
draw sprite at ndc2screen(point)
My pseudocode is all a little abstract, so let’s switch to C:
void draw_boids(world_t* world, camera_t* camera, PlaydateAPI* pd) {
pd->graphics->pushContext(NULL);
// painter's algorithm!
billboard_t* billboards = pd->system->realloc(NULL, sizeof(billboard_t) * (BOID_MAX + BULLET_MAX));
int num_live = 0;
for (int i = 0; i < BOID_MAX; i++) {
if (world->boids[i].alive) {
apv_to_billboard(
&billboards[num_live],
&world->boids[i].apv,
camera,
BB_ENEMY,
BOID_SPRITE_ANGLES,
BOID_SPRITE_DISTANCES
);
billboards[num_live].from_index = i;
num_live++;
}
}
for (int i = 0; i < BULLET_MAX; i++) {
if (world->bullets[i].alive) {
apv_to_billboard(
&billboards[num_live],
&world->bullets[i].apv,
camera,
BB_BULLET,
BULLET_SPRITE_ANGLES,
BULLET_SPRITE_DISTANCES
);
billboards[num_live].from_index = i;
num_live++;
}
}
qsort(billboards, num_live, sizeof(billboard_t), cmpboids);
for (int i = 0; i < num_live; i++) {
billboard_t billboard = billboards[num_live - i - 1];
if (
billboard.center.x >= -billboard.center.w &&
billboard.center.y >= -billboard.center.w &&
billboard.center.z >= 0 &&
billboard.center.x <= billboard.center.w &&
billboard.center.y <= billboard.center.w &&
billboard.center.z <= billboard.center.w
) {
            gbVec3 center;
            // perspective divide: clip space -> NDC
            gb_vec3_div(&center, billboard.center.xyz, billboard.center.w);
float center_x = (center.x + 1) * (SCREEN_WIDTH / 2) + SCREEN_ORIGIN_X;
float center_y = (center.y + 1) * (SCREEN_HEIGHT / 2) + SCREEN_ORIGIN_Y;
int z = (center.z + 1) * (SCREEN_FAR - SCREEN_NEAR) / 2 + SCREEN_NEAR;
int dist_index = billboard.distance_index;
int rot_index = billboard.rotation_index;
pd->graphics->pushContext(NULL);
switch (billboard.sprite_subtype) {
case BB_BULLET:
pd->graphics->setDrawMode(kDrawModeInverted);
pd->graphics->drawBitmap(bullet_sprites[dist_index][rot_index], center_x - 128, center_y - 128, kBitmapUnflipped);
break;
case BB_ENEMY:
if (dist_index <= BOID_SPRITE_WHITE_INDEX) {
pd->graphics->setDrawMode(kDrawModeFillWhite);
dist_index = dist_index > BOID_SPRITE_MIN_INDEX ? dist_index : BOID_SPRITE_MIN_INDEX; //max(BOID_SPRITE_MIN_INDEX, dist_index);
}
if (world->boids[billboard.from_index].hit) {
pd->graphics->setDrawMode(
((int)(world->boids[billboard.from_index].death_timer * 30) % 2) == 0
? kDrawModeXOR
: kDrawModeFillWhite
);
}
pd->graphics->drawBitmap(boid_sprites[dist_index][rot_index], center_x - 128, center_y - 128, kBitmapUnflipped);
break;
}
pd->graphics->popContext();
}
}
pd->graphics->popContext();
pd->system->realloc(billboards, 0);
// debug draw stuff
char* str;
pd->system->formatString(&str, "%d", num_live);
pd->graphics->drawText(str, strlen(str), kASCIIEncoding, 0, 32);
pd->system->realloc(str,0);
}
Polygon Clipping and the Sutherland-Hodgman Algorithm
If all I wanted to have in my game was billboards, I’d be golden. That’s a big if. I want a floor in my game, and I only need one of them. Without a floor it’s really hard to tell that you’re moving at all; there’s simply no frame of reference if the player is just floating around in a void.
Drawing polygons seems pretty straightforward at first: apply the camera projection to every point in the polygon, convert NDC to screen space, and flood fill. Unfortunately for us, flood filling gets confused unless the polygon is entirely contained on screen. Also, similar to before, if a point is behind you it will appear to be in front of you, but mirrored in the x and y axes. This causes an hourglass-like shape to be drawn instead of the trapezoid that should be drawn, and it’s actually extremely common in my game for some points on the floor to be behind you and some points to be in front of you.
We can’t use the clipping algorithm from before either. This is because we can’t simply delete points that are outside the bounding volume. If we did, the floor might appear more like a triangle or a line segment than a square.
Instead we need to use a polygon clipping algorithm that can create additional points on the edges of the bounding volume. There are several good ones, but the most intuitive one to me is the Sutherland-Hodgman algorithm. Ordinarily this is implemented by your GPU vendor, but unfortunately, as my GPU vendor is “None”, I need to write it myself. It’s hard to find good resources on this algorithm; I find explanations of it either incomplete or overly complex. An issue I have with many implementations available online is that they are pre-optimized, and are often written in terms of coordinates rather than vectors, treating each coordinate as a separate case.
Optimization is great and all, but premature optimization is a deadly sin of programming, and I really wanted to understand what was going on, so I implemented it using the plainest code I possibly could:
sutherland_hodgman polygon_in = do
  polygon_working = copy polygon_in
  for plane in clipping_planes do
    polygon_out = empty
    for point_index in polygon_working.point_count do
      p0 = polygon_working.points[point_index]
      p1 = polygon_working.points[(point_index + 1) % polygon_working.point_count]
      if !point_inside_plane plane p0 && !point_inside_plane plane p1 then
        -- do nothing (the edge is entirely outside)
      if !point_inside_plane plane p0 && point_inside_plane plane p1 then
        add_point polygon_out (plane_line_intersection plane p0 p1)
      if point_inside_plane plane p0 && !point_inside_plane plane p1 then
        add_point polygon_out p0
        add_point polygon_out (plane_line_intersection plane p0 p1)
      if point_inside_plane plane p0 && point_inside_plane plane p1 then
        add_point polygon_out p0
    polygon_working = polygon_out
  return polygon_working
Note that this algorithm doesn’t care whether you clip in NDC space or in clip space, just like before. However, also just like before, when you go from clip space to NDC space you lose the sign of z and cannot clip against the near plane, again resulting in mirrored points from behind you.
This is where I got stuck, however. I personally have a hard time visualizing homogeneous coordinates, since they take place in a pseudo-4D space. The part of the algorithm I was having trouble with was the interaction between the points and the planes. I found it very difficult to locate a clear explanation of plane-line intersections in homogeneous coordinates. I couldn’t even figure out how planes were represented in homogeneous coordinates. I got stuck on this for about a week, and ended up shelving the project for an extended period of time, ready to accept my defeat at the hands of this algorithm.
Nearly a year later I got the itch. I couldn’t leave something so close to completion unfinished. I needed to wrap my head around homogeneous coordinates, even if it killed me. I pored through as many resources for this algorithm as I could, until through careful study of various blogs, university PowerPoint slides, Stack Overflow posts, dark scriptures, bodhi meditations, and Wikipedia articles, I discovered the embarrassingly simple formulas that had me stuck for so long:
\[ clippingPlanes = \set{ \begin{bmatrix}1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix}-1 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix}0 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix}0 \\ -1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix}0 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix}0 \\ 0 \\ -1 \\ 1 \end{bmatrix} } \]
\[ pointInsidePlane(plane, p) = plane \cdot p > 0 \]
\[ planeLineIntersection(plane, p_0, p_1) = lerp(p_0, p_1, \alpha) \]
\[ \text{where} \quad \alpha = d_0 / (d_0 - d_1), \quad d_0 = p_0 \cdot plane, \quad d_1 = p_1 \cdot plane, \]
\[ lerp(p_0, p_1, \alpha) = \alpha (p_1 - p_0) + p_0 \]
\(\cdot\) being the dot product
After kicking myself for not seeing the simplicity for so long, I wrote a few dozen lines of surprisingly simple C.
typedef struct polygon_clipspace_t {
gbVec4* points;
int point_count;
int point_space;
} polygon_clipspace_t;
// Implementations of polygon_clipspace_t's vector-like functions are omitted for brevity
gbVec4 clipping_planes[6] = {
{1, 0, 0, 1},
{-1, 0, 0, 1},
{0, 1, 0, 1},
{0, -1, 0, 1},
{0, 0, 1, 0},
{0, 0, -1, 1}
};
bool point_inside_plane(gbVec4 plane, gbVec4 point) {
return gb_vec4_dot(plane, point) > 0;
}
gbVec4 plane_line_intersection(gbVec4 plane, gbVec4 l0, gbVec4 l1) {
float l0_dot_plane = gb_vec4_dot(l0, plane);
float l1_dot_plane = gb_vec4_dot(l1, plane);
float alpha = l0_dot_plane / (l0_dot_plane - l1_dot_plane);
gbVec4 out;
gb_vec4_lerp(&out, l0, l1, alpha);
return out;
}
// Takes a polygon in clip space (homogeneous coordinates)
// and clips it to be within -1 to 1 on most axes,
// though Z is clipped to 0, 1
polygon_clipspace_t polygon_clip(PlaydateAPI* pd, polygon_clipspace_t polygon_in) {
// maybe make this static
polygon_clipspace_t polygon_out = polygon_new_empty(pd, polygon_in.point_space);
polygon_clipspace_t polygon_work = polygon_copy(pd, &polygon_in);
    for (int plane_index = 0; plane_index < 6; plane_index++) { // all six clipping planes
polygon_empty(&polygon_out);
gbVec4 plane = clipping_planes[plane_index];
for (int line = 0; line < polygon_work.point_count; line++){
gbVec4 line_start = polygon_work.points[line];
gbVec4 line_end = polygon_work.points[(line + 1) % polygon_work.point_count];
gbVec4 intersection;
bool inside_start = point_inside_plane(plane, line_start);
bool inside_end = point_inside_plane(plane, line_end );
bool inserting = false;
if (!inside_start && !inside_end) {
}
else if (!inside_start && inside_end) {
polygon_push_point(pd, &polygon_out, plane_line_intersection(plane, line_start, line_end));
}
else if ( inside_start && !inside_end) {
polygon_push_point(pd, &polygon_out, line_start);
polygon_push_point(pd, &polygon_out, plane_line_intersection(plane, line_start, line_end));
}
else {
polygon_push_point(pd, &polygon_out, line_start);
}
}
polygon_overwrite_with(pd, &polygon_work, &polygon_out);
}
polygon_free(pd, &polygon_work);
return polygon_out;
}
void draw_floor(camera_t* camera, PlaydateAPI* pd) {
polygon_clipspace_t polygon_ndc_in = polygon_new_empty(pd, FLOOR_POINT_COUNT);
for(int i = 0; i < FLOOR_POINT_COUNT; i++) {
        gbVec4 p;
        gbVec4 clip_space;
p.xyz = floor_points[i];
p.w = 1;
gb_mat4_mul_vec4(&clip_space, &camera->projection, p);
polygon_push_point(pd, &polygon_ndc_in, clip_space);
}
polygon_clipspace_t polygon_ndc_out = polygon_clip(pd, polygon_ndc_in);
int screen_points[FLOOR_POINT_COUNT * 2][2];
for (int point = 0; point < polygon_ndc_out.point_count && point < FLOOR_POINT_COUNT*2; point++) {
gbVec4 clipped_point_prediv = polygon_ndc_out.points[point];
gbVec3 clipped_point = clipped_point_prediv.xyz;
gb_vec3_diveq(&clipped_point, clipped_point_prediv.w);
screen_points[point][0] = (clipped_point.x + 1) * (SCREEN_WIDTH / 2) + SCREEN_ORIGIN_X;
screen_points[point][1] = (clipped_point.y + 1) * (SCREEN_HEIGHT / 2) + SCREEN_ORIGIN_Y;
}
pd->graphics->fillPolygon(
polygon_ndc_out.point_count, screen_points, floor_pattern, kPolygonFillEvenOdd
);
for (int point = 0; point < polygon_ndc_out.point_count && point < 64; point++) {
pd->graphics->drawLine(
screen_points[point][0],
screen_points[point][1],
screen_points[(point + 1) % polygon_ndc_out.point_count][0],
screen_points[(point + 1) % polygon_ndc_out.point_count][1],
2,
kColorBlack
);
}
polygon_free(pd, &polygon_ndc_in);
polygon_free(pd, &polygon_ndc_out);
}
Ultimately, what largely tripped me up is that the majority of resources implementing the Sutherland-Hodgman algorithm are pre-optimized, or focus on one coordinate at a time rather than framing things in the language of linear algebra. Many of them use bitmasks and simplified dot products, which frankly should be left as an exercise to the reader, because without them the algorithm is very simple.
With the largest technical hurdles of this game complete, I was able to focus on the actual gameplay elements, tighten up controls, add scoring, etc. You know, normal game developer stuff rather than arcane wizard graphics programming that’s normally handled by your GPU. Now I have a real game on a crazy handheld and am planning on releasing it soon.