Optimising animation based collision volumes
Last time we talked about how we can approximate objects with complex shapes using simpler ones for our game’s physics simulation.
Further, we saw how we can use an often already existing feature: a skeleton for animating sprites – or vertices of a 3d-mesh – to make our collision shapes change position, and even size, as our object deforms.
Today I want to look at how knowing about the behaviour of CPUs – and especially their caches and memory – we can use simple optimisations to implement collisions with such objects very efficiently.
Using multiple simple shapes to approximate a more complicated one gives us an easy way to significantly step up the fidelity of our physics, resulting in a much more realistic looking game. At the same time, we neither have to write complex – and often error prone – collision tests between complicated shapes.
Apart from resulting in a more bug-free program, this also means that we can optimise our collision testing much easier, to make sure that physics will not bottleneck our game’s performance.
The way we designed our system last time, we have a skeleton or tree of nodes for our object, where each node can have a collision radius, that creates a colliding circle (or sphere) around it. The same system works with any other shapes as well, but circles are simple and easy to think about, so we will stick with them.
At the same time, the same skeleton can be used for drawing, and can animate to make both visuals and physics more dynamic as we like.
To get to the meat of the matter, let us take a few things as a given.
I do not want to talk about the actual calculations of checking for collisions today, so we will assume that we have access to functions that check for collisions between any shapes we are interested in.
In fact, I am not even terribly interested in what we are trying to collide with our circles.
The main purpose of this post is to show how we can make the collisions with our circles as efficient as possible.
We will also not concern ourselves with the animation skeleton and its nodes too much. Instead we will assume that this system is already implemented reasonably efficient.
Straight forward implementation
Let us start with a simple and obviously correct implementation. Lets say we have an object that needs to check against collision of all circles of all animation skeleton nodes of a list of other objects.
A simple implementation might look something like this:
foreach (var thing in things)
foreach (var node in thing.Skeleton)
var circle = new Circle(
So far so good.
Using immutable structs for atomic values
One think you may notice is that in this code we are creating a new circle for every node, for every thing – and we are probably doing so every single frame as well.
Circle is a class, this will put a lot of pressure on our garbage collector. However, we can simply make it a structure – basically a container for the position and radius – which solves that problem. In fact, since a circle pretty much represents an atomic value – albeit more complicated than a single number – using an immutable structure instead of a class makes a lot of sense in the first place.
Pre-computing skeletal node parameters
Let us imagine the (very likely) case there is more than a single object trying to collide with this same list of other objects at the same time – maybe even hundreds of them.
In this case, we better hope that the
CurrentRadius properties of the node are not too complicated. In the worst case, they might have to recurse all the way up to the root of the animation tree and apply an offset and rotational transform for every single node on the way. That would be very slow and easily lead to terrible performance.
The solution to this is again quite simple: We only have to go through all nodes of the tree and calculate their absolute positions, rotations, and maybe scales once a frame, after moving forward in our animation, and maybe moving the object itself.
If we do this starting from the root of the tree, and then going down towards the leaves, this can easily be done in O(n) time, where n is the number of nodes in the tree. This is because each node can calculate its position from its own local offset and the global position of its parent node in constant time.
This is code for another post however. Let us assume that the nodes are implemented in just this way, so that getting their position and radius are constant time operations.
Storing pre-computed data in reference free arrays
Even constant time does not necessarily have to be fast however. We still have to enumerate all nodes, which are at best simple objects stored in a list or an array. This means that we still have to follow a reference for every single node we look at.
This may not sound like much and it usually is not a problem. But remember that this means that we have to follow O(n * m * k) references, where n is the number of nodes, m the number of objects we collide with, and k the number of objects checking for collisions. If we are dealing with hundreds of thousands of objects, this can be a significant problem.
We can eliminate all these reference resolutions – and many potential cache misses – by not only pre-computing the parameters of the nodes, but by also storing these parameters in a single piece of memory, instead of in many different objects.
We can do that by creating an array of circles after updating our animation for each object.
This would change our code as follows:
// in the object that we are going to collide with
this.AnimationCircles = new Circle[this.Skeleton.Count];
for (int i = 0; i < this.Skeleton.Count; i++)
var node = this.Skeleton[i];
this.AnimationCircles[i] = new Circle(
// in the object that is colliding
foreach (var thing in things)
foreach (var circle in thing.AnimationCircles)
As we can see, for a little bit of work in the animated object, our collision code has become even more straight forward.
More importantly, we now have to enumerate each node only a single time – and right after it was updated, so it probably still is in our CPU’s cache.
Granted, enumerating over the array of circle structs in our collision code still has the same big-O notation run-time. However, it has considerably better caching efficiency, since all the array’s data is laid out as a single connected string of bits in memory. In addition, we have to resolve a lot less references, creating a lot less machine code between enumerating the list of objects, and actually testing for collisions.
The only thing left to do is reusing the array of collision circle, instead of creating a new one every frame. This is almost trivially possible, since the number of nodes in our skeleton is unlikely to change in most games, so that the size of the array can also stay the same.
The code we ended up with in this post is essentially a simplified version of what we use in Roche Fusion with great success. It is not uncommon for hundreds of projectiles to be checking against collisions of hundreds of enemies, each with possibly multiple collision circles.
While physics can sometimes be a bottleneck in that game, these optimisations are a big part of why we can still run at good performance, even if the screen is full of bullets.
I hope this small example of optimising a specific algorithm has given you some useful insights, and maybe it even makes a good argument for why understanding the hardware we work on is an important skill for every programmer.
Feel free to drop me a comment if you have any questions, or if you have ever used a similar – or different – solution for the same problem.
Enjoy the pixels!
|Reference:||Optimising animation based collision volumes from our NCG partner Paul Scharf at the GameDev<T> blog.|