On Collections

Paul ScharfApril 22nd, 2015Last Updated: April 21st, 2015

0 37 9 minutes read

Today I would like to talk a bit (though looking at my notes, probably more than a bit) about collections, collections as in lists of things.

Specifically, I would like to discuss the implementation of such collections in games, and some problems that occur due to the requirements of game development. I will go through some of the solutions I have seen and used myself to solve those problems, and in the end propose, implement and test a new solution I only designed recently.

Why do we care?

Pretty much any game needs lots of lists of all sorts of objects. Be it players, enemies, bullets, or whatever else. One common approach is to keep a main collection of all game objects that need to be updated every frame, and then loop over and update them in the game’s update loop. Sometimes this main collection is split up into multiple collections if the update order of groups of object is critical. For example, enemies may need to be updated before bullets to make sure hit detection works properly. Another reason for splitting up collections may also be better cache performance by keeping objects that share or have the same code closer together.

Apart from this one, or these few main collections, games often keep lists of subsets of objects. These lists can be useful at the global level, for example if one needs to loop over all enemies, and enemies only. Having to loop over all game objects and filtering which are enemies can waste a lot of time, especially for small subsets of objects.

Further, any game object may be interested in keeping its own subset of game objects, for example an enemy might keep a list of players it has seen, or a homing projectile might keep a list of enemies to follow or ignore.

An important feature, or rather requirement for collections in games is that they have to be mutable. In most games, objects are created and deleted all the time, and collections have to update accordingly.

Unmanaged considerations

In unmanaged languages, like C++, memory management can be all but trivial, and allocating space for newly created game objects can be costly. Thus, game engines, as well as other systems written in C++ often focus on reusing memory. Further, one has to be careful to maintain the integrity of pointers between objects.

While researching this post, I found quite a number of clever solutions to these problems. For example, a simple approach is to keep actual arrays of game objects, instead of arrays of pointers to objects, and simply ignore those that are deleted while iterating. One can then keep track of unused indices for reuse. This also maintains pointers (as indices into the list) for as long as objects stay alive, though one has to deal with the special case of a new object replacing an old one.

There are several other things to consider in an unmanaged approach, but instead of exploring those, we will instead look into the same problem from a managed perspective.

Advantages of managed environments

One key advantage of a managed runtime like that of C# is that allocation of new objects is virtually free. Further, cleaning up is done automatically by the garbage collector, which takes comparatively little amortised time. One catch, and this is a key problem of managed game development, is that the actual runtime of a garbage collection may take several milliseconds or longer. How to prevent, or at least minimize that problem, is a topic for another post however.

Another advantage is that managed code natively uses references instead of pointers, which always remain consistent.

Why we can’t have mutability

One big problem to solve, in both managed and unmanaged environments, is how to add and remove objects from collections. While this may seem trivial at first, running the following C# code shows us one of the first problems here:

 var list = new List<object> { new object() };
// will throw System.InvalidOperationException because
// "Collection was modified; enumeration operation may not execute."
foreach(var obj in list)
{
    list.Add(new object());
}

Modifying the collections of C# while iterating over them is not allowed, and for good reasons. For example, in a list with a backing array, removing an element will shift all elements behind it, and invalidate their indices. In a linked list, removing an element modifies all sorts of pointers between different nodes, and an iterator might get lost completely.

Adding elements to the end of a list is less problematic and in principle both of these problems can be circumvented by special cases, but this would make the code of both the iterator and the collection much more complicated.

How we can have mutability

Things I have seen people do

There are a lot of possible ways of circumventing the problems above.

For example, one can make a copy of the entire game object list every frame, and only modify that copy, while iterating over the original list. While this may seem inefficient, it does not increase the runtime in asymptotic notation, and only uses double the memory for a list of references, which will not be a lot compared to the actual objects in the first place.

Of course, using this approach means that objects can only be really deleted at the end of a frame, unless we do additional bookkeeping to ignore deleted objects during iteration.

While this works well for one or a few main lists of objects, the more lists there are, the more complicated it becomes to keep track of them, make copies of all of them, etc.

Another proposed solution is to only mark objects as deleted and simply delete them at the end of the frame in a separate pass. This removes the need for separate lists, but again is problematic if many subsets of game objects are stored in separate lists.

One low level solution I have seen used several times is iterating over the main lists of game objects from back to front. This way, objects already passed can easily be deleted without skipping indices, and new objects can easily be added to the end of the list without any ill effects. However, this approach is again not very suitable for having objects in a lot of different collections.

What I used to do

As the above examples show, the key problem to solve is how to be able to keep objects in both a few big, as well as many smaller lists, and still be able to add new objects and delete old ones easily.

A somewhat trivial solution would be for every objects to keep track of what lists it is in, and upon deletion delete itself from all of them. Removal from a list with a backing array takes linear time however, so it would be more efficient to use linked lists, and have each object keep track of the linked list nodes it is associated with.

This is in fact what I have been doing for most of my projects over the last few years. Especially when an object is in a finite and known number of specific subsets, it is easy to keep track of them and remove the linked lists node after deletion.

Here is some actual (though simplified) code from Roche Fusion:

 class Enemy : GameObject
{
    private readonly LinkedListNode<Enemy> node;
    public Enemy(GameEnvironment environment)
        : base(environment)
    {
        this.node = environment.Enemies.AddLast(this);
    }
    override protected void onDelete()
    {
        this.environment.Enemies.Remove(this.node);
    }
}

What is important however is that the object can never be deleted while anything is iterating over any of the subset lists that the object may be in. That would cause the same exception as mentioned above. An easy solution is to only allow objects to delete themselves. That way they only have to make sure not to delete themselves while iterating.

For the main collection containing all game objects, I used my own version of a linked list, which behaves similarly to regular linked lists, except for allowing addition and removal while iterating. This makes the code of the iterator slightly more complicated, while making the collection easier to use.

Ultimately, I do not consider the data structure used for the main collection to be of great interest however, since one rarely loops over it more than once or twice per frame.

A proposal

This approach above has one major caveat. During the development of Roche Fusion, we had several bugs caused by forgetting to delete an object from one of the collections it was in, or by deleting them from some place other than their own update method.

These are in principle easy to avoid, and there are even solutions making them almost impossible. For example, one can make the delete method protected and subscribe to deletion events to automatically remove the objects from all collections they are in.

However, I wanted to find a different solution for this post. I wanted to write a collection with the following features:

Objects can be added while iterating, even with multiple iterators.
Objects that have been deleted are ignored while iterating, even with multiple iterators.
Objects that have been deleted are automatically removed from the list.
Iterating should not be considerably slower than with C#’s linked list, which is what I used for this purpose until now.
If possible, I would like to use a backing array instead of a linked list structure. This saves a small amount of space, does not require additional objects for linked list nodes, and might improve cache efficiency.

Implementation

I would like to think I succeeded to fulfil all these requirements with a fairly basic idea. In essence my solution is a wrapper around a C# list with the following additional properties and behaviour:

The list can contain only reference types (no value types) that inherit from the interface IDeletable, which has a single property bool Deleted { get; }.
The IEnumereator of the collection ignores entries of the backing list that are null. It further checks if an object is deleted using the interface’s property, and if so informs the collection about it, and also ignores that entry.
When the collection is informed that an entry has been deleted it sets that entry to null to prevent future checks, and decreases its approximate count property.
The collection keeps track of how many iterators there are for it. To make this possible, the IEnumerator informs the list when it is disposed, which happens at the end of a foreach loop.
When there is no enumerator left and the collection’s approximate count becomes smaller than the lists Count property by a significant factor, it compacts the backing list by removing all entries that are null or have been deleted. While this is a linear operation, it is only executed automatically after iteration, which is also linear, so it does not increase the runtime complexity.

You can take a look at the code for this collection (as well as the mutable linked list mentioned above) on GitHub. It is published under the MIT license, so please feel free to use it, as well as any other parts of the utility library which it is part of, if you like.

Performance tests

Regarding my requirement for good performance, I wrote a small test, which you can also find on GitHub. My test requirements were to use several subsets of a main collection of objects, have squared runtime for each update for significant numbers of iteration, and delete and create objects fairly randomly.

To fulfil these requirements, I create a list of 4000 objects. On creation each object adds itself to two of ten subsets chosen randomly and also creates an array of random size (up to 10,000 bytes) to prevent the entirety of the program to fit into the CPUs cache too easily, and stores a random point.
I then iterate over the master list of objects ten times, and each objects loops over a random subset, finds the object with the closest point inside of it, and delete that object with a 10% chance. Each object also has a chance to create a new object with again 10% probability.

I believe this test is a fair approximation for the random access patterns that occur in the update loop of a game, given the test’s simplicity.

The test measures and compares the performance of the previews approach using linked lists, with my mutable linked list as master collection, with that of my proposed deletable object lists, both as master and subset collections.

In my tests run on several machines, my solution is a consistent 5 to 15% slower when run from Visual Studio. When running the executable alone, without the debugger attached, my solution is 20 to 25% faster.

I consider this a very good result, and am even surprised at the 25% improvement, while also simplifying the usage of the collection.

Conclusion

Of course it is by no means a perfect solution, and the requirement of using the IDeletable interface, as well as the instant removal behaviour from all lists, might not suit everybody, or every application. We also lose the thread safety and some of the flexibility of the default collections.

I consider the trade-off worthwhile however, and will start using this collection in the future and may report back on my thoughts after some more extensive, and more practical tests.

In the meantime, I am curious to see what others think about my solution to this problem. How do you go about storing lists of game objects and how do you deal with the constant addition and deletion? What do you think about my proposed solution? And will/would you give my collection, or an adaptation of it a try?

I am looking forward to your thoughts and comments.

Enjoy the pixels!

Reference:

On Collections from our NCG partner Paul Scharf at the Game<Dev> blog.

	`var list = new List<object> { new object() };`

	`// will throw System.InvalidOperationException because`
	`// "Collection was modified; enumeration operation may not execute."`
	`foreach(var obj in list)`
	`{`
	`list.Add(new object());`
	`}`

	`class Enemy : GameObject`
	`{`
	`private readonly LinkedListNode<Enemy> node;`

	`public Enemy(GameEnvironment environment)`
	`: base(environment)`
	`{`
	`this.node = environment.Enemies.AddLast(this);`
	`}`

	`override protected void onDelete()`
	`{`
	`this.environment.Enemies.Remove(this.node);`
	`}`
	`}`

	`/**`
	`* Compile TypeScript`
	`*/`
	`gulp.task('typescript-compile', ['vet:typescript', 'clean:generated'], function () { https://en.wikipedia.org/wiki/C_Sharp_(programming_language)`
	`log('Compiling TypeScript');`
	`exec('node_modules/typescript/bin/tsc -p src');`
	`https://github.com/tonysneed/Demo.VS2015.TypeScript`

	`<div class="attribution">`
	`<table>`
	`<tbody>`
	`<tr>`
	`<td><span class="reference">Reference: </span></td>`
	`</tr>`
	`</tbody>`
	`</table>`
	`</div>`

Why do we care?

Unmanaged considerations

Advantages of managed environments

Why we can’t have mutability

How we can have mutability

Things I have seen people do

What I used to do

A proposal

Implementation

Performance tests

Conclusion

Paul Scharf

Related Articles

Articles on CodeInject, Books, LiveWriter, .NET4.6.1

RavenDB Sharding – Adding a new shard to an existing cluster, the easy way

How to fake a singleton in C++

RavenDB 3.0. New Stable Release