Serialisation – Comparing XML, SDL, TOML, JSON

Script files are an important asset for a lot of games.

Script files have many different uses. In general, we use the term for two different kinds of files: those containing scripts (as in code) and those containing data.

In this post we will look at the second kind. Specifically we will look at a number of representations of actual objects we might encounter in our game environment.

We will further only look at human readable formats. Binary formats certainly have their place, but for most purposes having files that can be opened and changed in any text editor has huge advantages that I do not want to miss.

I will introduce four examples, chosen somewhat arbitrarily. The goal is to see some of the different approaches people take, and to explore the advantages and disadvantages of each using an example.

While I will give a recommendation in the end, there is no one right format for all purposes. One should always look at the specific requirements of a given problem, and choose the technology most applicable.

XML

The first format I want to look at is XML.

XML is old, well known, and many (de)serialisation libraries exist for almost any programming language.

Example

Before we look at the actual XML, let us first set up an example.

Below you see a small class serving as a template for units in, for example, a strategy game. It has simple properties, a list of string identifiers, and a list of parametrised objects.

class UnitTemplate
{
    string name = "Tank";
    
    float speed = 0.5f;
    float health = 100;
    
    List<string> weapons = new List<string>
        { "big cannon", "small turret" };
    
    List<Ability> abilities = new List<Ability>
        {
            new CloakAbility(cooldown: 10),
            new RegenerateAbility(healthPerSecond: 1),
        };
}

While I would never write this code in a production setting, it serves as a good representation for the kind of data we will try to express with our script files.

If we represent the data of this object in an XML file, it might look something like this:

<UnitTemplate>
	<name>Tank</name>
	<speed>0.5</speed>
	<health>100</health>
	<weapons>
		<weapon>big cannon</weapon>
		<weapon>small turret</weapon>
	</weapons>
	<abilities>
		<ability type="cloak">
			<cooldown>10</cooldown>
		</ability>
		<ability type="regenerate">
			<healthPerSecond>1</healthPerSecond>
		</ability>
	</abilities>
</UnitTemplate>

Overall, this is certainly not terrible. It is very clear and the meaning of the different tags, attributes and their content is unambiguous.

It is however quite verbose. Most of the file is taken up by tags, instead of our data.

We can improve on this to some degree by making better use of attributes and self-closing tags, as follows.

<UnitTemplate name="Tank" speed="0.5" health="100">
	<weapons>
		<weapon name="big cannon" />
		<weapon name="small turret" />
	</weapons>
	<abilities>
		<cloak cooldown="10" />
		<regenerate healthPerSecond="1" />
	</abilities>
</UnitTemplate>

This is a much more concise solution. However, we still have a lot of tags that seem somewhat redundant.

Here is an outline of the semantic data we want to represent, with as little formatting and syntactic overhead as possible:

UnitTemplate
	name="Tank"
	speed=0.5
	health=100
	weapons
		- "big cannon"
		- "small turret"
	abilities
		- cloak
			cooldown=10
		- regenerate
			healthPerSecond=1

There must be a way to represent this data in a concise and readable form, without as many tags as XML requires us to use.

Apart from the verbosity, XML has other problems that prevent it from being suitable for our purposes.

One of them is the distinction between attributes and tags. This makes sense for XML's originally intended usage, representing documents (think of the related HTML), but for our data there is no difference between the two.

Having two syntactic options for expressing the same meaning can be confusing and lead to inconsistent usage. This may make editing XML files by hand significantly harder.
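The ambiguity also shows up on the loading side. Below is a minimal sketch using LINQ to XML (System.Xml.Linq), assuming a loader that wants to accept both styles shown above. The helper ReadField is hypothetical and exists only for this illustration.

```csharp
using System;
using System.Xml.Linq;

// The same unit, written once with child elements and once with attributes.
var asElements = XElement.Parse(
    "<UnitTemplate><name>Tank</name><health>100</health></UnitTemplate>");
var asAttributes = XElement.Parse(
    "<UnitTemplate name=\"Tank\" health=\"100\" />");

Console.WriteLine(ReadField(asElements, "name"));                    // Tank
Console.WriteLine(ReadField(asAttributes, "name"));                  // Tank
Console.WriteLine(ReadField(asAttributes, "speed") ?? "(missing)");  // (missing)

// Hypothetical helper: a loader that accepts both styles has to look for
// an attribute first and fall back to a child element of the same name.
static string? ReadField(XElement node, string field) =>
    node.Attribute(field)?.Value ?? node.Element(field)?.Value;
```

Every property read goes through two lookups, and the loader still has to decide what to do when a file specifies both an attribute and an element with the same name.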

SDL

One alternative approach to XML is SDL, the Simple Declarative Language.

Translating our XML from above into SDL results in the following:

UnitTemplate name="Tank" speed=0.5 health=100 {
	weapons "big cannon" "small turret"
	abilities {
		"cloak" cooldown=10
		"regenerate" healthPerSecond=1
	}
}

Now, this certainly is concise. There is hardly anything here not corresponding directly to our data.

Also, something I personally like is that SDL makes a clear distinction between numbers and strings.

I am still mixing attributes and content just like above. However, we can easily change that without making the script much longer:

UnitTemplate {
	name "Tank"
	speed 0.5
	health 100
	weapons "big cannon" "small turret"
	abilities {
		"cloak" {
			cooldown=10
		}
		"regenerate" {
			healthPerSecond=1
		}
	}
}

This is maybe even more readable.

However, we are still left with the same ambiguity as above: when to use attributes, and when to use properties.

Further, while this format is very readable, writing valid files may be more difficult. Note how some identifiers are wrapped in quotes and others are not (and having both for the same property is valid as well).

While the syntax of SDL defines how to handle the different cases, having to keep these rules in mind may be very confusing and lead to enough syntax errors to make writing SDL by hand impractical.

Also note how there is no real difference between lists with objects as elements and objects with properties, similar to XML. This again does not necessarily result in an intuitive representation of our data.

TOML

Another format we could consider is TOML, Tom’s Obvious, Minimal Language.

Expressing our data in TOML might result in this:

[UnitTemplate]
name = "Tank"
speed = 0.5
health = 100
weapons = [ "big cannon", "small turret" ]
 
[[UnitTemplate.Ability]]
type = "cloak"
cooldown = 10
 
[[UnitTemplate.Ability]]
type = "regenerate"
healthPerSecond = 1

As we can see, TOML is also able to express our data very concisely.

We also have a clear difference between strings and numbers, and lists of objects are represented differently than objects with properties.

There are no attributes, only properties, removing another source of ambiguity.

Overall, I think TOML is a neat format, but I still doubt whether it is the right thing to represent the kind of data in question.

My main criticism is that it feels somewhat unstructured to me. While the nesting is clearly defined and for the most part easy to understand and write, it is not necessarily obvious at a first glance.

It is, however, a format that seems well suited to simpler cases, like settings or configuration files, which is in fact its stated purpose.

JSON

JSON, the JavaScript Object Notation, is, as the name implies, derived from JavaScript's syntax for object literals.

Our data represented in JSON might look like this:

{
	"UnitTemplate" : {
		"name" : "Tank",
		"speed" : 0.5,
		"health" : 100,
		"weapons" : [
			"big cannon",
			"small turret"
		],
		"abilities" : [
			{ 
				"type" : "cloak",
				"cooldown" : 10
			},
			{ 
				"type" : "regenerate",
				"healthPerSecond" : 1
			}
		]
	}
}

We again have something that looks a bit more verbose. This is mostly caused by JSON being very explicit with nesting. Every object must be surrounded with { } while every list is surrounded with [ ].

On the upside, this makes the relation between different properties very clear. Since most programmers are used to C-style languages, this notation should feel intuitive to the majority of them.

JSON also does not have a concept of attributes. There are only named properties of different types.
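To give an idea of how such a file might be read, here is a minimal sketch using JsonDocument from System.Text.Json. It assumes .NET 7 or newer (for the raw string literal) and the property names from the example above. It only extracts the values rather than building real UnitTemplate or Ability objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

const string json = """
{ "UnitTemplate" : {
    "name" : "Tank",
    "speed" : 0.5,
    "health" : 100,
    "weapons" : [ "big cannon", "small turret" ],
    "abilities" : [
        { "type" : "cloak", "cooldown" : 10 },
        { "type" : "regenerate", "healthPerSecond" : 1 }
    ]
} }
""";

using JsonDocument document = JsonDocument.Parse(json);
JsonElement unit = document.RootElement.GetProperty("UnitTemplate");

// Simple properties map directly onto typed getters.
string name = unit.GetProperty("name").GetString() ?? "";
float speed = unit.GetProperty("speed").GetSingle();
float health = unit.GetProperty("health").GetSingle();

// Lists are enumerated element by element.
List<string> weapons = unit.GetProperty("weapons")
    .EnumerateArray()
    .Select(weapon => weapon.GetString() ?? "")
    .ToList();

// Every ability is an object; its "type" property says which one it is.
List<string> abilityTypes = unit.GetProperty("abilities")
    .EnumerateArray()
    .Select(ability => ability.GetProperty("type").GetString() ?? "")
    .ToList();

string abilityList = string.Join(", ", abilityTypes);
Console.WriteLine(FormattableString.Invariant(
    $"{name}: speed {speed}, health {health}, {weapons.Count} weapons, abilities: {abilityList}"));
```

There is only one way to look up a value, by property name, so the loader never has to guess where a value might be stored.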

Should we decide to use JSON, however, we are not stuck with the verbose form above.

For example, here is a small pattern I like to use when considering a list of similar objects that have different properties and are identified by a string identifier:

"abilities" : [
	{ "cloak" : {
		"cooldown" : 10
	} },
	{ "regenerate" : {
		"healthPerSecond" : 1
	} }
]

This makes a clear distinction between the type and properties of the objects, and saves us some typing at the same time.

It can also significantly simplify deserialisation, since we are bound to read the name – and here type – of the object before reading its properties.
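As a rough sketch of what that could look like, the following again uses System.Text.Json and assumes .NET 7 or newer. The Describe function is hypothetical and stands in for whatever would construct the actual ability objects. With a separate "type" property, nothing guarantees that the type appears before the other properties, so a forward-only reader may have to buffer them. With this pattern, the type is always the first thing we see.

```csharp
using System;
using System.Linq;
using System.Text.Json;

const string json = """
[
    { "cloak" : { "cooldown" : 10 } },
    { "regenerate" : { "healthPerSecond" : 1 } }
]
""";

using JsonDocument document = JsonDocument.Parse(json);

foreach (JsonElement wrapper in document.RootElement.EnumerateArray())
{
    // Each wrapper object has exactly one property: its name is the
    // ability type, its value holds that ability's parameters.
    JsonProperty entry = wrapper.EnumerateObject().Single();
    Console.WriteLine(Describe(entry.Name, entry.Value));
}

// Hypothetical dispatch: picks the right reader based on the type name.
static string Describe(string type, JsonElement parameters) => type switch
{
    "cloak" =>
        $"cloak with a cooldown of {parameters.GetProperty("cooldown").GetInt32()}",
    "regenerate" =>
        $"regenerate {parameters.GetProperty("healthPerSecond").GetInt32()} health per second",
    _ => throw new NotSupportedException($"Unknown ability type '{type}'."),
};
```

Adding a new ability then means adding one more case to the switch, without touching the code that walks the list.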

Note how I already used the same pattern in the first example for the entire object itself.

Further, like XML, but unlike the other two formats, which treat line breaks as significant, JSON ignores white-space between tokens. This allows us to format our file in a much more compact form, while still keeping it just as readable:

{ "UnitTemplate" : {
	"name" : "Tank",
	"speed" : 0.5,
	"health" : 100,
	"weapons" : [ "big cannon", "small turret" ],
	"abilities" : [
		{ "cloak" : { "cooldown" : 10 } },
		{ "regenerate" : { "healthPerSecond" : 1 } }
	]
} }

In fact, note how closely this resembles our original C# code:

class UnitTemplate
{
    string name = "Tank";
    
    float speed = 0.5f;
    float health = 100;
    
    List<string> weapons = new List<string>
        { "big cannon", "small turret" };
    
    List<Ability> abilities = new List<Ability>
        {
            new CloakAbility(cooldown: 10),
            new RegenerateAbility(healthPerSecond: 1),
        };
}

The conversion is virtually one to one.

While this is not surprising, given the origins of JSON, it shows how well it is suited to represent the kind of data we are dealing with.

On a last note, in many applications each script file is likely to contain only a single object, in this case a unit template.

In that case, we can of course simplify our code even further, making it contain only the essentials:

{
	"name" : "Tank",
	"speed" : 0.5,
	"health" : 100,
	"weapons" : [ "big cannon", "small turret" ],
	"abilities" : [
		{ "cloak" : { "cooldown" : 10 } },
		{ "regenerate" : { "healthPerSecond" : 1 } }
	]
}

Further, when shipping our game, we could compact the files by removing all unnecessary white-space, turning each into a single line. That saves space and may slightly improve parsing performance.
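One possible way to do this as a build step is to parse each file and write it back out. The sketch below assumes System.Text.Json and .NET 7 or newer, and uses a shortened version of the unit template as input.

```csharp
using System;
using System.Text.Json;

const string readable = """
{
    "name" : "Tank",
    "speed" : 0.5,
    "weapons" : [ "big cannon", "small turret" ]
}
""";

// Parsing and re-serialising drops all insignificant white-space, because
// System.Text.Json writes compact output unless WriteIndented is enabled.
using JsonDocument document = JsonDocument.Parse(readable);
string compact = JsonSerializer.Serialize(document.RootElement);

Console.WriteLine(compact);
// {"name":"Tank","speed":0.5,"weapons":["big cannon","small turret"]}
```

White-space inside string values, such as the space in "big cannon", is part of the data and is left untouched.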

Comparison

As I am sure is obvious from my comments above, I strongly dislike the distinction between attributes and content/properties.

I further consider grouping objects and lists using brackets a positive feature, since it leaves nesting unambiguous, and clearly maps onto data representations in source code.

That is why JSON is my clear favourite of the above – and in fact any other format I have come across so far.

The only thing I do not like about JSON is that it allows any string as a property name. Consequently, property names have to be wrapped in quotes, just like string values.

Were I to define my own clear-text data storage format, I would take JSON, remove those quotes, and only allow alphanumeric identifiers as property names.

Other than this, I have not found any fault with JSON, despite now using it heavily for several years.

Conclusion

Above I highlighted some of the differences between the clear-text data storage formats XML and JSON, and the lesser-known SDL and TOML.

I gave some arguments for why I consider them more or less suitable for different uses.

While any of them, and any number of other formats, can be used to represent the same data, I hope I gave a coherent explanation for why I prefer JSON.

In any case, let me know what you think! Do you agree with my opinions and arguments? What formats or languages do you use, and why?

Make sure to leave a comment and feel free to share this post with anyone who may be interested.

Reference: Serialisation – Comparing XML, SDL, TOML, JSON from our NCG partner Paul Scharf at the GameDev<T> blog.

Paul Scharf

Paul is a self-publishing game developer. He believes that C# will play an ever growing role in the future of his industry. Next to working on a variety of projects he writes weekly technical blog posts on C#, graphics, and game development in general.
