Devlog #2: Let’s talk about Performance Optimization!

There’s nothing more satisfying and relaxing than sitting down with your favourite factory automation game, watching hundreds of little objects gliding around your buildings and conveyor belts in perfect (spaghetti) harmony. But as your factory grows, and the hundreds of objects become thousands, it becomes very important for the game to keep running smoothly, without stuttering, juddering, or dropping frames. Factory automation games give players an enormous opportunity to create incredibly elegant, complex, and intricate designs, which can sometimes be challenging from a performance perspective. Because of this, we need to dedicate time and effort to performance optimization on Modulus, to make sure your factories keep ticking along like clockwork.

[h2]Performance and optimization?[/h2]
But what do we actually mean by “performance” and “optimization”? They are terms that are thrown around a lot, so let’s break them down a little bit:

“Performance” refers to how the game runs, and it changes depending on how much is happening in-game, the game’s display resolution, and what kind of hardware is in the computer that the game is being run on. Internally, we have a number of performance targets; that is, we decide in advance how we want the game to behave when these conditions change.

“Optimization” refers to the techniques and processes used to keep performance in line with these targets. These take many forms - often, a specific optimization is made to address a particular performance issue. Let’s take a look at a couple of examples we’ve already encountered while building Modulus!

[h2]Deep dive: what have we done so far?[/h2]
[h3]Grass[/h3]
First up is something that is a very common performance problem in many games: grass! But why is this? Graphics cards (GPUs) are very good at displaying large numbers of polygons, so why is grass an issue?

Grass, waving gently in the wind.

Fields of grass are made up of many hundreds of near-identical blades. By default, a game engine would try to draw the grass blades just like any other object in a scene; it would go through the list of objects that represent the grass, one at a time, and draw each one. It takes a certain amount of time for commands from the CPU to reach the GPU, so if the game has to send a command for each blade of grass individually, it can take much longer to render a frame. Then, once the frame is done and displayed to the screen, it has to do it all over again, even though the grass hasn’t moved.

So instead, when we want to render large amounts of grass in Modulus, we collect all the positions and rotations of the grass together when the scene loads. Then, we send all this information to the GPU in a structured form known as a “compute buffer”. This data stays on the GPU, instead of having to be sent over every frame. The CPU then only needs to tell the GPU what one blade of grass looks like, and the GPU can use the positions in the compute buffer to draw all the grass at once - just as if it were a single, bigger, more complicated model. And suddenly, just like that, it takes almost no time at all to render the grass, even if there’s lots and lots of it!
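To make the difference concrete, here is a small Python sketch that counts how many commands the CPU sends in each approach. It is an illustration only, not the code used in Modulus: `FakeGpu`, `GrassInstance`, and every function name below are hypothetical stand-ins, and a real renderer would go through the engine's graphics API instead.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GrassInstance:
    """Per-blade data: where the blade is and which way it faces."""
    x: float
    z: float
    rotation_degrees: float


class FakeGpu:
    """Hypothetical stand-in for a graphics API that counts CPU-to-GPU commands."""

    def __init__(self):
        self.commands_sent = 0
        self.instance_buffer = []
        self.blades_drawn = 0

    def draw_one(self, mesh, instance):
        # One command per blade: the slow path.
        self.commands_sent += 1
        self.blades_drawn += 1

    def upload_instances(self, instances):
        # One command to copy every blade's data into GPU memory.
        self.commands_sent += 1
        self.instance_buffer = list(instances)

    def draw_instanced(self, mesh):
        # One command draws the mesh once per entry already in the buffer.
        self.commands_sent += 1
        self.blades_drawn += len(self.instance_buffer)


def scatter_grass(count, seed=0):
    """Generate blade positions and rotations once, as if at scene load."""
    rng = random.Random(seed)
    return [
        GrassInstance(rng.uniform(0, 100), rng.uniform(0, 100), rng.uniform(0, 360))
        for _ in range(count)
    ]


def render_naive(gpu, mesh, grass, frames):
    """Send one draw command per blade, every frame."""
    for _ in range(frames):
        for blade in grass:
            gpu.draw_one(mesh, blade)


def render_instanced(gpu, mesh, grass, frames):
    """Upload the blade data once, then send a single draw command per frame."""
    gpu.upload_instances(grass)
    for _ in range(frames):
        gpu.draw_instanced(mesh)


grass = scatter_grass(1000)

naive_gpu = FakeGpu()
render_naive(naive_gpu, "blade_mesh", grass, frames=3)

instanced_gpu = FakeGpu()
render_instanced(instanced_gpu, "blade_mesh", grass, frames=3)

print(naive_gpu.commands_sent, instanced_gpu.commands_sent)
```

Both versions draw the same 3,000 blades over three frames, but the per-blade version sends 3,000 commands while the buffered version sends 4: one upload plus one draw per frame.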

[h3]Conveyor belts[/h3]
Next, how about one of the main attractions of factory automation games? That’s right - conveyor belts! The objects that sit on them need to move in the direction of the belt at the right speed. When you have lots and lots of items on lots and lots of belts, though, the game can spend a very long time processing the objects one after the other. And this isn’t something we can easily run on the GPU like the grass, so what do we do?

Lots of items on lots of belts…

How about instead of processing every belt one after another, we group them up into chunks? Within each chunk the belts are still processed one at a time, but several chunks are processed at once. This technique is known as multithreading, and it works because a modern CPU is actually made up of several units called cores, each of which can work on a different chunk at the same time. This is just like adding extra machines to process your raw materials in a factory game!

Two machines process the same amount of raw material faster!
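The chunking idea can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the Modulus implementation: the function names, the chunk size, and the representation of a belt as a list of item positions are all invented for the example. It also only shows the structure of the work; in standard Python, threads do not speed up pure calculation the way native threads in a game engine do.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_belt(item_positions, speed, dt, belt_length):
    """Move every item on one belt forward, stopping at the end of the belt."""
    return [min(position + speed * dt, belt_length) for position in item_positions]


def advance_chunk(chunk, speed, dt, belt_length):
    """Process the belts inside one chunk one after the other."""
    return [advance_belt(belt, speed, dt, belt_length) for belt in chunk]


def split_into_chunks(belts, chunk_size):
    """Group the belts into consecutive chunks of at most chunk_size belts."""
    return [belts[i:i + chunk_size] for i in range(0, len(belts), chunk_size)]


def advance_all(belts, speed, dt, belt_length, chunk_size=2, workers=4):
    """Hand each chunk to a worker thread and stitch the results back together."""
    chunks = split_into_chunks(belts, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        results = pool.map(
            lambda chunk: advance_chunk(chunk, speed, dt, belt_length), chunks
        )
        return [belt for chunk in results for belt in chunk]


# Five belts, each a list of item positions measured along the belt.
belts = [[0.0, 1.0], [2.5], [], [9.8], [4.0, 5.0]]
moved = advance_all(belts, speed=1.0, dt=0.5, belt_length=10.0)
print(moved)
```

No chunk needs to know about any other chunk, which is what makes it safe to run them side by side. Belts that pass items to each other would need extra care at the chunk boundaries, which this sketch leaves out.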

On top of this, the Unity game engine has a utility called the Burst Compiler; by writing code in a specific way, it can be compiled to take advantage of special instructions that modern CPUs have, called SIMD instructions. SIMD stands for “single instruction, multiple data”. As the name suggests, it allows the CPU to work on a group of values with a single instruction - further increasing the number of belts and items that we can process at once! This is more like upgrading the machines in your factory game; you aren’t physically adding any more, but each one can do more things more quickly!

So, by changing the code that makes items ride on conveyor belts to use multithreading and the Burst Compiler, we can have lots of objects moving smoothly along!

[h2]What’s next?[/h2]
We hope you’ve enjoyed reading about these two interesting performance problems, and the way they were optimized. But what’s next for us, in terms of optimization?
Well, performance optimization is a constantly evolving process. We want to bring you a game that both runs well and looks beautiful; this means that we continually communicate with each other when new art and new features are implemented, and frequently check on the performance of the game as it continues to grow. Often there is a push-and-pull rhythm to performance in game development; a new feature gets added to the game, and the performance drops a little. Then, optimizations are made that improve the performance for that feature - so, step by step, little by little, the game moves forward and keeps improving. Just like building the perfect factory!

Thanks for reading, and have yourself a great day!
- Jordan (Freelance Optimization Consultant)