Stone Monarch has supported multiplayer games from the start, but over time the networking model has gone through several iterations. I initially implemented a peer-to-peer lock step model based on this famous Age of Empires article.
Peer-to-peer lock step has some well known issues. The peer-to-peer aspect makes it hard for players to connect with each other and increases the network load with each new player. The lock step aspect is prone to tricky bugs where the game state gets out of sync between players. My current architecture introduces a server and also relaxes some of the deterministic requirements of lock step. It still uses the lock step concept of ‘turns’ to ensure that each client runs the same simulation and does not proceed without receiving the orders of all players.
Client-Server Lock Step
The game is divided into a series of ‘turns’ of a set duration. The first step at the start of the game is to determine the length of 1 turn. This is done by measuring the round trip time of a message from each client to the server. The server picks the longest time as the turn length. During the game the turn length can be adjusted based on observed network performance.
Whenever a player wants to perform an action, the requested action is immediately sent to the server. The server aggregates all actions it has received until it reaches the next turn boundary. At this point the server sends a turn message to all clients with the actions to be executed 1 turn in the future. By the time the clients are ready to execute the following turn, they should have received all the information necessary to simulate the turn.
In the example above, the turn length has been set to 100ms. The server receives actions during turn 1, and at the start of turn 2 it sends out a turn message with actions for turn 3. This should arrive at the clients in time for them to execute turn 3.
Handling Delays
If one of the clients does not receive a turn message by the time it is ready to execute that turn, it must pause the simulation until it receives the turn from the server. Once it receives the turn, it can immediately start executing again. At this point the client will lag slightly behind the server’s simulation, so it is more likely to receive turns in time. However, there will be increased latency between when the player’s commands are issued and when they are executed.
To prevent a client from lagging too far behind the server’s simulation (and suffering from high command latency), the server can attempt to detect this and pause its own simulation. To do this, the client can include the current game time with each action that it sends to the server. The server then compares this time with the current game time in its own simulation. If the difference is greater than the length of 1 turn (or any arbitrary length), the server can pause to allow the client to catch up. If any other clients are further ahead in their simulations, this is likely to cause them to pause too until all clients are more in sync with each other.
In my original implementation I made every pause be equal to the length of 1 turn, which seemed logical to keep clients from slipping ahead or behind each other in their simulation. However, the current method has dramatically reduced the duration of pauses when they do occur. Often they are not even noticeable to the player. Now that a slow client can be allowed to lag slightly behind other clients, they could theoretically be at a disadvantage as their actions will take longer to execute. However, this can be capped by the server so I can always tune it to the find the right balance.
Adjusting Turn Length
If too many pauses are observed by the server, it can increase the turn length by including a new turn length within a turn message. All clients will apply this when they begin executing the new turn. This will increase command latency equally for all players.
Conversely, if the server does not observe any pauses over a certain period of time, it can decrease the turn length to give all players lower command latency.
Non Deterministic Events
In traditional lock step networking, only player commands are issued for each turn, and the rest of the simulation is expected to proceed deterministically between clients. However, some game logic is very hard to keep deterministic, especially in Unity where the game engine does not provide any such guarantees.
In such cases, I ensure that the non-deterministic game logic only gets executed once, and issue the result as its own action along with the actions requested by the players.
For example, let’s say a player attacks a bomb which causes it to explode. The player issues an attack action which all clients execute. At the point where the bomb will explode, it needs to check the surrounding area to see which units will be hit. This is done using a Unity API which is not deterministic. Thus, only one client (maybe the client that owns the bomb) will do this simulation and send the result (the list of units who should take damage) as a new action to the server. In this way the simulation continues to be identical across each client
If cheating is a concern, the server can be the one to compute these actions instead of the clients. This would require the server to be running the same simulation as the clients are. This is the direction I’m moving in, although some non-deterministic decisions are currently still made by clients.
The advantage to this approach is that additional data only has to be sent when some particular game logic is expected to be non-deterministic. Anything deterministic (for example a villager continuing to gather until their resources are full) does not use any additional network resources. It’s also much easier to implement than re-writing Unity features to be deterministic, especially with respect to physics.
The downside is that if I am wrong and think something is deterministic when it isn’t, it will still cause the game to go out of sync.
RELEASE DATE??