Performing Lag Compensation in Unreal Engine 5

Approaches and obstacles to implementing backwards reconciliation using UE5 replication

In this article, we’ll discuss a lag compensation technique called backwards reconciliation, evaluate existing implementations in Unreal Engine 5, and explore some of their limitations.

What is backwards reconciliation?

Backwards reconciliation is a server-side technique to compensate for latency and reconcile actions in competitive networked games like Overwatch, Valorant, Call of Duty, Counter-Strike, and more. Although the technique can be used for a variety of game mechanics, it is most commonly used to implement server-authoritative hit detection for hitscan weapons. In this context, players fire their weapons instantly on their screens but it takes time for their actions to reach the server due to Internet latency. Applying backwards reconciliation allows the server to determine who/what the player had in their crosshairs at the time. The video below illustrates how this concept works within SnapNet.

What’s wrong with client-authoritative hit detection?

Unreal Engine 5 doesn’t provide any functionality for performing lag compensation. As a result, many games employ client-authoritative hit detection instead, including Epic’s ubiquitous ShooterGame and Lyra samples. In essence, the client sends a message to the server that says “I shot this guy, trust me.”

Problem 1: Security

The security implications of this approach are probably obvious, but Alvaro Jover-Alvarez has an excellent write-up on his blog. In a subsequent post, he highlights several other exploits present when shooting is triggered via RPC (Remote Procedure Call) such as being able to shoot from impossible locations or change the fire rate of weapons.

Although it appears at first glance that these types of exploits could be easily mitigated with additional validation, it’s not so simple in practice. Because messages sent over the Internet don’t have consistent travel times and can be lost entirely, the server has no way of knowing, when it receives an RPC, how long ago it was sent. In the case of the fire rate exploit, for example, how can you tell whether two subsequent weapon fire RPCs actually represent a fire rate that is faster than should be possible?

It’s important to note that identifying and preventing all of these exploits is challenging and this is only for the simple case of weapon firing. Every game mechanic networked using RPCs will have similar exploits. How confident can you be that you’ve closed all of these gaps? Is this even really something gameplay engineers should need to think about when implementing new game mechanics?
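To make the fire rate example concrete, here is a minimal, engine-agnostic sketch of one common mitigation: a leaky-bucket allowance that tolerates bursts of RPCs caused by jitter while still bounding the long-run rate. The names (FireRateValidator, Validate) are illustrative, not Unreal API:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical server-side fire-rate validator (engine-agnostic sketch).
// Comparing consecutive RPC arrival times directly produces false positives
// because arrival times jitter. Instead, shot "credit" accrues at the
// weapon's fire rate and each RPC spends one unit, absorbing short bursts
// while still rejecting a sustained rate above the weapon's maximum.
struct FireRateValidator {
    double fireInterval;  // minimum seconds between shots
    double allowance;     // accumulated shot credit, capped
    double lastTime;      // server time of last validated RPC
    double maxAllowance;  // burst tolerance (jitter absorption)

    explicit FireRateValidator(double interval, double burst = 2.0)
        : fireInterval(interval), allowance(burst),
          lastTime(0.0), maxAllowance(burst) {}

    // Returns true if a fire RPC arriving at serverTime is plausible.
    bool Validate(double serverTime) {
        allowance += (serverTime - lastTime) / fireInterval;
        allowance = std::min(allowance, maxAllowance);
        lastTime = serverTime;
        if (allowance < 1.0) {
            return false; // faster than the weapon allows, even with slack
        }
        allowance -= 1.0;
        return true;
    }
};
```

With this scheme, two legitimate RPCs that arrive almost simultaneously due to jitter both pass, but a third immediate shot (or a sustained stream above the fire rate) is rejected. The burst tolerance is a tuning knob: too small and jitter causes false rejections, too large and cheaters gain headroom.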

Problem 2: Player Experience

But cheating may not even be the most important reason to avoid client-authoritative hit detection. When a player has very high latency to the server, they are shooting at enemies based on where those enemies were a long time ago. If the server always allows clients to shoot enemies based on where they see them, regardless of latency, then players with good connections may get killed even though they are no longer anywhere near the shooter! By contrast, when using backwards reconciliation, the developer decides how much latency is acceptable in this regard, which is critical to maintaining competitive integrity.

Potential Solutions

Ok, so how can we implement backwards reconciliation in Unreal Engine? Let’s start by surveying existing implementations. I’ve found a few:

All of these implementations function more or less the same way and, despite being well-reasoned, they all operate under the same flawed assumptions. To understand what’s wrong, we must first examine how they work:

Step 1: Store a history of hitboxes on the server

The server stores a history of where hitboxes were in the past along with the corresponding time.

Step 2: Trace against client hitboxes

When a player fires a bullet, the client performs a trace against the hitboxes of the other players to determine who was hit.

Step 3: Send RPC

Once the client determines who was hit by their bullet, they send an RPC to the server informing it who was hit, the location/direction they fired the bullet, and, critically, a timestamp of when this occurred. Time is a tricky subject when using Unreal’s replication, as we’ll see below, but most of the implementations above calculate the timestamp as the current server time minus your average round-trip latency.

Step 4: Trace against historical server hitboxes

When the server receives an RPC indicating that the client hit someone, it uses the provided timestamp to seek into the history buffer stored in step 1. The server then traces against the hitboxes as they were when the client originally fired their weapon—dealing damage upon confirmation of a successful trace.
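Steps 1 and 4 can be sketched in engine-agnostic C++. The server records timestamped hitbox samples in a short history buffer and, given the RPC’s timestamp, reconstructs the hitbox by interpolating between the two surrounding samples. HitboxHistory and its methods are hypothetical names; a real implementation would store full transforms for every hitbox rather than a single position:

```cpp
#include <cassert>
#include <cmath>
#include <deque>

// Engine-agnostic sketch of steps 1 and 4: a timestamped history of hitbox
// positions, with rewind via interpolation between bracketing samples.
struct Vec3 { double x, y, z; };

struct HitboxSample {
    double serverTime;
    Vec3 position; // a real implementation stores every hitbox transform
};

class HitboxHistory {
public:
    // Step 1: record where the hitbox is each server tick.
    void Record(double serverTime, const Vec3& pos) {
        samples.push_back({serverTime, pos});
        // Keep only ~1 second of history; older shots are rejected outright.
        while (!samples.empty() &&
               serverTime - samples.front().serverTime > 1.0) {
            samples.pop_front();
        }
    }

    // Step 4: reconstruct the hitbox position at 'timestamp', or return
    // false if the timestamp falls outside the stored window.
    bool Rewind(double timestamp, Vec3& out) const {
        for (size_t i = 1; i < samples.size(); ++i) {
            const HitboxSample& a = samples[i - 1];
            const HitboxSample& b = samples[i];
            if (a.serverTime <= timestamp && timestamp <= b.serverTime) {
                double t = (timestamp - a.serverTime) /
                           (b.serverTime - a.serverTime);
                out = {a.position.x + t * (b.position.x - a.position.x),
                       a.position.y + t * (b.position.y - a.position.y),
                       a.position.z + t * (b.position.z - a.position.z)};
                return true;
            }
        }
        return false;
    }

private:
    std::deque<HitboxSample> samples;
};
```

The bounded window also enforces the competitive-integrity limit mentioned earlier: a shot whose timestamp is older than the history simply cannot be rewound, no matter how high the shooter’s latency.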

Results

After implementing the above approach in Unreal’s ShooterGame sample and running the game with Emulation Target: Everyone and Network Emulation Profile: Average, I collected the distance between the hitbox used on the client and the one used on the server when firing:

Shot    Error Without Rewind    Error With Rewind
1       93.4909                 14.5628
2       48.8423                 22.9744
3       87.999                  1.72896

As can be seen from the data above, rewinding this way seems to get the hitboxes closer to what the client saw but it raises several questions. Why isn’t it accurate? Why is there still so much variance in the error? Most importantly, can it be relied upon?

The Flaw

The answer is no, because the timestamp used in step 3 is invalid for two reasons:

  1. Offsetting the current server time by the player’s ping may be correct on average but, due to packet jitter and loss, it’s almost always wrong for any given communication. There is no guarantee how long it has been since you last received an update to the character’s location, especially when you also factor in Unreal’s actor relevancy and priority. Further, there is no way of knowing what server time the last update you received corresponds to.

  2. On clients, remote characters are smoothed towards the most recent update received from the server and also continue to tick forward in time. In other words, your client is effectively guessing where the player is moving and then blending out the errors when it’s wrong. That means that characters will be in places they never were on the server and so there is no timestamp that would match any past state that the server may have stored.

It’s critical to note here that, although the errors above seem like they may be manageable, there is no bound on how large the error can be because server and client are completely misaligned in time. If the player recently spawned/teleported, used a movement ability, or changed poses (crouched, for example) any hit detection attempted on the server may be completely meaningless.

In other words, the packet jitter of an average network connection makes any server-side validation of our shots a total guess when using this implementation—which is exactly where we started before we did all this complicated rewinding!

Exploring Solutions

To fix this, let’s consider what we need for accurate hit detection:

  1. All of the relevant player data to reconstruct hitboxes:

    • Location
    • Rotation
    • Animation pose (or enough information to derive it)
    • Server timestamp (so the server can later refer back to this data for hit detection)
  2. On the client, we need to render players using only information from the data above since that’s what the server will be storing and referring back to.

Replicating the relevant player data

Location and rotation data are already replicated via the actor’s ReplicatedMovement property. Unfortunately, due to packet jitter and Unreal’s actor relevancy and priority, we have no way of knowing how long ago it was sent or, more precisely, what server time this data corresponds to.

At this point, you might be thinking of adding a new replicated property to your character that simply holds the latest server timestamp. Once again, Alvaro Jover-Alvarez does an excellent job explaining why that won’t work. There is no guarantee that the timestamp property would get replicated or reach the client at the same time as the ReplicatedMovement property.

As he points out, one option might be to create a new USTRUCT that holds an FRepMovement and the server timestamp, and then replicate them both atomically using a custom NetSerialize function. Something like this:

USTRUCT()
struct FMyRepMovement
{
    GENERATED_BODY()

    UPROPERTY()
    FRepMovement RepMovement;

    UPROPERTY()
    float ServerTime;

    bool NetSerialize(FArchive& Ar, class UPackageMap* Map, bool& bOutSuccess);
};

template<>
struct TStructOpsTypeTraits<FMyRepMovement> : public TStructOpsTypeTraitsBase2<FMyRepMovement>
{
    enum
    {
        WithNetSerializer = true
    };
};

That would work, but it would also use an extra 4 bytes per replicated player. In a game with 64 players, that’s 256 bytes, or roughly 25% of a typical packet, just to synchronize those timestamps alone! Consider that the struct doesn’t yet include any data needed to reconstruct an actor’s pose. That might start to look something like this:

USTRUCT()
struct FMyRepMovement
{
    GENERATED_BODY()

    UPROPERTY()
    bool bAimingDownSight;

    UPROPERTY()
    float AimPitch;

    UPROPERTY()
    float AimYaw;

    UPROPERTY()
    bool bCrouched;

    UPROPERTY()
    float LocomotionTime;

    UPROPERTY()
    TObjectPtr<UAnimMontage> Montage;

    UPROPERTY()
    float MontagePlaybackPosition;

    UPROPERTY()
    FRepMovement RepMovement;

    UPROPERTY()
    float ServerTime;

    UPROPERTY()
    bool bSprinting;

    UPROPERTY()
    TObjectPtr<AWeapon> Weapon;

    bool NetSerialize(FArchive& Ar, class UPackageMap* Map, bool& bOutSuccess);
};

The bandwidth considerations for an approach like this are probably becoming clear now. Using Unreal’s current replication system, this will quickly consume all available bandwidth and significantly limit how many actors can be sent each frame—reducing the quality and responsiveness of the game online. Other netcode architectures, like most of the ones mentioned at the top of this article, typically employ delta-encoding to only transmit what has changed. Epic’s experimental Iris Replication offers some hope in this regard but that’s beyond the scope of this article.
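To illustrate what delta-encoding buys, here is a hypothetical sketch (not Unreal API, and the field set is invented): each state is encoded as a change mask plus only the fields that differ from a baseline the client has already acknowledged:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative delta-encoding sketch: write a change mask, then only the
// fields that differ from an acknowledged baseline state.
struct PlayerState {
    float posX = 0, posY = 0, aimPitch = 0;
};

// Append 'current' to 'out' as a 3-bit change mask plus changed fields only.
void DeltaEncode(const PlayerState& baseline, const PlayerState& current,
                 std::vector<uint8_t>& out) {
    uint8_t mask = 0;
    if (current.posX != baseline.posX) mask |= 1;
    if (current.posY != baseline.posY) mask |= 2;
    if (current.aimPitch != baseline.aimPitch) mask |= 4;
    out.push_back(mask);
    auto writeFloat = [&out](float v) {
        const uint8_t* p = reinterpret_cast<const uint8_t*>(&v);
        out.insert(out.end(), p, p + sizeof(float));
    };
    if (mask & 1) writeFloat(current.posX);
    if (mask & 2) writeFloat(current.posY);
    if (mask & 4) writeFloat(current.aimPitch);
}
```

If only posX changed, this emits 5 bytes instead of the full 13, and an unchanged state costs a single byte. Production implementations go further with bit-packing and quantization, but the principle is the same: bandwidth scales with how much actually changed, not with how large the state is.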

Rendering players so they can be reconstructed on the server

Let’s assume for the sake of argument we’ve been able to efficiently transmit the data needed to clients. We still need to make sure that the client only renders the players in a way that can later be reconstructed on the server when performing hit detection.

By default, characters in Unreal update their transforms when new updates come in from the server and continue to tick and run through the character movement code in between updates. Since this results in the characters teleporting when network data is received, their transforms are then smoothed over time to avoid any visual discontinuities. Because of this process of extrapolation and smoothing, it is impossible for the server to accurately recreate the pose of the character as seen by a client. You can see this behavior in action below:

Simulated proxy behavior at one network update per second. The blue box represents current server position, the red box indicates last received update, and the red arrow points along the player's velocity.

Note how in the video above the character is ahead of the latest server update received (red box) due to its predictive simulation. Then, when the character on the server (blue box) changes direction, the character on the client warps to correct its mistake when the next update is received: commonly called rubber-banding.

To achieve accurate hit detection, we’ll need to take a different approach. Instead, we can wait until we’ve received two sequential network updates, set the character state to match the pose from the first update, and then interpolate over time to the pose from the second update. The goal is to pace this so that the third network update arrives just as the character finishes interpolating to the second, and so on. Here’s what that would look like in comparison:

Interpolating proxy at one network update per second. As above, the blue box represents current server position, the red box indicates last received update, and the red arrow points along the player's velocity.

In contrast to the first video, you can see that the character is now behind the latest server update received (red box) and is always moving toward it, interpolating only through known good states.

It’s worth noting that one network update per second is not sufficient for either approach, so neither looks good, but the low rate is useful for visualizing the behavior you get with each. In this case, interpolation achieves a few important things:

  1. Character motion is smooth. No more rubber-banding corrections.

  2. No more guessing. The character is always between two states that actually occurred on the server.

  3. We can communicate a single time to the server from which it can recreate exactly what the client was rendering.

  4. Remote characters no longer need to run the character movement code so CPU usage is greatly reduced.
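The interpolating proxy described above can be sketched in engine-agnostic C++. The client buffers timestamped snapshots and samples a fixed delay behind the newest one, so it only ever blends between two states that actually occurred on the server, and the sampled time is exactly the single time it would report with a fired shot. InterpolationBuffer and its members are illustrative names, not Unreal API:

```cpp
#include <cassert>
#include <cmath>
#include <deque>

// Engine-agnostic sketch of an interpolating proxy: buffer timestamped
// snapshots and render a fixed delay behind the newest one.
struct Snapshot {
    double serverTime;
    double posX; // a real snapshot carries the full transform + pose data
};

class InterpolationBuffer {
public:
    explicit InterpolationBuffer(double delay) : renderDelay(delay) {}

    void Receive(const Snapshot& snap) { snapshots.push_back(snap); }

    // Sample the buffer at (newest server time - renderDelay). Returns false
    // until two bracketing snapshots are available.
    bool Sample(double& outX, double& outRenderTime) const {
        if (snapshots.size() < 2) return false;
        double renderTime = snapshots.back().serverTime - renderDelay;
        for (size_t i = 1; i < snapshots.size(); ++i) {
            const Snapshot& a = snapshots[i - 1];
            const Snapshot& b = snapshots[i];
            if (a.serverTime <= renderTime && renderTime <= b.serverTime) {
                double t = (renderTime - a.serverTime) /
                           (b.serverTime - a.serverTime);
                outX = a.posX + t * (b.posX - a.posX);
                // This is the time the client reports with a fired shot, and
                // the time the server seeks to in its hitbox history.
                outRenderTime = renderTime;
                return true;
            }
        }
        return false;
    }

private:
    std::deque<Snapshot> snapshots;
    double renderDelay; // e.g. roughly 2x the server's send interval
};
```

Note how the render delay directly trades visual latency for robustness: a larger delay tolerates more jitter and loss before the buffer runs dry, while a smaller one keeps remote players closer to their true server positions.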

Putting it all together

Taken together, synchronizing timestamped slices of data and interpolating between them over time form a common technique called snapshot interpolation, and this is exactly what those big-budget games I mentioned earlier are doing. You’ve seen some of the benefits first-hand in this article, but there are more! To name a few:

  • Backwards reconciliation can be systemized so that it automatically works for all actors—not just characters.

  • You can send updates about many more actors in a single packet than with Unreal replication. That means smoother and more responsive gameplay.

  • Match replays, and even instant replays (i.e., killcams), become straightforward to implement.

Unfortunately, as we’ve seen in this article, Unreal’s replication system is not designed to utilize this technique and cannot easily be leveraged to do so. It’s possible that Epic’s Iris Replication and Network Prediction Plugin will improve this situation in the future but the former is still experimental and the latter isn’t officially supported or acknowledged.

Conclusion

Lag compensation is tricky and—like many netcode techniques—it can be difficult to find good information on what’s needed for a robust production-quality implementation. Hopefully, this article has been helpful in communicating the gap between what’s reasonably achievable using Unreal’s replication system out of the box and what players have come to expect from the latest AAA titles.

Ultimately, the characteristics of network connections vary widely, so it’s important to identify the worst cases in your implementation. They will undoubtedly occur more often than you think. The more guarantees your netcode architecture can provide, the less you’ll need to consider things like latency and exploits while implementing gameplay. That directly translates to faster iteration times, fewer bugs, higher security, and better games overall.

As you’ve seen in this article, while Unreal Engine 5’s replication is a reasonable general-purpose framework, it doesn’t provide many guarantees to gameplay code and it can be extremely challenging to align things accurately in time. For fast-paced games where fidelity and competitive integrity are critical, additional techniques and approaches should be considered. We’ve developed SnapNet to specifically address many of these issues so please let us know if you’d like to discuss how your project might benefit.

Finally, if you have any questions, if I got anything wrong, or even if you just want to chat about netcode in general, don’t hesitate to reach out!


Jay Mattis is a founder of High Horse Entertainment and author of SnapNet.
SnapNet delivers AAA netcode for real-time multiplayer games.