Epic Games has released new details about what went wrong during the outage of “Fortnite’s” Playground mode.
The limited-time mode originally launched on June 27 and gives players one hour to build, explore, and practice in the world of “Fortnite” without the pressure of an ongoing match. However, Playground was shut down shortly after launch, drawing complaints from players and putting pressure on Epic Games to fix the issue.
At the time, we knew that the problem lay in the game's matchmaking system. Now we know, more specifically, that the issue stemmed from how many matches Playground created compared with Battle Royale.
In short, each node in the matchmaking service keeps a list of available dedicated servers. When players connect to the matchmaking service, it routes them to the appropriate node for their region, and that node then automatically assigns them one of its free servers.
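To make that flow concrete, here is a minimal sketch of the two-step assignment, assuming a simple in-memory model. The class names, region labels, and server IDs (`MatchmakingNode`, `MatchmakingService`, `na-east`, `srv-1`) are hypothetical illustrations, not Epic's actual code or API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MatchmakingNode:
    """Hypothetical regional node that owns a list of free dedicated servers."""
    region: str
    free_servers: list[str] = field(default_factory=list)
    assignments: dict[str, str] = field(default_factory=dict)

    def assign_server(self, player_id: str) -> Optional[str]:
        # Take the next free server from this node's own list, if any remain.
        if not self.free_servers:
            return None
        server = self.free_servers.pop(0)
        self.assignments[player_id] = server
        return server


class MatchmakingService:
    """Hypothetical entry point: routes each player to the node for their region."""

    def __init__(self, nodes: list[MatchmakingNode]) -> None:
        self.nodes_by_region = {node.region: node for node in nodes}

    def connect(self, player_id: str, region: str) -> Optional[str]:
        # The service only picks the node; the node picks the server.
        return self.nodes_by_region[region].assign_server(player_id)


service = MatchmakingService([
    MatchmakingNode("na-east", ["srv-1", "srv-2"]),
    MatchmakingNode("eu", ["srv-9"]),
])
print(service.connect("player-a", "na-east"))  # srv-1
print(service.connect("player-b", "eu"))       # srv-9
```

Note that in this model a node can only hand out servers from its own list; once that list is empty it returns `None`, which is the situation that drives a node to go looking for capacity elsewhere.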
Battle Royale mode has to match up 100 players at a time to duke it out, whereas Playground makes matches for groups of one to four players. As a result, far more matches were being created overall, and Playground required 15 times as many servers as other “Fortnite” modes.
This meant the nodes were constantly requesting servers to join and asking for extra servers, and overall took 15 times longer to check for available servers. As the news release puts it, this ultimately left the CPU holding “a backlog of pending requests resulting in a feedback loop,” which meant Epic Games had no choice but to take Playground offline for repairs.
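The “feedback loop” can be illustrated with a toy simulation. This is a hypothetical model, not Epic's: each tick some requests arrive, a node can serve only a fixed number, and whatever is left unserved comes back amplified by retries. The function name and the `retry_factor` parameter are assumptions made for illustration; dividing capacity by 15 loosely mirrors the release's “15 times longer” figure.

```python
def simulate_backlog(arrivals_per_tick: int, capacity_per_tick: int,
                     retry_factor: float, ticks: int) -> list[int]:
    """Return the pending-request backlog after each tick.

    Hypothetical model: requests left unserved at the end of a tick are
    retried, and each one adds `retry_factor` extra units of load.
    """
    backlog = 0
    history = []
    for _ in range(ticks):
        pending = backlog + arrivals_per_tick
        served = min(pending, capacity_per_tick)
        unserved = pending - served
        # Unserved requests come back amplified by retries: the feedback loop.
        backlog = unserved + int(unserved * retry_factor)
        history.append(backlog)
    return history


healthy = simulate_backlog(100, 150, 0.5, 3)
overloaded = simulate_backlog(100, 150 // 15, 0.5, 3)  # checks 15x slower
print(healthy)     # [0, 0, 0]
print(overloaded)  # [135, 337, 640]
```

With enough headroom the backlog stays at zero, but once per-request work grows so that capacity falls below the arrival rate, retries feed the backlog faster than the node can drain it, and the queue grows without bound.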
To repair the limited-time mode, Epic split Playground’s matchmaking service into its own service cluster. This was partly to keep the server jam from affecting other modes, and it also let the developers work on the service until they could bring it back online.
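The isolation idea can be sketched as routing by game mode: modes that are split out get their own queue (standing in for a dedicated cluster), and everything else shares one. `ClusterRouter`, the `"shared"` cluster name, and the mode strings are hypothetical names chosen for this illustration.

```python
from __future__ import annotations

from collections import deque


class ClusterRouter:
    """Hypothetical router that gives selected game modes their own cluster."""

    def __init__(self, isolated_modes: set[str]) -> None:
        self.queues: dict[str, deque[str]] = {mode: deque() for mode in isolated_modes}
        self.queues["shared"] = deque()

    def submit(self, mode: str, request: str) -> None:
        # Isolated modes go to their own cluster; all other modes share one.
        cluster = mode if mode in self.queues else "shared"
        self.queues[cluster].append(request)

    def depth(self, cluster: str) -> int:
        return len(self.queues[cluster])


router = ClusterRouter({"playground"})
for i in range(10_000):
    router.submit("playground", f"pg-{i}")  # flood the isolated mode
router.submit("battle_royale", "br-1")
print(router.depth("playground"), router.depth("shared"))  # 10000 1
```

However large the Playground queue grows, Battle Royale requests land in a separate queue, so a jam in one cluster does not delay the other.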
The fix was to make repeated server searches by the nodes unnecessary, which meant the team had to strengthen the system's ability to “rebalance sessions from other nodes.” The matchmaking system now shifts regional capacity from nodes with spare servers to those that are running low, so nodes rarely need to search beyond their local regional server lists the way they used to.
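A minimal sketch of that kind of rebalancing follows, assuming each node in one region tracks only a count of free servers and that a node is “running low” when it falls below a threshold. The function name, the `low_water` threshold, and the node names are hypothetical; the release does not describe Epic's actual algorithm.

```python
def rebalance(free_by_node: dict[str, int], low_water: int) -> dict[str, int]:
    """Move spare free-server capacity to nodes below `low_water`.

    Hypothetical sketch for a single region: donors give away only what
    they hold above `low_water`, largest donors first. A needy node may
    stay below the threshold if the region has no spare capacity left.
    """
    balanced = dict(free_by_node)
    donors = sorted(balanced, key=balanced.get, reverse=True)
    needy_nodes = [n for n in balanced if balanced[n] < low_water]
    for needy in needy_nodes:
        for donor in donors:
            if balanced[needy] >= low_water:
                break
            spare = balanced[donor] - low_water
            if donor == needy or spare <= 0:
                continue
            moved = min(spare, low_water - balanced[needy])
            balanced[donor] -= moved
            balanced[needy] += moved
    return balanced


print(rebalance({"node-a": 20, "node-b": 2, "node-c": 8}, low_water=5))
# {'node-a': 17, 'node-b': 5, 'node-c': 8}
```

Because capacity is topped up ahead of demand, a node that receives a player usually already holds a free server locally instead of repeatedly scanning other lists.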
Epic then tested the changes by throwing millions of simulated players at Playground, iterating through what it calls the “tweak-test-evaluate cycle” to make sure the mode could withstand being relaunched.
Although taking Playground down was stressful for the developer, the news release notes that the team learned a lot about its own matchmaking service and its “failure points.”
“The process of getting Playground stable and in the hands of our players was tougher than we would have liked, but was a solid reminder that complex distributed systems fail in unpredictable ways,” the news release states. “We were forced to make significant emergency upgrades to our Matchmaking Service, but these changes will serve the game well as we continue to grow and expand our player base into the future.”