The long-standing narrative of artificial intelligence (AI) is one defined by monumental victories over human champions in the realm of games. For decades, researchers have utilized the structured environments of games as the ultimate proving ground for computational logic and machine learning. From the structured grids of a chessboard to the sprawling, high-speed battlefields of modern esports, machines have consistently demonstrated an ability to process data at a scale and speed that dwarfs human capability. However, a new critical assessment from leading researchers suggests that while AI can master specific games through brute-force repetition, it remains fundamentally inferior to humans in the art of general gaming—the ability to encounter a completely new title and understand its mechanics, goals, and nuances within minutes.
In a recent research paper titled "What is General Video Game Playing and Why Does it Matter?", authored by Julian Togelius, a professor of computer science at New York University, and his colleagues, the authors argue that current AI lacks the "common sense" and lived experience that allow human players to thrive in diverse environments. A model can be trained to play a specific game with superhuman precision, yet that same model often becomes entirely helpless when faced with even minor variations in the game’s design. This discrepancy highlights a significant hurdle on the journey toward Artificial General Intelligence (AGI), suggesting that the true measure of a machine’s "human-level" intellect may not be its high score in a single game, but its adaptability across a hundred different ones.
A Chronology of Machine Mastery and the Limits of Specialization
The history of AI development is inextricably linked to the history of gaming. In 1997, IBM’s Deep Blue made headlines globally by defeating world chess champion Garry Kasparov. This was a watershed moment, yet Deep Blue was a "narrow" AI—it was a machine built for one purpose, utilizing search algorithms to evaluate millions of potential positions. It could not play checkers, nor could it describe what a chess piece looked like in the real world.
Nearly twenty years later, in 2016, Google DeepMind’s AlphaGo achieved what many experts believed was decades away: defeating Lee Sedol, one of the world’s greatest Go players. Go is significantly more complex than chess, with more possible board configurations than there are atoms in the observable universe. AlphaGo succeeded by using deep neural networks and reinforcement learning, a process where the AI plays against itself millions of times, refining its strategy through trial and error.
By 2019, the frontier shifted to real-time strategy and multiplayer online battle arena (MOBA) games. OpenAI Five defeated the world champions of Dota 2, and DeepMind’s AlphaStar reached the "Grandmaster" level in StarCraft II. These games forced the AI to handle "imperfect information"—the machine could not see the entire map at once—and to plan over the course of a 45-minute match. Despite these triumphs, the NYU researchers point out a glaring flaw: change the map layout or adjust the health points of a single character, and these "superhuman" models would often collapse, unable to generalize their knowledge to the new parameters.
The Data Gap: Reinforcement Learning vs. Human Intuition
The primary method for training game-playing AI today is reinforcement learning (RL). In this framework, an agent is placed in an environment and given a "reward" signal for positive actions, such as gaining points or winning a match. While effective, this method is incredibly data-hungry. Supporting data from recent studies indicates that a curiosity-based RL model may require upwards of four million keyboard interactions to complete a single game. In practical terms, this equates to roughly 37 hours of continuous, high-speed simulation play before the machine "understands" the basic loop of the game.
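The reward-driven loop described above can be made concrete with a toy example. The sketch below is not from the paper or the cited studies; it is a minimal tabular Q-learning agent (a classic RL algorithm, far simpler than the deep RL systems discussed here) learning a six-cell "corridor" game where the only reward sits at the far end. The environment and all names are illustrative, but the pattern is the same one that, at vastly greater scale, consumes those millions of interactions.

```python
import random

# Toy "corridor" game: six cells in a row, the agent starts at cell 0,
# and the only reward in the entire game sits at cell 5.
N_STATES, GOAL = 6, 5
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """Action 1 moves right, action 0 moves left; the goal pays 1, all else 0."""
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one Q-value per (cell, action)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < EPSILON or q[s][0] == q[s][1]:
                a = rng.randrange(2)               # explore (or break a tie)
            else:
                a = 0 if q[s][0] > q[s][1] else 1  # exploit the current table
            s2, r, done = step(s, a)
            # Core update: nudge Q(s, a) toward reward + discounted best next value.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train(episodes=200)

# Roll out the learned greedy policy: a solved corridor takes exactly 5 steps.
s, steps = 0, 0
while s != GOAL and steps < 50:
    s, _, _ = step(s, 0 if q[s][0] > q[s][1] else 1)
    steps += 1
print(steps)
```

Note what the agent never learns: it has no idea what a corridor *is*. It has only discovered, through blind trial and error across hundreds of episodes, which table entries lead to reward—which is why scaling this approach to a full game demands so much raw interaction data.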
In stark contrast, the average human gamer can pick up a controller for a game they have never seen and grasp the fundamental mechanics—movement, jumping, combat, and objectives—in less than ten minutes. Most players reach a functional level of proficiency in under ten hours. This efficiency is rooted in "transfer learning" and lived experience. A human knows that a "heart" icon usually represents health, that red barrels often explode, and that platforms are meant to be jumped on.
Human babies begin to identify individual objects and understand basic physics, such as gravity and object permanence, between 18 and 24 months of age. By the time a person sits down to play a video game, they bring two decades of "training data" from the physical world. AI models, unless specifically designed with "world models," lack this context. To a standard AI, a character jumping is merely a change in the Y-axis coordinates of a sprite; to a human, it is a physical action with an intuitive purpose.

The Challenge of Open-Ended Worlds
The NYU paper emphasizes that as games move away from rigid win/loss conditions toward open-ended "sandbox" experiences, the AI’s disadvantage grows. In a game like Red Dead Redemption 2, the "goal" is not merely to reach the end of the story, but to inhabit a character. Success in such a game is subjective and multifaceted.
A machine can be programmed to optimize for a score, but it struggles to optimize for "fun," "exploration," or "moral choice." In Minecraft, an AI might successfully learn to mine blocks through reinforcement learning, but it lacks the abstract thinking required to decide what to build or to understand the aesthetic value of a structure. The researchers argue that well-designed games are "expertly tailored to human capabilities, intuition, and common sense." Because games are made by humans for humans, they rely on a shared cultural and physical vocabulary that machines simply do not possess.
Official Responses and the Move Toward Generalist Agents
The AI industry is not blind to these limitations. Google DeepMind recently unveiled SIMA (Scalable Instructable Multiworld Agent), a project designed to address the very issues raised by Togelius and his team. Unlike AlphaStar, which was built for one game, SIMA is trained across a variety of 3D environments, including titles like No Man’s Sky and Goat Simulator 3.
DeepMind’s approach with SIMA involves integrating large language models (LLMs) like Gemini into the agent’s framework. By allowing the AI to "read" instructions and "reason" about its environment using language, researchers hope to bridge the gap between pixel-processing and conceptual understanding. Early results show that SIMA can perform tasks in games it has never seen before by following verbal commands, representing a significant step toward generalist AI.
However, Togelius and his colleagues remain skeptical that we are close to a "human-level" solution. They have proposed a new benchmark for the industry: an AI must be able to play and win any of the top 100 games on Steam or the iOS App Store without prior specific training, and it must do so within the same timeframe a human would require. "It is not at all clear that current methods and models are suited to this problem," the authors state, noting that general video game playing is a "very hard challenge that we are nowhere near solving."
Broader Implications for Artificial General Intelligence
The struggle to master general gaming is a microcosm of the broader struggle to develop AGI. If an AI cannot figure out how to navigate a virtual world designed for entertainment, it raises serious questions about its ability to navigate the unpredictable real world. The implications extend to robotics, autonomous vehicles, and automated assistants, all of which must be able to handle "out-of-distribution" scenarios—events they were not specifically trained for.
The ability to generalize is what allows a human to drive a car in a different country, use a new tool, or solve a problem through creative analogy. If AI remains stuck in the paradigm of "narrow" mastery, it will continue to be a tool rather than a peer. The NYU researchers suggest that the path forward may require a fundamental shift in how we build AI, moving away from simple reward-seeking behavior toward models that can plan, imagine, and understand the "why" behind their actions.
In conclusion, while machines may continue to set records in specific, high-stakes matches, the humble human gamer remains the gold standard for versatility. The true "singularity"—the point where AI surpasses human intelligence—will not be marked by a computer winning a game of chess, but by a computer that can sit down, pick up a controller, and say, "I think I see how this works," just as quickly as a teenager in a basement. For now, that level of cognitive flexibility remains a uniquely human achievement.