A groundbreaking study led by Professor Karim Jerbi of the Department of Psychology at the Université de Montréal, with contributions from renowned AI pioneer Yoshua Bengio, has unveiled a significant milestone in the evolution of artificial intelligence. For the first time, generative AI systems, including sophisticated large language models (LLMs) such as GPT-4, have demonstrated the capacity to surpass the average human on specific measures of creativity. The research, the largest direct comparison between human and machine creativity conducted to date, involved over 100,000 human participants, giving its conclusions a robust empirical foundation. Published in Scientific Reports (a Nature Portfolio journal), the findings mark a pivotal shift in the AI landscape, even as they firmly establish that the most creative individuals continue to hold a clear advantage over even the most advanced AI models.
The Evolving Landscape of AI Creativity: A Historical Perspective
The pursuit of artificial creativity has been a long-standing aspiration within computer science, dating back decades before the advent of modern neural networks. Early attempts in the 1960s and 70s, such as Joseph Weizenbaum’s ELIZA program, simulated conversation but lacked genuine understanding or generative capability. Later, systems like AARON, developed by Harold Cohen in the 1970s, were capable of generating original abstract drawings based on a set of rules, marking an early foray into computational art. These early rule-based systems, while innovative for their time, were constrained by their explicit programming, often resulting in outputs that, while novel, lacked the subtle nuance and unexpected leaps characteristic of human creativity.
The landscape began to dramatically transform with the rise of machine learning and, more recently, deep learning. The 2010s saw significant breakthroughs in neural networks, leading to the development of generative adversarial networks (GANs) and variational autoencoders (VAEs), which could produce strikingly realistic images, music, and text. These models, however, often operated within specific domains and sometimes struggled with coherence or originality over longer creative outputs.
The true inflection point arrived with the development of transformer architectures and the subsequent emergence of large language models (LLMs) in the early 2020s. Models like OpenAI’s GPT series, Google’s Gemini, and Anthropic’s Claude, trained on vast swathes of internet data, demonstrated an unprecedented ability to understand, generate, and manipulate human language with remarkable fluency and coherence. This leap in capability inevitably led to questions about their creative potential. Could these systems, by discerning intricate patterns and relationships within their training data, genuinely generate novel ideas, or were they merely sophisticated remixing machines? It is precisely this fundamental question that Professor Jerbi’s collaborative research set out to answer with unparalleled rigor.
Unprecedented Scale: A Rigorous Framework for Comparison
To objectively compare human and AI creativity, the research team, which included co-first authors postdoctoral researcher Antoine Bellemare-Pépin (Université de Montréal) and PhD candidate François Lespinasse (Université Concordia) alongside Jay Olson from the University of Toronto, devised a comprehensive, rigorous framework. This framework leveraged established psychological tests, adapted for both human and machine evaluation, to ensure a fair and direct comparison.
The primary instrument for this comparison was the Divergent Association Task (DAT). Created by study co-author Jay Olson, the DAT is a widely validated psychological test designed to measure divergent creativity – the ability to generate numerous, diverse, and original ideas from a single prompt. Unlike traditional intelligence tests that seek a single correct answer, divergent thinking tests assess the breadth and uniqueness of associative thought.
In the DAT, participants, whether human or AI, are instructed to list ten words that are as semantically unrelated as possible. The measure of creativity here isn’t just about listing obscure words, but about the semantic distance between them. For instance, a list like "table, chair, desk, lamp, book, pen, paper, wall, floor, ceiling" would score low on originality due to the strong semantic connections. In contrast, a highly creative response might feature words such as "galaxy, fork, freedom, algae, harmonica, quantum, nostalgia, velvet, hurricane, photosynthesis." The team employed sophisticated algorithms to calculate the semantic distance between the generated words, providing an objective score for divergent creativity. This method ensures that the evaluation moves beyond subjective human judgment, offering a quantifiable metric.
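The scoring idea described above, averaging the pairwise semantic distance between the submitted words, can be sketched in a few lines. The following is a minimal illustration using invented three-dimensional toy vectors; the actual DAT scoring relies on high-dimensional word embeddings rather than anything this small:

```python
import math
from itertools import combinations

# Toy 3-dimensional "embeddings", invented purely for illustration.
# Real DAT scoring uses high-dimensional pretrained word vectors.
toy_vectors = {
    "table":  [0.9, 0.1, 0.0],
    "chair":  [0.8, 0.2, 0.1],
    "galaxy": [0.1, 0.9, 0.3],
    "fork":   [0.7, 0.0, 0.6],
}

def cosine_distance(u, v):
    """1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def dat_score(words, vectors):
    """Mean pairwise cosine distance across the word list, scaled to 0-100."""
    pairs = list(combinations(words, 2))
    mean_dist = sum(
        cosine_distance(vectors[a], vectors[b]) for a, b in pairs
    ) / len(pairs)
    return mean_dist * 100

related = dat_score(["table", "chair"], toy_vectors)      # close in meaning
unrelated = dat_score(["galaxy", "fork"], toy_vectors)    # far apart
```

With these toy vectors, the semantically distant pair scores far higher than the related pair, mirroring how the test rewards breadth of association rather than obscurity of vocabulary.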
Performance on the DAT has been consistently correlated with success in other established creativity tests, including those used in creative writing, idea generation for problem-solving, and artistic endeavors. This robust correlation underscores the DAT’s effectiveness in tapping into broader cognitive processes fundamental to creative thinking across diverse domains. Furthermore, its practical advantages – requiring only two to four minutes to complete and being easily accessible online – facilitated the collection of data from an enormous human participant pool, crucial for the study’s "unprecedented scale."
AI’s Ascent: Outperforming the Average Human
The study’s findings represent a watershed moment. After evaluating several leading large language models, including iterations of ChatGPT, Claude, and Gemini, against the performance of over 100,000 human participants, a clear turning point emerged. Specific AI systems, notably GPT-4, demonstrated the ability to exceed average human scores on tasks designed to measure divergent linguistic creativity.
"Our study shows that some AI systems based on large language models can now outperform average human creativity on well-defined tasks," explained Professor Karim Jerbi. This observation may indeed be "surprising – even unsettling," as Jerbi noted, for those who believed creativity to be an exclusively human domain. For instance, while an average human participant might achieve a DAT score of, say, 75 on a scale of 100, top-performing AI models like GPT-4 consistently scored in the range of 80-85, demonstrating a quantifiable advantage in generating semantically diverse word lists. This suggests that LLMs have developed an uncanny ability to navigate and synthesize vast semantic spaces in ways that often surpass typical human associative thinking.
This benchmark achievement signifies that AI is no longer merely replicating or synthesizing existing data in predictable ways. Instead, it is capable of generating novel connections and associations that, in terms of sheer originality and diversity, outstrip the average person’s creative output in specific linguistic tasks. This ability has profound implications for fields ranging from content generation to ideation and even scientific discovery.
The Human Apex: A Persistent Advantage
Despite AI’s impressive strides, the research also delivered an equally crucial observation: the most creative humans still maintain a clear and consistent advantage over even the strongest AI models. When researchers parsed the data, they found a striking pattern: the top half of human participants consistently outscored every AI model tested. This gap widened significantly among the top 10 percent of the most creative individuals, who achieved DAT scores that were, on average, 15-20% higher than the best-performing AI.
This "peak creativity" remains firmly human. Professor Jerbi elaborated, "even the best AI systems still fall short of the levels reached by the most creative humans." This suggests that while AI can master the mechanics of divergent thinking and semantic association, it may still lack the deeper cognitive processes, emotional intelligence, lived experience, and perhaps even metacognitive awareness that underpin truly exceptional human creativity. The ability to connect disparate concepts through personal insight, to challenge conventional frameworks, or to imbue creations with profound meaning and emotional resonance appears to remain a uniquely human forte. The hypothetical difference between an AI-generated poem that is technically perfect but emotionally sterile, and a human-created one that resonates deeply with the human condition, illustrates this qualitative gap.
Beyond Word Lists: Creativity in Complex Tasks
The study further explored whether AI’s success on the simpler word association task could translate to more complex and realistic creative activities. The researchers extended their comparative analysis to creative writing challenges, including composing haiku (a concise three-line poetic form), crafting movie plot summaries, and developing short stories.
The results mirrored the pattern observed with the DAT. While AI systems demonstrated impressive capabilities, sometimes exceeding the performance of average humans in generating coherent and stylistically appropriate creative texts, the most skilled human creators consistently delivered work that was not only stronger in quality but also more original, nuanced, and evocative. For example, an AI might generate a perfectly structured haiku about nature, but a top human poet might infuse theirs with a unique metaphor or an unexpected emotional depth that transcends mere linguistic proficiency. Similarly, AI-generated movie plots might be logically sound, but exceptional human screenwriters often introduce twists, character arcs, and thematic depths that AI struggles to spontaneously invent. This reinforces the notion that while AI can mimic and even excel at specific creative functions, the holistic, intuitive, and deeply personal aspects of creative expression still reside with humans.
Shaping AI Creativity: The Role of Human Guidance
A critical insight from the study is that AI creativity is not a fixed attribute; rather, it is malleable and can be significantly influenced by human guidance. The research demonstrated that adjusting technical settings, particularly the "temperature" parameter of an LLM, directly impacts the adventurousness of its generated responses.
At lower temperature settings (e.g., 0.2), AI models tend to produce more predictable, conservative, and conventional outputs, adhering closely to the statistical likelihoods derived from their training data. This is useful for tasks requiring factual accuracy or adherence to strict stylistic guidelines. Conversely, at higher temperature settings (e.g., 0.8 or 1.0), responses become more varied, less predictable, and more exploratory, allowing the system to venture beyond familiar ideas and generate more divergent associations. For instance, a low-temperature prompt for a movie plot might yield a standard hero’s journey, while a high-temperature setting could generate a surrealist narrative with unexpected character motivations.
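The mechanism behind this behavior is straightforward: temperature rescales the model's logits before they are converted into next-token probabilities. This minimal sketch, using made-up logits rather than output from any real model, shows how a low temperature concentrates probability mass on the top token while a high temperature spreads it out:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for four candidate tokens, purely for illustration.
logits = [2.0, 1.0, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.2)   # conservative: near-greedy
high = softmax_with_temperature(logits, 1.0)  # exploratory: flatter spread
```

At temperature 0.2 the first token receives nearly all of the probability mass, while at 1.0 the lower-ranked tokens retain a real chance of being sampled, which is what makes the higher-temperature output more adventurous.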
Beyond technical parameters, the study highlighted the crucial role of "prompt engineering" – how instructions are formulated. Prompts that encourage models to think about word origins, etymology, or abstract concepts were found to lead to more unexpected associations and consequently higher creativity scores. This finding underscores that AI creativity is deeply interdependent with human interaction and guidance. The human "prompt engineer" effectively becomes a co-creator, guiding the AI’s exploration of its latent knowledge space. This dynamic interaction emphasizes that AI is, above all, a powerful tool whose creative potential is unlocked and directed by human ingenuity.
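As a concrete illustration of this kind of prompt engineering, the contrast between a baseline instruction and an etymology-oriented variant might look like the following. Both strings are hypothetical paraphrases written for this sketch, not the study's actual prompt wording:

```python
# Hypothetical DAT-style prompt variants; neither string is taken
# verbatim from the study's materials.
baseline_prompt = (
    "Write 10 words that are as different from each other as possible, "
    "in all meanings and uses of the words."
)

etymology_prompt = (
    baseline_prompt
    + " Before answering, consider each candidate word's origin and "
      "etymology, and deliberately draw from unrelated domains such as "
      "science, emotion, food, and abstract concepts."
)

prompts = {"baseline": baseline_prompt, "etymology": etymology_prompt}
```

The second variant nudges the model to search less-traveled regions of its semantic space, which is the kind of reformulation the study found could raise creativity scores.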
Beyond Competition: AI as a Creative Amplifier
The study offers a nuanced and optimistic perspective on the often-feared notion that artificial intelligence could replace human creative professionals. While acknowledging AI’s newfound ability to match or exceed average human creativity on certain well-defined tasks, Professor Jerbi urged a shift in perspective.
"Even though AI can now reach human-level creativity on certain tests, we need to move beyond this misleading sense of competition," he stated. Instead, Jerbi champions the view of generative AI as an "extremely powerful tool in the service of human creativity." He predicts that AI "will not replace creators, but profoundly transform how they imagine, explore, and create – for those who choose to use it."
This vision positions AI not as a competitor, but as a collaborative assistant. Imagine a writer struggling with writer’s block using an LLM to generate hundreds of diverse plot ideas, character names, or descriptive phrases in minutes. An artist could leverage AI to rapidly prototype different visual styles or generate backgrounds for their work. A musician could use AI to explore novel melodic variations or harmonic structures. By expanding the realm of possibilities and opening new avenues for exploration, AI has the potential to amplify human imagination, allowing creators to push the boundaries of their craft further and faster than ever before. This symbiotic relationship suggests a future where human ingenuity, augmented by AI, reaches unprecedented levels of innovation.
Redefining Creativity in the AI Era
"By directly confronting human and machine capabilities, studies like ours push us to rethink what we mean by creativity," concluded Professor Karim Jerbi. This research compels society to critically re-evaluate foundational definitions of originality, imagination, and artistic expression. As AI systems become more sophisticated, the lines between human and machine creativity may continue to blur, prompting deeper philosophical inquiries into consciousness, intent, and the very essence of creation.
The paper, "Divergent creativity in humans and large language models," officially published in Scientific Reports on January 21, 2026, is a testament to collaborative interdisciplinary research. It brought together leading scientists from the Université de Montréal, Université Concordia, University of Toronto Mississauga, Mila (Quebec AI Institute), and even Google DeepMind, highlighting the global effort to understand and harness the power of AI. Professor Karim Jerbi’s leadership, alongside the crucial contributions of co-first authors Antoine Bellemare-Pépin and François Lespinasse, and the foundational insights from AI pioneer Yoshua Bengio – founder of Mila and LoiZéiro, and a pivotal figure in the deep learning revolution – cement this study as a landmark achievement. Its findings provide a robust empirical foundation for navigating the complex and exciting future where human and artificial intelligence converge to redefine the creative landscape.




