May 10, 2026
Most enterprise organizations say they are ready to recover from disruptions involving agentic AI, but a new survey of more than 300 IT decision-makers from Australia, New Zealand, Europe, the United Kingdom, and the United States suggests relatively few test those plans often enough to prove it. This gap between perceived readiness and practical validation highlights a critical vulnerability at a time when autonomous AI systems are increasingly central to business operations. The survey, conducted by Keepit, a vendor-independent cloud backup and recovery service based in Denmark, paints a concerning picture of organizational complacency and a lack of rigorous, systematic testing protocols for emerging AI-driven risks.

The findings from Keepit’s comprehensive "Data Report 2026" reveal that a staggering 94% of respondents expressed confidence that their existing disaster recovery (DR) plans adequately cover agentic AI systems. However, this high level of assurance is sharply contrasted by the fact that only 32% of these organizations test those crucial plans on a monthly basis. This significant gap between confidence and consistent validation raises serious questions about the true resilience of enterprises against the unique challenges posed by agentic AI failures. Such infrequent testing, especially in a dynamic technological environment, means that many plans may be outdated, incomplete, or simply unworkable when a real crisis strikes. The report underscores that while the intent to protect data and systems is present, the practical execution of robust recovery strategies often falls short.

The Rise of Agentic AI and Its Unique Risks

Agentic AI systems, characterized by their autonomy, goal-driven behavior, and ability to make decisions and take actions without continuous human oversight, are rapidly being integrated into core enterprise functions. From automating customer service and supply chain logistics to optimizing financial trading and cybersecurity defenses, these systems promise unprecedented efficiencies and innovation. However, their very nature introduces novel and complex failure modes that differ significantly from traditional IT disruptions.

Unlike conventional software, which typically fails predictably based on coding errors or infrastructure issues, agentic AI can fail in more subtle, unpredictable, and cascading ways. A malfunctioning AI agent might execute erroneous transactions, disseminate incorrect information, compromise data integrity across interconnected systems, or even trigger security vulnerabilities by misinterpreting commands or environmental cues. The autonomous nature means that a small error can quickly propagate, leading to widespread operational paralysis, significant financial losses, reputational damage, and even regulatory penalties. Recovering from such failures requires not just restoring data but also understanding the state of the AI, its decision-making context, and the integrity of the data it processed or generated. This complexity demands a specialized approach to disaster recovery that goes beyond traditional backup and restore procedures, necessitating a deeper understanding of AI ethics, explainability, and control mechanisms.

Survey: Enterprises Say They Are Ready for Agentic AI Failures, but Few Test Recovery Often -- Campus Technology

The Testing Gap: A Closer Look at Enterprise Preparedness

The Keepit survey delved deeper into the nuances of enterprise preparedness, uncovering additional layers of vulnerability. Perhaps more worrying than the infrequent testing is that 33% of IT and security leaders responding to the survey admitted to having only partial control over the use of agentic AI within their organizations. This lack of comprehensive oversight suggests a shadow IT problem or simply an inability to keep pace with the rapid deployment of AI tools by various departments. Without full control, it becomes exceedingly difficult to integrate these systems into a centralized disaster recovery framework or even to identify all potential points of failure. Furthermore, 52% of respondents harbored doubts about whether their existing recovery plans truly cover all agentic AI scenarios. This internal skepticism, coming directly from decision-makers, stands in stark contrast to the overall 94% confidence figure, indicating a potential disconnect between high-level assurances and ground-level understanding of AI-specific risks.

Kim Larsen, Group Chief Information Security Officer at Keepit, emphasized the critical need for a more structured approach. "Organizations need to put more emphasis on creating long-term, structured, and tested disaster recovery plans," Larsen stated. "This also means putting a spotlight on data governance and accountability, which is the foundation for any resiliency plan." His comments highlight that robust recovery is not merely a technical exercise but a strategic imperative rooted in strong governance and clear lines of responsibility. Without these foundational elements, even the most advanced recovery technologies will struggle to deliver effective results.

The survey’s key findings also indicated that while most organizations have evaluated large-scale data recovery at least once (around 90%), this testing is often inconsistent and not systematically applied across all critical systems. This sporadic approach means that enterprises might be prepared for a generic data loss event but ill-equipped for specific, complex scenarios involving agentic AI, which can have unique dependencies and interconnections.

The Overlooked Pillars: Identity and Authentication Systems

A particularly alarming finding of the report pertains to the neglect of identity and authentication systems in recovery planning. Essential components of modern IT infrastructure, such as Microsoft’s Entra ID (formerly Azure Active Directory) and Okta, are tested far less often than other data systems. Identity-related systems are the gatekeepers of an organization’s digital assets, controlling who can access what. A failure in these systems can lead to widespread lockout, unauthorized access, or complete operational standstill, effectively paralyzing an enterprise regardless of whether its core applications and data are intact.

Keepit found that productivity applications such as Microsoft 365, Google Workspace, and Salesforce are restored, on average, four times as frequently as identity applications. The report starkly illustrates this disparity: "For every four companies who run a yearly test on their productivity workload, only one of them (25%) will have run a test on their identity applications." This oversight is a significant vulnerability. Imagine a scenario where all your enterprise data is backed up, but no one can log in to access it because the identity provider is down or compromised. The business effectively grinds to a halt. The potential for catastrophic disruption from an identity system failure is immense, making its neglect in recovery planning a critical blind spot that demands immediate attention.

The survey also observed that most restore activity involves single-file downloads, reflecting routine operational needs rather than large-scale recovery events. While granular recovery is vital for day-to-day operations, it does not prepare an organization for systemic failures. Many incidents are granular, making it faster and more practical to retrieve a specific file, but this focus can inadvertently foster a false sense of security regarding readiness for broader catastrophes. The report’s authors noted that backup creates value when organizations can recover confidently, correctly, and efficiently, whether the need is small and immediate or broad and time-critical. Restore activity was also found to be robust among larger organizations, suggesting that smaller entities might face even greater challenges in comprehensive recovery.

Real-World Stress Tests: Missed Opportunities for Learning

To gauge whether external, high-profile events influenced restoration behavior, Keepit investigated user activity following three significant incidents: the solar flares in May 2024, the CrowdStrike incident in July 2024, and the Microsoft outages in October 2025. These events represented diverse threats: natural phenomena, a faulty software update propagated through the supply chain, and major platform outages, all of which could have caused data loss or unavailability and should have prompted organizations to verify their backups and recovery processes.

The results were profoundly worrying: none of these events prompted any discernible change in user behavior. There was no sign of increased activity to confirm that backups were working in the days and weeks following these incidents. This lack of proactive validation after real-world stressors indicates a dangerous passivity within enterprises. Two theories were proposed in the report regarding this behavior. First, organizations might not have experienced widespread, immediate restoration needs as a direct result of these specific events, leading to a perception that no action was necessary. Second, and perhaps more critically, the results suggest that "awareness moments"—even those that highlight the fragility of digital infrastructure—do not automatically translate into changes in recovery routines. This indicates a systemic failure to leverage external crises as learning opportunities to strengthen internal resilience. It underscores a reactive rather than proactive mindset, where organizations wait for a direct impact before evaluating their preparedness, a strategy fraught with peril in an age of complex, interconnected threats.

The Call for Proactive Resilience: Keepit’s Recommendations

The solution, according to the report’s authors, lies in shifting from a reactive posture to a proactive and preventative one. They advocate for organizations to "use external events as structured triggers for guided recovery checks – short, repeatable validations that reinforce confidence without requiring large-scale, disruptive exercises." This approach transforms potential threats into opportunities for systematic, low-impact testing, allowing enterprises to continuously validate their recovery capabilities without the cost and disruption of full-scale disaster simulations.
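In practice, a guided recovery check of the kind the report describes can be small enough to run routinely after any external incident. The Python sketch below is an illustration only, with entirely hypothetical field names (the report does not prescribe an implementation): it flags any protected workload whose most recent backup snapshot is stale or whose sample restore failed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class BackupStatus:
    workload: str                # e.g. "Microsoft 365", "Entra ID"
    last_snapshot: datetime      # time of the most recent completed backup
    sample_restore_ok: bool      # did a small test restore succeed?

def guided_recovery_check(statuses, max_snapshot_age=timedelta(hours=24)):
    """Short, repeatable validation run after an external incident:
    flag any workload with a stale snapshot or a failed sample restore."""
    now = datetime.now(timezone.utc)
    findings = []
    for s in statuses:
        if now - s.last_snapshot > max_snapshot_age:
            findings.append(f"{s.workload}: last snapshot older than {max_snapshot_age}")
        if not s.sample_restore_ok:
            findings.append(f"{s.workload}: sample restore failed")
    return findings
```

A check like this is deliberately low-impact: it validates recoverability without a full-scale disaster simulation, which is exactly the trade-off the report's authors recommend.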

Furthermore, the report suggested implementing "guided recovery" enabled by Model Context Protocol (MCP). MCP, an open protocol for connecting AI assistants to external tools and data sources, opens the door to "asking for help" in the moment that matters. An MCP-enabled assistant could help identify unhealthy tenants or suspicious patterns in protected data and guide administrators through the correct recovery steps, effectively turning recovery into a manageable, repeatable process. Such intelligent assistance could significantly reduce the cognitive load on IT teams during a crisis, minimize human error, and accelerate recovery times. It represents a paradigm shift from manual, document-driven recovery to an intelligent, automated, and context-aware process.
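To make the idea concrete, the sketch below shows the kind of tool such an assistant might call. The thresholds, field names, and recovery steps are hypothetical (this is not an actual Keepit or MCP API): one function flags tenants with anomalous deletion activity, and another returns an ordered list of guided recovery steps for an administrator to follow.

```python
def find_unhealthy_tenants(tenants):
    """Flag tenants whose deletions in the last 24 hours spike far above
    their baseline, the kind of anomaly an assistant-callable tool could
    surface before an administrator even asks."""
    unhealthy = []
    for t in tenants:
        if t["deletions_last_24h"] > 10 * max(t["baseline_daily_deletions"], 1):
            unhealthy.append(t["tenant_id"])
    return unhealthy

def recovery_steps(tenant_id):
    """Ordered, guided steps an assistant might walk an admin through."""
    return [
        f"Freeze retention policies for tenant {tenant_id}",
        f"Identify the last known-good snapshot for {tenant_id}",
        f"Restore identity objects for {tenant_id} first, then productivity data",
        "Verify sign-in and access before declaring recovery complete",
    ]
```

Note that the hypothetical step ordering restores identity before productivity data, reflecting the report's warning that identity systems are the gatekeepers to everything else.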

"It all boils down to knowing who is in charge of recovery and which systems are restored first when multiple systems are affected," Larsen reiterated, emphasizing the importance of clear leadership and prioritized recovery sequences. "When decisions are delayed, recovery takes longer than necessary." This highlights the human and organizational factors that are as crucial as the technological ones in ensuring effective disaster recovery.

Broader Implications and the Path Forward

The findings of the Keepit report carry significant implications for enterprise strategy, regulatory compliance, and overall business continuity. In an era where digital operations are synonymous with business operations, the inability to swiftly and effectively recover from disruptions, especially those involving sophisticated agentic AI, poses an existential threat. The financial repercussions of downtime, data loss, and reputational damage can be immense, often dwarfing the investment required for robust DR solutions. Beyond immediate costs, there are long-term impacts on customer trust, market position, and employee morale.

Regulatory bodies globally are also increasing their scrutiny of data resilience and cybersecurity practices. Frameworks like GDPR, CCPA, and emerging AI regulations often mandate robust data protection and recovery capabilities. Organizations that fail to demonstrate adequate preparedness risk not only operational failure but also significant fines and legal liabilities. The partial control over agentic AI usage reported by a third of organizations is particularly concerning in this context, as it suggests potential blind spots for compliance.

The path forward for enterprises must involve a multi-faceted approach. Firstly, a comprehensive inventory of all agentic AI systems and their dependencies is essential. This must be followed by a rigorous risk assessment tailored to AI-specific failure modes. Secondly, disaster recovery plans must be explicitly updated to address these unique AI risks, moving beyond generic data restoration to consider the integrity of AI models, training data, and decision logs. Thirdly, the frequency and scope of DR testing must increase dramatically, particularly for critical identity and authentication systems. These tests should simulate real-world scenarios, including complex AI failures, and involve all relevant stakeholders.
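The testing-cadence point can be made concrete with a simple inventory check. The sketch below assumes hypothetical system tiers and cadences (monthly for identity systems, reflecting the report's emphasis on Entra ID and Okta; the specific intervals are illustrative, not prescribed by the report) and lists systems whose last disaster recovery test is overdue.

```python
from datetime import date, timedelta

# Illustrative per-tier test cadences; the intervals are assumptions,
# chosen to reflect the report's emphasis on frequent identity testing.
CADENCE = {
    "identity": timedelta(days=30),       # e.g. Entra ID, Okta: monthly
    "productivity": timedelta(days=90),   # e.g. Microsoft 365: quarterly
    "agentic_ai": timedelta(days=30),     # AI agents and their data stores
}

def overdue_dr_tests(inventory, today):
    """Return systems whose last DR test exceeds the cadence for their tier.

    inventory: list of (name, tier, last_test_date) tuples."""
    overdue = []
    for name, tier, last_test in inventory:
        if today - last_test > CADENCE[tier]:
            overdue.append(name)
    return overdue
```

Even a minimal report like this turns the vague goal of "test more often" into a concrete, reviewable backlog, with identity systems surfacing first rather than last.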

Finally, organizations must foster a culture of resilience, where data governance and accountability are paramount. Leadership commitment is crucial to allocate necessary resources, establish clear ownership for recovery, and continuously refine strategies. Leveraging advanced tools like MCP-enabled assistants can augment human capabilities, but they are only effective when integrated into a well-defined and regularly practiced recovery framework. The insights from Keepit’s report serve as a stark warning and a timely call to action: true readiness for agentic AI failures demands not just confidence, but consistent, proven capability.

The full report is available on the Keepit site (registration required) and provides a detailed roadmap for enterprises seeking to fortify their defenses against the unpredictable landscape of agentic AI.
