Summary of the Final Report
On June 11, 2026, DENIC published its final assessment of the May 5, 2026 DNS outage. It explicitly builds on the first analysis from May 8, 2026 and confirms its statements in full — while adding the complete technical mechanism, the explicitly ruled-out causes, and a concrete set of remediation measures.
In short: a software bug in an in-house DENIC tool — the so-called rollover agent — caused a routine DNSSEC key rollover to generate three key pairs instead of one, all with identical metadata including the same key tag. Regular operation was fully restored overnight on May 6; DENIC puts the total duration of the disruption at roughly three hours.
Background: DENIC’s Signing System
The DNSSEC signing process for the .de zone combines standard software (Knot) with in-house tooling connected to Hardware Security Modules (HSMs). In April 2026 — just weeks before the incident — DENIC had brought online the third generation of this system since DNSSEC was introduced for .de in 2011. Per the final report, the new generation was tested in advance and externally audited.
The production signing system comprises multiple HSMs, split across two geographically and network-separated data centers — a standard redundancy measure. This multi-HSM architecture is the key to understanding the bug described in the next section.
Root Cause: One Key Pair per HSM Instead of One Shared Pair
The rollover agent is the in-house tool that generates new key material during a planned key rollover and pushes it to all connected HSMs. The intended behavior and the actual, faulty behavior differ as follows.
Intended behavior:
One key pair is generated.
The same key pair is pushed to all connected HSMs.
Result: every HSM can sign with the same key, so all signatures are interchangeably valid.
Actual behavior (bug):
A separate key pair is generated per connected HSM.
Each key pair is pushed only to the one HSM it was generated for.
Result: three distinct key pairs with identical metadata (key tag 33834), only one of which matches the published DNSKEY.
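The difference between the two behaviors can be sketched in a few lines of Python. This is a toy model under stated assumptions, not DENIC's rollover agent: the names `KeyPair`, `Hsm`, `rollover_intended`, `rollover_buggy`, and the label `zsk-example` are hypothetical, and random bytes stand in for real key material.

```python
import secrets
from dataclasses import dataclass, field


@dataclass(frozen=True)
class KeyPair:
    label: str       # metadata: identifier recorded for the key (hypothetical)
    key_tag: int     # metadata: key tag recorded for the key
    material: bytes  # stand-in for the real private/public key material


@dataclass
class Hsm:
    name: str
    keys: list = field(default_factory=list)


def new_key_pair(label: str, key_tag: int) -> KeyPair:
    # Every call yields fresh random material; the metadata comes from the caller.
    return KeyPair(label, key_tag, secrets.token_bytes(32))


def rollover_intended(hsms, label, key_tag):
    pair = new_key_pair(label, key_tag)  # generate exactly once
    for hsm in hsms:
        hsm.keys.append(pair)            # push the same pair to every HSM


def rollover_buggy(hsms, label, key_tag):
    for hsm in hsms:
        # Bug: a new pair per HSM, each carrying the same metadata.
        hsm.keys.append(new_key_pair(label, key_tag))


def distinct_materials(hsms) -> int:
    # Number of different key materials across the newest key on each HSM.
    return len({hsm.keys[-1].material for hsm in hsms})


good = [Hsm(f"hsm-{i}") for i in range(3)]
bad = [Hsm(f"hsm-{i}") for i in range(3)]
rollover_intended(good, "zsk-example", 33834)
rollover_buggy(bad, "zsk-example", 33834)
print(distinct_materials(good), distinct_materials(bad))
```

In this model the intended rollover leaves one key material across all HSMs, while the buggy one leaves three, even though every key reports the same label and key tag. Metadata alone cannot tell them apart.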
All three key pairs generated this way shared the same identifiers, including key tag 33834. DENIC explicitly clarifies: “This was not a classic key-tag collision, but three substantively different key pairs with identical metadata.” A key-tag collision would have been a random clash between two otherwise independent keys — this was a systematic generation bug.
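For context on what a "classic" key-tag collision is: under RFC 4034, Appendix B, the key tag is a 16-bit checksum over the DNSKEY RDATA, so two unrelated keys can end up with the same tag by chance. The function below follows that algorithm (for all algorithms except the obsolete algorithm 1). The sample RDATA bytes are made up for illustration and are not real .de keys.

```python
def key_tag(rdata: bytes) -> int:
    """Key tag per RFC 4034, Appendix B (not valid for algorithm 1)."""
    acc = 0
    for i, byte in enumerate(rdata):
        # Even offsets contribute the high byte, odd offsets the low byte.
        acc += byte if i & 1 else byte << 8
    acc += (acc >> 16) & 0xFFFF
    return acc & 0xFFFF


# Illustrative RDATA: flags=256 (ZSK), protocol=3, algorithm=8, then key bytes.
header = bytes([0x01, 0x00, 0x03, 0x08])
key_a = header + bytes([0x00, 0x01, 0x00, 0x02])
key_b = header + bytes([0x00, 0x02, 0x00, 0x01])  # same 16-bit words, reordered

print(key_a != key_b, key_tag(key_a) == key_tag(key_b))
```

Because the checksum only adds up 16-bit words, different key data can map to the same tag. That is the random clash DENIC distinguishes from this incident, where the report describes identical metadata produced by a systematic generation bug.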
Follow-On Bug: Why Only About a Third of Signatures Were Valid
The rollover agent’s faulty logic then wrote one of the three generated ZSKs (Zone Signing Keys) with key tag 33834 into the zone — i.e., into the published DNSKEY record. The problem: only one of the three HSMs held the private key matching that exact published DNSKEY.
In practice, signing requests that happened to be served by that one HSM produced valid RRSIGs. Signing requests served by either of the other two HSMs produced RRSIGs that were formally present but did not verify against the published key, so validating resolvers rejected them as invalid.
Since load was roughly evenly distributed across the three HSMs, DENIC states that only about a third of all RRSIGs in the zone were validatable. Additionally, the SOA record must be regenerated and re-signed on every zone change due to its serial number — it was therefore intermittently valid and invalid throughout the incident, depending on which HSM served each signing run.
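The "about a third" figure follows directly from the load distribution, as the small simulation below suggests. It is a sketch under simplifying assumptions: HMAC-SHA256 is only a stand-in for real asymmetric RRSIG signatures, the round-robin distribution is idealized, and all names and record strings are hypothetical.

```python
import hashlib
import hmac
from itertools import cycle


def sign(secret: bytes, rrset: bytes) -> bytes:
    # Stand-in for an RRSIG: real DNSSEC uses asymmetric signatures, not HMAC.
    return hmac.new(secret, rrset, hashlib.sha256).digest()


def valid_fraction(hsm_secrets, published_secret, rrsets) -> float:
    valid = 0
    # Idealized round-robin: each RRset is signed by the next HSM in turn.
    for rrset, secret in zip(rrsets, cycle(hsm_secrets)):
        signature = sign(secret, rrset)
        expected = sign(published_secret, rrset)
        valid += hmac.compare_digest(signature, expected)
    return valid / len(rrsets)


hsm_secrets = [b"key-on-hsm-a", b"key-on-hsm-b", b"key-on-hsm-c"]
rrsets = [f"example-{i}.de. NSEC3".encode() for i in range(3000)]

# Only the first HSM holds the key that matches the published DNSKEY.
print(valid_fraction(hsm_secrets, hsm_secrets[0], rrsets))
```

With three HSMs and one matching key, the model yields one valid signature in three. A record that is re-signed repeatedly, such as the SOA, flips between valid and invalid depending on which HSM handles each run.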
Why the Bug Wasn’t Caught Before Go-Live
The faulty code was introduced into the in-house tool as part of a set of improvements, without the existing test scenarios covering this failure case. The reason is structural, rooted in the test environment itself:
DENIC’s test environment, per the final report, consists of only a single HSM at a single location. The faulty rollover-agent code only manifests its incorrect behavior when multiple connected HSMs interact — with just one HSM, the difference between “one shared key pair for all HSMs” and “one key pair per HSM” is simply not observable, since there is only one HSM to receive it either way. The defect was therefore caught neither in test runs nor in cold standby operation before go-live.
This pattern is not unusual in software quality assurance: a test environment whose topology doesn’t match the production environment can conceal exactly the class of bugs that only emerge from the interaction of multiple redundant components.
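The blind spot of a single-HSM test environment can be shown with a minimal check. The sketch below is illustrative only: `shared_rollover` and `per_hsm_rollover` are hypothetical stand-ins for the correct and the faulty logic, and integers stand in for key material.

```python
import itertools

_key_ids = itertools.count()


def generate_key() -> int:
    # Each call returns a new, distinct key identifier.
    return next(_key_ids)


def shared_rollover(n_hsms: int) -> list:
    key = generate_key()
    return [key] * n_hsms  # the same key lands on every HSM


def per_hsm_rollover(n_hsms: int) -> list:
    return [generate_key() for _ in range(n_hsms)]  # faulty: one key per HSM


def all_hsms_share_one_key(rollover, n_hsms: int) -> bool:
    # The invariant a rollover test should assert.
    return len(set(rollover(n_hsms))) == 1


for n in (1, 3):
    print(n,
          all_hsms_share_one_key(shared_rollover, n),
          all_hsms_share_one_key(per_hsm_rollover, n))
# 1 True True
# 3 True False
```

With one HSM, both variants satisfy the invariant, so no test can distinguish them. Only with more than one HSM does the faulty variant fail.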
Why the Invalid Zone Was Published Despite Monitoring
The .de zone is continuously updated via the registration system. Due to its size, changes to resource record sets are applied incrementally — individual zone versions don’t exist as complete zone files that could simply be checked as a whole.
DENIC runs three separate, continuously running validation tools designed to detect anomalies such as missing or invalid signatures. Per the final report, these systems “detected the errors as designed” — but the resulting alerts were not processed correctly, so no timely intervention occurred.
Notably, this was not a detection problem but a process gap between alert and response — a distinction that matters for evaluating the announced remediation measures.
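One common way to close a gap between alert and response is time-based escalation of unacknowledged alerts. The following is a generic sketch of that idea, not a description of DENIC's tooling or of its announced measures; the `Alert` type, the escalation chain, and the five-minute step are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Alert:
    name: str
    raised_at: float                       # seconds on an arbitrary clock
    acknowledged_at: Optional[float] = None


def escalation_target(alert: Alert, now: float, chain: Sequence[str],
                      step_seconds: float = 300.0) -> Optional[str]:
    """Return who should be notified now, or None once the alert is acknowledged."""
    if alert.acknowledged_at is not None:
        return None
    elapsed = max(0.0, now - alert.raised_at)
    # Move one step up the chain per elapsed interval, capped at the last entry.
    level = min(int(elapsed // step_seconds), len(chain) - 1)
    return chain[level]


chain = ["on-call engineer", "team lead", "incident manager"]  # hypothetical
alert = Alert("invalid RRSIG ratio above threshold", raised_at=0.0)
for now in (0.0, 400.0, 900.0):
    print(now, escalation_target(alert, now, chain))
```

The design point is that an alert nobody acknowledges never goes quiet: it keeps moving to the next person until someone takes ownership.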
What DENIC Explicitly Ruled Out
The comprehensive assessment explicitly examined and ruled out the following plausible alternative causes:
Technical and Practical Impact
The .de zone overwhelmingly serves so-called referral responses (delegation information). Their validity also depends on signed NSEC3 records — particularly when proving the absence of a DS record for an unsigned child zone. These exact NSEC3 signatures were affected by the bug.
Invalid signatures over NSEC3 records caused validating resolvers to flag delegation information as suspicious (“bogus”). As a result, second-level domains that don’t use DNSSEC at all also became unresolvable — consistent with the blackfort-tec.de observation described in our technical deep-dive. Non-validating resolvers served the .de zone without issue throughout.
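Why unsigned domains were hit as well can be expressed as a small decision function. This is a deliberately simplified model of how a resolver treats a referral from the parent zone, not an implementation of the full validation algorithm; the `Outcome` names and the function signature are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    SECURE = "secure"        # signed delegation, chain of trust continues
    INSECURE = "insecure"    # proven unsigned delegation, resolved normally
    BOGUS = "bogus"          # validation failed, resolver answers SERVFAIL
    UNCHECKED = "unchecked"  # resolver does not validate at all


def classify_referral(child_has_ds: bool, proof_signature_valid: bool,
                      resolver_validates: bool) -> Outcome:
    """proof_signature_valid: the parent's RRSIG over the DS RRset (signed child)
    or over the NSEC3 records proving that no DS exists (unsigned child)."""
    if not resolver_validates:
        return Outcome.UNCHECKED
    if not proof_signature_valid:
        # Without a valid proof, the resolver cannot trust the delegation.
        return Outcome.BOGUS
    return Outcome.SECURE if child_has_ds else Outcome.INSECURE


# An unsigned second-level domain, queried through a validating resolver:
print(classify_referral(False, True, True))   # Outcome.INSECURE
print(classify_referral(False, False, True))  # Outcome.BOGUS
```

An unsigned child zone is only treated as legitimately insecure when the parent's proof of the missing DS record validates. With that proof broken, the delegation itself becomes bogus.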
DENIC puts the total duration of the disruption at roughly three hours. Some large resolver operators temporarily suspended DNSSEC validation for .de domains during the incident, mitigating the impact for their users — DENIC explicitly thanks them for this support in the report.
Remediation: What DENIC Is Changing
Some initial findings — such as improvements to the code review process — have already been implemented, per the report. The incident response process and communication during outages are also being reviewed and adjusted. Five concrete short-term measures are being added:
Extended Alerting
Additional alerts, building on improved visibility into possible errors in the continuously running validation tools, plus expanded relevant metrics.
Accelerated Failover Procedure
An accelerated procedure to provision a valid zone backup faster in an emergency.
Partial Validation Before Delivery
A partial validation step for the zone before it is delivered at all — intended to catch exactly this class of bug before publication.
Suspension of Further ZSK Rollovers
Further Zone Signing Key rollovers are suspended until software development security and test coverage are improved.
External Security and Process Review
An external, final security and process analysis will be conducted in addition to the internal assessment.
What Changes Compared to the Earlier Statement
DENIC’s first statement from May 8/10, 2026, quoted in both of our May articles, already mentioned “three key pairs instead of one” and “roughly a third of signatures validatable.” The final report confirms these figures in full and adds three points that were not previously public:
- The exact mechanism: one key pair per HSM instead of one shared pair, with identical metadata, and explicitly not a key-tag collision.
- Why testing failed: the single-HSM test environment was structurally unable to cover this failure class.
- Why monitoring didn’t help: detection worked, alert handling didn’t. The report also adds a concrete five-point remediation plan.
Lessons Learned for Organizations
The final report is instructive well beyond DENIC itself. Three takeaways apply directly to your own infrastructure and software development practice:
Test environments must mirror production topology
Redundant components (multiple HSMs, multiple nodes, multiple data centers) create failure classes that simply aren’t observable in a simplified single-instance test environment. If you run n instances of a critical security component in production, test with n instances too, or at least with more than one.
Detection without reliable alert handling is useless
DENIC’s monitoring detected the errors as designed. The incident still happened because the resulting alerts were not processed correctly and no timely intervention followed. That’s a process-audit question for every SIEM: does every critical alert actually reach someone with the authority to act, in time?
Partial validation before delivery as a general pattern
DENIC’s announced partial validation of the zone before delivery is a pattern that applies to any critical configuration change: an automated plausibility check before rollout, not only monitoring after the fact.
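A pre-delivery gate of this kind can be sketched generically. The example below is not DENIC's planned implementation: the function name, the sampling approach, and the `verify` callback are assumptions, and the toy `verify` used here merely compares a marker instead of checking a real signature.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Record = Tuple[bytes, bytes]  # (rrset, signature)


def passes_partial_validation(records: Sequence[Record],
                              verify: Callable[[bytes, bytes], bool],
                              sample_size: int,
                              max_failures: int = 0,
                              seed: Optional[int] = None) -> bool:
    """Check a random sample of records; block delivery on too many failures."""
    rng = random.Random(seed)
    sample = rng.sample(list(records), min(sample_size, len(records)))
    failures = sum(1 for rrset, sig in sample if not verify(rrset, sig))
    return failures <= max_failures


def toy_verify(rrset: bytes, sig: bytes) -> bool:
    # Toy check for illustration; a real gate would verify against the published DNSKEY.
    return sig == b"ok"


records = [(b"a.de.", b"ok"), (b"b.de.", b"bad"), (b"c.de.", b"bad")]
if not passes_partial_validation(records, toy_verify, sample_size=3):
    print("delivery blocked: sample contains invalid signatures")
```

Even a small sample goes a long way against a failure of this magnitude. If roughly two thirds of all signatures are invalid and samples are drawn independently, the chance that ten sampled records all happen to be valid is about (1/3)^10, or roughly 1 in 59,000.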
Conclusion: A Commendably Transparent and Instructive Final Report
With this final report, DENIC delivers an unusually detailed root-cause analysis — including the explicit naming of the software bug, the ruled-out alternative causes, and a concrete remediation plan. That builds confidence in how the incident was handled, but it doesn’t change the core conclusion from our May analysis: even a correctly configured domain can become selectively unreachable due to a bug in the upstream DNSSEC trust chain.
For organizations, the consequence remains the same as in May: DNS and DNSSEC monitoring belongs in every security monitoring program, regardless of how carefully the relevant registry operates. This incident also shows that even a registry with external audits and multiple layers of redundancy can fail at a point that no test scenario covered.
Frequently Asked Questions About the DENIC Final Report
What is the actual root cause of the May 5, 2026 DNS outage?
Was this a classic key-tag collision?
Why was only about a third of the signatures accepted as valid?
Why didn’t the test system catch this bug?
Did DENIC not have monitoring that could have prevented this?
Did DENIC rule out an attack or compromise as the cause?
