Blackfort Technology
DNS Security · Incident Analysis · June 11, 2026 · Christian Gebhardt · ~9 min read

DENIC Final Report: The Rollover Agent Bug Behind the May 5, 2026 DNS Outage

More than a month after the incident, DENIC published its final root-cause analysis on June 11, 2026. It confirms the earlier statements and, for the first time, discloses the full technical mechanism: a software bug that generated a separate key pair per Hardware Security Module instead of one shared pair — and why neither testing nor monitoring caught it in time.

This article builds on our early-May 2026 coverage. For the original symptom-level findings (SERVFAIL, NSEC3, affected resolvers), see our technical deep-dive or the plain-English explainer. This article covers exclusively the now-published final report.

Summary of the Final Report

On June 11, 2026, DENIC published its final assessment of the May 5, 2026 DNS outage. It explicitly builds on the first analysis from May 8, 2026 and confirms its statements in full — while adding the complete technical mechanism, the explicitly ruled-out causes, and a concrete set of remediation measures.

In short: a software bug in an in-house DENIC tool — the so-called rollover agent — caused a routine DNSSEC key rollover to generate three key pairs instead of one, all with identical metadata including the same key tag. Regular operation was fully restored overnight on May 6; DENIC puts the total duration of the disruption at roughly three hours.

Background: DENIC’s Signing System

The DNSSEC signing process for the .de zone combines standard software (Knot) with in-house tooling connected to Hardware Security Modules (HSMs). In April 2026 — just weeks before the incident — DENIC had brought online the third generation of this system since DNSSEC was introduced for .de in 2011. Per the final report, the new generation was tested in advance and externally audited.

The production signing system comprises multiple HSMs, split across two geographically and network-separated data centers — a standard redundancy measure. This multi-HSM architecture is the key to understanding the bug described in the next section.

Root Cause: One Key Pair per HSM Instead of One Shared Pair

The rollover agent is the in-house tool that generates new key material during a planned key rollover and pushes it to all connected HSMs. Here is how it was supposed to work:

Intended Behavior

One key pair is generated.

The same key pair is pushed to all connected HSMs.

Result: every HSM can sign with the same key — all signatures are interchangeably valid.

Actual Behavior

A separate key pair is generated per connected HSM.

Each key pair is pushed only to the one HSM it was generated for.

Result: three distinct key pairs with identical metadata (key tag 33834), only one of which matches the published DNSKEY.

Root Cause per DENIC’s Final Report

All three key pairs generated this way shared the same identifiers, including key tag 33834. DENIC explicitly clarifies: “This was not a classic key-tag collision, but three substantively different key pairs with identical metadata.” A key-tag collision would have been a random clash between two otherwise independent keys — this was a systematic generation bug.
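The difference between the intended and the actual behavior can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DENIC’s code: the `Hsm` and `KeyPair` classes and both rollover functions are hypothetical, and random bytes stand in for real key material.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class KeyPair:
    """Toy key pair: random bytes stand in for real key material (assumption)."""
    key_tag: int      # metadata label stored alongside the key
    material: bytes   # stand-in for the private key

    def fingerprint(self) -> str:
        return hashlib.sha256(self.material).hexdigest()[:16]


@dataclass
class Hsm:
    """Hypothetical HSM that stores keys under their metadata label."""
    name: str
    keys: dict = field(default_factory=dict)

    def store(self, pair: KeyPair) -> None:
        self.keys[pair.key_tag] = pair


def generate_pair(key_tag: int) -> KeyPair:
    return KeyPair(key_tag=key_tag, material=secrets.token_bytes(32))


def rollover_intended(hsms, key_tag):
    pair = generate_pair(key_tag)          # generate once ...
    for hsm in hsms:
        hsm.store(pair)                    # ... and push the same pair everywhere


def rollover_faulty(hsms, key_tag):
    for hsm in hsms:
        hsm.store(generate_pair(key_tag))  # new pair per HSM, same label


def distinct_keys(hsms, key_tag):
    """Set of key fingerprints found under one label across all HSMs."""
    return {hsm.keys[key_tag].fingerprint() for hsm in hsms}


good = [Hsm("a"), Hsm("b"), Hsm("c")]
rollover_intended(good, 33834)
print(len(distinct_keys(good, 33834)))  # 1

bad = [Hsm("a"), Hsm("b"), Hsm("c")]
rollover_faulty(bad, 33834)
print(len(distinct_keys(bad, 33834)))   # 3
```

In both runs every HSM reports a key under label 33834, so a check that looks only at metadata cannot tell the two outcomes apart. Only a comparison of the key material itself reveals the difference.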

Follow-On Bug: Why Only About a Third of Signatures Were Valid

The rollover agent’s faulty logic then wrote one of the three generated ZSKs (Zone Signing Keys) with key tag 33834 into the zone — i.e., into the published DNSKEY record. The problem: only one of the three HSMs held the private key matching that exact published DNSKEY.

In practice, signing requests that happened to be served by that one HSM produced valid RRSIGs. Signing requests served by either of the other two HSMs produced RRSIGs that were well-formed but did not verify against the published key, so validating resolvers rejected them as invalid (“bogus”).

Concrete Impact

Since load was roughly evenly distributed across the three HSMs, DENIC states that only about a third of all RRSIGs in the zone were validatable. Additionally, the SOA record must be regenerated and re-signed on every zone change due to its serial number — it was therefore intermittently valid and invalid throughout the incident, depending on which HSM served each signing run.
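The one-third figure follows directly from the load distribution, as a toy model shows. The sketch below rests on several assumptions: the HSM names and keys are made up, load is spread in strict round-robin, and HMAC from the standard library is only a stand-in for a real public-key signature.

```python
import hashlib
import hmac

# Three different secrets stand in for the three private keys.
# Assumption: HMAC replaces a real public-key signature scheme here.
HSM_KEYS = {"hsm-1": b"key-one", "hsm-2": b"key-two", "hsm-3": b"key-three"}
PUBLISHED = HSM_KEYS["hsm-1"]  # only this key matches the published DNSKEY


def sign(key: bytes, rrset: bytes) -> bytes:
    return hmac.new(key, rrset, hashlib.sha256).digest()


def validates(rrset: bytes, signature: bytes) -> bool:
    """Check a signature against the single published key."""
    return hmac.compare_digest(sign(PUBLISHED, rrset), signature)


def valid_fraction(rrset_count: int) -> float:
    hsm_names = list(HSM_KEYS)
    valid = 0
    for i in range(rrset_count):
        rrset = f"example-{i}.de. NSEC3".encode()
        hsm = hsm_names[i % len(hsm_names)]  # even round-robin load
        if validates(rrset, sign(HSM_KEYS[hsm], rrset)):
            valid += 1
    return valid / rrset_count


print(valid_fraction(3000))  # 0.3333333333333333
```

The same model explains the flapping SOA record: each re-signing lands on whichever HSM serves that run, so the record is valid only when that HSM happens to hold the published key.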

Why the Bug Wasn’t Caught Before Go-Live

The faulty code was introduced into the in-house tool as part of a set of improvements, without the existing test scenarios covering this failure case. The reason is structural, rooted in the test environment itself:

The Decisive Gap

DENIC’s test environment, per the final report, consists of only a single HSM at a single location. The faulty rollover-agent code only manifests its incorrect behavior when multiple connected HSMs interact — with just one HSM, the difference between “one shared key pair for all HSMs” and “one key pair per HSM” is simply not observable, since there is only one HSM to receive it either way. The defect was therefore caught neither in test runs nor in cold standby operation before go-live.

This pattern is not unusual in software quality assurance: a test environment whose topology doesn’t match the production environment can conceal exactly the class of bugs that only emerge from the interaction of multiple redundant components.
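A small experiment shows why the single-HSM setup could not expose the defect. The sketch is hypothetical throughout: both rollover functions and the invariant check are illustrative names, and random bytes stand in for key material.

```python
import secrets


def correct_rollover(hsm_count: int) -> list[bytes]:
    """Returns the key material each HSM ends up with (one shared key)."""
    shared = secrets.token_bytes(32)
    return [shared] * hsm_count


def faulty_rollover(hsm_count: int) -> list[bytes]:
    """Hypothetical buggy agent: a fresh key per HSM."""
    return [secrets.token_bytes(32) for _ in range(hsm_count)]


def all_hsms_share_one_key(keys_per_hsm: list[bytes]) -> bool:
    """Post-rollover invariant: every HSM must hold the identical key."""
    return len(set(keys_per_hsm)) == 1


for hsm_count in (1, 3):
    for rollover in (correct_rollover, faulty_rollover):
        ok = all_hsms_share_one_key(rollover(hsm_count))
        print(f"{rollover.__name__} with {hsm_count} HSM(s): invariant holds = {ok}")
```

With one HSM, both variants satisfy the invariant. Only the run with three HSMs separates the correct agent from the faulty one, which is the topology the test environment did not have.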

Why the Invalid Zone Was Published Despite Monitoring

The .de zone is continuously updated via the registration system. Due to its size, changes to resource record sets are applied incrementally — individual zone versions don’t exist as complete zone files that could simply be checked as a whole.

Detected, but Not Acted On

DENIC runs three separate, continuously running validation tools designed to detect anomalies such as missing or invalid signatures. Per the final report, these systems “detected the errors as designed” — but the resulting alerts were not processed correctly, so no timely intervention occurred.

Notably, this was not a detection problem but a process gap between alert and response — a distinction that matters for evaluating the announced remediation measures.

What DENIC Explicitly Ruled Out

The comprehensive assessment explicitly examined and ruled out the following plausible alternative causes:

No indication of compromise or attacks on the signing system or other DENIC infrastructure
No misbehavior identified in the Knot nameserver in use
No misbehavior identified in the HSMs in use
No classic key-tag collision

Technical and Practical Impact

The .de zone overwhelmingly serves so-called referral responses (delegation information). Their validity also depends on signed NSEC3 records — particularly when proving the absence of a DS record for an unsigned child zone. These exact NSEC3 signatures were affected by the bug.

Domains Without DNSSEC Were Affected Too

Invalid signatures over NSEC3 records caused validating resolvers to flag delegation information as suspicious (“bogus”). As a result, second-level domains that don’t use DNSSEC at all also became unresolvable — consistent with the blackfort-tec.de observation described in our technical deep-dive. Non-validating resolvers served the .de zone without issue throughout.
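The resolver-side logic behind this effect can be reduced to a small decision function. This is a deliberately simplified model of a referral to a child zone without a DS record, not a description of any specific resolver implementation; the function and enum names are illustrative.

```python
from enum import Enum


class Outcome(Enum):
    RESOLVED = "resolved"
    SERVFAIL = "SERVFAIL"


def resolve_unsigned_child(validating: bool, nsec3_signature_valid: bool) -> Outcome:
    """Simplified outcome for a referral to a child zone that has no DS record."""
    if not validating:
        # Non-validating resolvers ignore RRSIGs and simply follow the referral.
        return Outcome.RESOLVED
    if nsec3_signature_valid:
        # Absence of DS is proven: the child is treated as insecure and resolved.
        return Outcome.RESOLVED
    # The proof of DS absence cannot be authenticated: the response is bogus.
    return Outcome.SERVFAIL


print(resolve_unsigned_child(validating=True, nsec3_signature_valid=True).value)    # resolved
print(resolve_unsigned_child(validating=True, nsec3_signature_valid=False).value)   # SERVFAIL
print(resolve_unsigned_child(validating=False, nsec3_signature_valid=False).value)  # resolved
```

The child zone’s own DNSSEC status never enters the decision. What matters is whether the parent zone can prove, with a valid signature, that no DS record exists.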

DENIC puts the total duration of the disruption at roughly three hours. Some large resolver operators temporarily suspended DNSSEC validation for .de domains during the incident, mitigating the impact for their users — DENIC explicitly thanks them for this support in the report.

Remediation: What DENIC Is Changing

Some initial findings — such as improvements to the code review process — have already been implemented, per the report. The incident response process and communication during outages are also being reviewed and adjusted. Five concrete short-term measures are being added:

1. Extended Alerting

Additional alerts, building on improved visibility into possible errors in the continuously running validation tools, plus expanded relevant metrics.

2. Accelerated Failover Procedure

An accelerated procedure to provision a valid zone backup faster in an emergency.

3. Partial Validation Before Delivery

A partial validation step for the zone before it is delivered at all, intended to catch exactly this class of bug before publication.

4. Suspension of Further ZSK Rollovers

Further Zone Signing Key rollovers are suspended until software development security and test coverage are improved.

5. External Security and Process Review

An external, final security and process analysis will be conducted in addition to the internal assessment.

What Changes Compared to the Earlier Statement

DENIC’s first statement from May 8/10, 2026, quoted in both of our May articles, already mentioned “three key pairs instead of one” and “roughly a third of signatures validatable.” The current final report confirms these figures in full and adds three points that were not previously public:

  • The exact mechanism: one key pair per HSM instead of one shared pair, with identical metadata — explicitly not a key-tag collision.
  • Why testing failed: the single-HSM test environment was structurally unable to cover this failure class.
  • Why monitoring didn’t help: detection worked, alert handling didn’t — plus a concrete five-point remediation plan.

Lessons Learned for Organizations

The final report is instructive well beyond DENIC itself. Three takeaways apply directly to your own infrastructure and software development practice:

Test environments must mirror production topology

Redundant components (multiple HSMs, multiple nodes, multiple data centers) create failure classes that simply aren’t observable in a simplified single-instance test environment. If you run n instances of a critical security component in production, test with n (or at least n>1) instances too.

Detection without reliable alert handling is useless

DENIC’s monitoring worked as designed. The faulty zone still went uncorrected because the resulting alerts were not processed correctly, so nobody intervened in time. That is a process-audit question for every SIEM: does every critical alert actually reach someone with the authority to act, in time?
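One way to make that audit question testable is an acknowledgment deadline with escalation. The sketch below is hypothetical and not taken from DENIC’s report: the escalation chain, the five-minute deadline, and all names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    name: str
    raised_at: float                        # seconds on any monotonic clock
    acknowledged_at: Optional[float] = None


def escalation_target(
    alert: Alert,
    now: float,
    chain: list[str],
    ack_deadline_s: float = 300.0,          # assumed deadline, not from the report
) -> Optional[str]:
    """Who should be paged now: one step up the chain per missed deadline."""
    if alert.acknowledged_at is not None:
        return None                          # someone has taken ownership
    missed = int((now - alert.raised_at) // ack_deadline_s)
    return chain[min(missed, len(chain) - 1)]


chain = ["on-call engineer", "team lead", "incident manager"]
alert = Alert(name="invalid RRSIGs in zone", raised_at=0.0)

print(escalation_target(alert, now=100.0, chain=chain))  # on-call engineer
print(escalation_target(alert, now=700.0, chain=chain))  # incident manager

alert.acknowledged_at = 750.0
print(escalation_target(alert, now=800.0, chain=chain))  # None
```

The point of the design is that an unacknowledged alert never goes quiet: it keeps moving up the chain until a person explicitly takes ownership.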

Partial validation before delivery as a general pattern

DENIC’s announced partial validation of the zone before delivery is a pattern that applies to any critical configuration change: an automated plausibility check before rollout, not only monitoring after the fact.
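As a rough illustration of the pattern, a pre-delivery gate can sample part of the data and block the rollout on any failure. This is a sketch under assumptions, not DENIC’s announced implementation: the gate function, the sample size, and the injected `signature_is_valid` callback are hypothetical, and a real check would verify each RRSIG against the published DNSKEY.

```python
import random
from typing import Callable, Sequence


def partial_validation_gate(
    rrsets: Sequence[str],
    signature_is_valid: Callable[[str], bool],
    sample_size: int = 200,
    seed: int = 0,
) -> bool:
    """Return True if publication may proceed, False if it must be blocked."""
    rng = random.Random(seed)
    sample = rng.sample(list(rrsets), min(sample_size, len(rrsets)))
    failures = [name for name in sample if not signature_is_valid(name)]
    return not failures


def index_of(name: str) -> int:
    """Extract the numeric part of a toy owner name such as 'name-42.de.'."""
    return int(name.split("-")[1].split(".")[0])


zone = [f"name-{i}.de." for i in range(10_000)]

# Healthy zone: every signature verifies.
print(partial_validation_gate(zone, lambda name: True))

# Faulty zone: only every third signature verifies, as in the incident.
print(partial_validation_gate(zone, lambda name: index_of(name) % 3 == 0))
```

A small sample is enough for a failure of this magnitude. If two thirds of all signatures are invalid, the chance that k independently sampled records are all valid is (1/3)^k, which is already below 0.002% for k = 10.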

Conclusion: A Commendably Transparent and Instructive Final Report

With this final report, DENIC delivers an unusually detailed root-cause analysis — including the explicit naming of the software bug, the ruled-out alternative causes, and a concrete remediation plan. That builds confidence in how the incident was handled, but it doesn’t change the core conclusion from our May analysis: even a correctly configured domain can become selectively unreachable due to a bug in the upstream DNSSEC trust chain.

For organizations, the consequence remains the same as in May: DNS and DNSSEC monitoring belongs in every security monitoring program, regardless of how carefully the relevant registry operates. This incident also shows that even a registry with external audits and multiple layers of redundancy can fail at a point that no test scenario covered.

Frequently Asked Questions About the DENIC Final Report

What is the actual root cause of the May 5, 2026 DNS outage?
According to DENIC’s final report, the cause was a software bug in an in-house tool, the so-called rollover agent. It was supposed to generate a single key pair and push it to all connected Hardware Security Modules (HSMs). Instead, the bug caused it to generate a separate key pair per connected HSM, with all three pairs sharing identical metadata, including the same key tag, 33834.
Was this a classic key-tag collision?
No. DENIC explicitly clarifies in the final report that no classic key-tag collision occurred. This was three substantively different key pairs with identical identifiers — not a random clash between two otherwise independent keys.
Why was only about a third of the signatures accepted as valid?
The faulty logic wrote one of the three generated ZSKs (key tag 33834) into the zone. But only one of the three HSMs held the private key matching that published DNSKEY record, so only RRSIGs generated by that one HSM were cryptographically valid — in practice, roughly a third of all signatures in the zone.
Why didn’t the test system catch this bug?
DENIC’s test environment, per the final report, consists of only a single HSM at a single location. The faulty rollover-agent code only manifests its incorrect behavior when multiple HSMs are connected. With just one HSM, there is no observable difference between “one shared key pair” and “one key pair per HSM.” The defect was therefore caught neither in test runs nor in cold standby operation before go-live.
Did DENIC not have monitoring that could have prevented this?
It did — DENIC runs three separate, continuously running validation tools that, per the final report, detected the anomaly as designed. The actual failure was that the resulting alerts were not processed correctly, so no timely intervention occurred. This was a process failure, not a detection failure.
Did DENIC rule out an attack or compromise as the cause?
Yes. The final report explicitly rules out: any indication of compromise or attacks on the signing system or other DENIC infrastructure, misbehavior of the Knot nameserver in use, misbehavior of the HSMs in use, and a classic key-tag collision. The cause was exclusively an internal software bug in the in-house tooling.

DNS Security · Consulting

From Root-Cause Analysis to Your Own Resilience

Even a registry with external audits and multiple layers of HSM redundancy couldn’t prevent this bug. Our DNS Resilience Assessment evaluates your infrastructure against real-world failure scenarios, with regulatory context for NIS2 and DORA.

Get in touch

DNSSEC Monitoring and Security Analysis

Blackfort helps organizations build resilient security monitoring and DNSSEC verification processes — from technical analysis to integration into SIEM, ISMS, and incident response.