Date of incident: Tuesday, 5 May 2026
Time: 08:02 – 08:43 BST (41 minutes)
Status: Resolved
What happened?
On Tuesday 5 May 2026, patient-facing services were unavailable for 41 minutes. During this window, patients trying to submit Patient Triage requests, or to respond to messages, questionnaires and self-book links, saw timeouts and error pages.
What was impacted?
The outage affected our patient-facing services, including:
Patient Triage (online consultations)
Patient Portal
Patient Response (patients replying to messages via links)
Questionnaires
Self-book
What caused it?
The root cause of this incident was a gradual memory leak in the service that powers our patient-facing websites, combined with a memory limit that hadn't been visible to our modern monitoring tools. This section explains how these factors interacted to cause a patient-facing outage.
When patients visit our websites, our servers build each page on their behalf before sending it on. A library update we made in March introduced a subtle change in how those built pages are held in memory after being delivered. Instead of being cleared from memory straight away, each one remained in memory for 5 minutes longer than designed. This meant that our patient-facing services were using more memory than expected.
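For readers who'd like a more concrete picture, the sketch below shows the general shape of the problem. It is illustrative only: the names, structures and timings are simplified assumptions, not our actual code.

    // Illustrative sketch only: simplified names and timings, not our actual code.
    // Each rendered page is held in an in-memory store while it is delivered.

    type RenderedPage = { html: string; builtAt: number };

    const LINGER_MS = 5 * 60 * 1000; // pages now held ~5 minutes after delivery

    const deliveredPages = new Map<string, RenderedPage>();

    function deliverPage(requestId: string, html: string): string {
      deliveredPages.set(requestId, { html, builtAt: Date.now() });

      // Before the library update, the entry was released as soon as the response
      // was sent. Afterwards, cleanup only ran on a delay, so every page served
      // added roughly five minutes of extra memory residency. Under heavy traffic,
      // thousands of pages can be sitting in memory at once.
      setTimeout(() => deliveredPages.delete(requestId), LINGER_MS);

      return html;
    }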
Separately, our patient-facing services had a hard memory ceiling of 1 GB that had been set several years ago. This ceiling wasn't visible on the modern dashboards we use today to monitor and scale our services in times of high traffic, like after bank holidays.
On Tuesday morning, patient traffic pushed the service past its fixed memory limit, causing it to restart repeatedly. Our infrastructure then detected those restarts as failed health checks and stopped routing traffic to patient-facing services, which is why patients saw error pages and timeouts.
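The sketch below shows, in simplified form, how a gateway health check of this kind typically works. Again, this is illustrative only: the endpoint name and thresholds are assumptions, not our production configuration.

    // Illustrative only: endpoint name and thresholds are assumptions,
    // not our production configuration.

    const FAILURES_BEFORE_REMOVAL = 3;
    let consecutiveFailures = 0;

    // The gateway polls the service's health endpoint on a schedule.
    async function checkHealth(serviceUrl: string): Promise<void> {
      let healthy = false;
      try {
        const res = await fetch(`${serviceUrl}/health`, { signal: AbortSignal.timeout(2_000) });
        healthy = res.ok;
      } catch {
        // A service that is restarting (for example, after running out of
        // memory) doesn't answer in time, which counts as a failed check.
      }

      consecutiveFailures = healthy ? 0 : consecutiveFailures + 1;

      if (consecutiveFailures >= FAILURES_BEFORE_REMOVAL) {
        // After repeated failures the gateway stops routing traffic to the
        // service, and patients see error pages or timeouts instead.
        console.warn(`Removing ${serviceUrl} from rotation after ${consecutiveFailures} failed checks`);
      }
    }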
In summary, a slow memory leak had been building in the background for some time, and it eventually ran into a hidden limit. On a busy morning, many patient requests, each using more memory than anticipated, hit that limit quickly and the system stopped handling patient traffic.
What we investigated
Early symptoms of this incident pointed in several plausible directions. In the interests of transparency, the main avenues we explored and ruled out during the response were:
Unusual activity from a single patient session. During the incident response, we noticed a very large number of errors coming from one patient triage session, which initially looked like it could be triggering the issue. After investigation, we concluded this was unrelated to the underlying cause and was a symptom of the wider instability, not the source of it. No individual practice or patient was responsible for the outage.
A deliberate attack on the service. A concentrated burst of errors from one source can sometimes resemble malicious traffic. We assessed the pattern and confirmed there was no evidence of a deliberate attack.
Post–bank holiday traffic levels. Patient-facing traffic is typically heavier on the first working day after a bank holiday. We checked whether the traffic alone could have overwhelmed the service, and confirmed it could not. A healthy version of the service would have handled the morning's traffic without an issue, as it had over the winter and after the holiday period. This led us to investigate issues with memory usage in more depth.
A recent change. Rolling back the most recent change is often the fastest way to resolve an incident, and we carried this out early on. The rollback didn't stabilise patient-facing services, which told us the cause of the incident was likely to be an older underlying issue.
What we did to fix it
We took action in three stages:
Restore service (5 May, 08:43). Engineers manually adjusted the health checks on the patient-facing service so that traffic could begin flowing again. Error rates dropped immediately, and the service stabilised. Our status page was updated to operational once we had confirmed sustained recovery.
Stabilise capacity (5 May). We deployed a change that doubled the memory available to patient-facing services and raised their infrastructure memory limits. This removed the immediate risk of recurrence.
Address the root cause (13 May). Once we had identified the root cause of the incident, we released a permanent fix so that delivered pages are cleared from memory promptly rather than lingering. Since this change was implemented, our monitoring has shown baseline memory usage drop from around 260 MB to around 165 MB, confirming the fix.
In addition to this, we made sure to keep Accurx users informed and up to date. During this incident, we published live updates on our status page and shared the Patient Triage: What to do in an outage guidance.
Once the service was stable, we followed up with direct communications to all affected GP users, and provided copies to ICBs and NHS England.
What's next?
We've already begun a programme of work to make our patient-facing services more resilient and reduce the likelihood of issues like this happening again. This includes:
Assessing the system used to support our current patient-facing service architecture.
Improving monitoring and alerting so that pod restarts, memory pressure, and gateway health issues page our on-call engineers.
Adding rate limiting on patient-facing endpoints to protect the service from unusual traffic spikes (a simple illustration follows this list).
Standardising our error pages so that, if a future issue does occur, patients see a clear message rather than a raw gateway error.
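As an illustration of the rate limiting mentioned above, the sketch below shows a simple per-client request counter. The thresholds and names are assumptions for illustration, not the limits we will actually apply.

    // Illustrative only: thresholds and names are assumptions, not the
    // limits we will actually apply.

    const WINDOW_MS = 60_000;   // 1-minute window
    const MAX_REQUESTS = 100;   // per client, per window

    const counters = new Map<string, { windowStart: number; count: number }>();

    function allowRequest(clientId: string, now = Date.now()): boolean {
      const entry = counters.get(clientId);

      // Start a fresh window for a new client, or when the old window expires.
      if (!entry || now - entry.windowStart >= WINDOW_MS) {
        counters.set(clientId, { windowStart: now, count: 1 });
        return true;
      }

      // Requests beyond the cap are rejected early, so an unusual spike from
      // one source can't exhaust memory or CPU for everyone else.
      entry.count += 1;
      return entry.count <= MAX_REQUESTS;
    }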
If you have any remaining questions or concerns, please reach out to our support team. We sincerely apologise for the disruption and thank you for your patience as we work to make our systems even more resilient.
Thanks,
The Accurx team