
INC-2077: Accurx Inbox temporarily unavailable on 20 April 2026

Written by Cam
Updated today

Date of incident: Monday, 20 April 2026

Time: 11:49 – 12:45 BST (56 minutes)

Status: Resolved

1. What happened?

On Monday 20 April 2026, Accurx experienced a temporary service disruption that prevented clinicians from loading the Accurx Inbox. Users saw intermittent "502 Bad Gateway" errors: some refreshes of Accurx loaded normally, while others failed. The incident began at 11:49 BST and services returned to normal by 12:45 BST.

2. What was impacted?

The Accurx Inbox failed to load for clinicians during the impact window. This meant new patient messages, questionnaire responses, and triage requests could not be reliably viewed or actioned by clinical staff while the outage was ongoing.

What was NOT affected:

  • Outgoing clinician-to-patient messages continued to be sent successfully.

  • Patient-facing services (such as online consultations, response links, and questionnaires) remained available, and patient submissions were delivered to organisations as normal.

  • Batch and scheduled messages continued to run in the background.

  • Patient data: no data was lost, and no messages or submissions were dropped.

Once service was restored, all patient submissions made during the outage were available in the Inbox for review. Clinicians did not need to repeat any actions.

3. What caused it?

The disruption was caused by a routine configuration change to our monitoring infrastructure. The change was intended to be a small, low-risk adjustment to how we collect system metrics, but it triggered an unexpected reload of a key component. That component was running with less spare memory than expected, and the reload caused it to restart ungracefully, dropping all active connections to Accurx at once.

When many users' sessions reconnected simultaneously, the resulting surge added further pressure to our databases, which slowed our system recovery.

4. What we did to fix it

Immediate fix:

  • Responders identified the infrastructure change as the likely trigger within minutes and began rolling it back.

  • Server-side rate limiting paced reconnections so the database could recover without being overwhelmed.

  • Services returned to normal by 12:45 BST.

  • Later the same day, engineers permanently removed the memory constraint that caused the original failure and safely re-applied the intended monitoring change.
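
For readers who want a picture of how rate limiting can pace a reconnection surge, here is a minimal token-bucket sketch. It is an illustration only and does not describe Accurx's actual implementation: the class name `TokenBucket`, its parameters, and the example numbers are all assumptions.

```python
import time


class TokenBucket:
    """Hypothetical limiter: admits short bursts, then a steady rate."""

    def __init__(self, rate_per_sec, burst, now=time.monotonic):
        self.rate = rate_per_sec      # tokens added per second (assumed value)
        self.capacity = burst         # maximum tokens held at once
        self.tokens = float(burst)    # start with a full bucket
        self._now = now               # injectable clock, handy for testing
        self._last = now()

    def allow(self):
        """Return True if one reconnection may proceed right now."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would reject or delay this reconnection
```

With a limiter like this in front of an expensive endpoint, a burst of simultaneous reconnections is spread over time rather than reaching the database all at once.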

Keeping users informed:

  • The Accurx Status Page was updated throughout the incident.

  • NHS England Service Bridge was notified (reference INC0387747).

5. What's next?

We've already completed, or are carrying out, the following preventative actions so this class of issue doesn't recur:

  • We’ve removed the memory constraint on the infrastructure component that failed, and raised its resource allocation to a safer level.

  • We’ve added alerting for sustained high memory usage on the same class of components, so we catch it before it threatens system stability.

  • Infrastructure rollouts of this type will be blocked during NHS core operating hours; changes of this kind will now only be made during low-traffic windows.

  • We’ve improved reconnection handling in our web client so that a brief disconnect no longer triggers a surge of expensive backend requests, and added rate-limiting on those endpoints as a second line of defence.

  • We’re working to improve our in-product error messaging for issues of this type.
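
One common way to stop many clients from reconnecting at the same moment is exponential backoff with random jitter. The sketch below illustrates that general technique; it is not the code in the Accurx web client, and the function name `reconnect_delay`, the one-second base, and the 30-second cap are assumptions chosen for the example.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=30.0, rng=random.random):
    """Seconds to wait before reconnection attempt number `attempt` (0-based).

    Hypothetical "full jitter" backoff: the upper bound doubles with each
    failed attempt up to `cap`, and the actual wait is a random fraction
    of that bound, so clients that disconnected together retry at
    different times.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling
```

Because each client picks its own random delay, a brief disconnect results in reconnections that arrive gradually rather than as a single spike.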

If you have any remaining questions or concerns, please reach out to our support team. We sincerely apologise for the disruption and thank you for your patience as we work to make our systems even more resilient.

Thanks,

The Accurx team
