INC-1366: Accurx Outage 04/06/2025

Incident report for INC-1366

Written by Cam

Date of incident: Wednesday, 4th June 2025

Time: 09:15 - 11:58 (2 hours 43 minutes)

Status: Resolved

What happened?

On 4th June 2025, between 9:15 am and 11:58 am, Accurx systems experienced a slowdown due to a sudden spike in activity on one of our core databases. This database supports many essential features in our platform, and the high demand caused some areas of the system to become slow or unresponsive.

What was impacted?

All Accurx users and patients experienced service disruption across Accurx products during the affected window, with the severity and nature of the failures varying by product.

Patient-facing Services:

Most patients experienced slowness or were unable to access their response links or questionnaires. Where an action failed, this was made clear to the patient. Submitted responses appeared in the Accurx Inbox after the issue was resolved.

Self-book:

Patients may have faced delays or access issues when booking appointments. Failed bookings were visible to patients and were retried automatically after the system recovered, with confirmations sent as usual.

Patient Triage & Patient-initiated Follow-up:

Patients could submit requests, but healthcare organisations faced delays accessing them. Submissions became visible in the Accurx Inbox once the system recovered.

Questionnaires & Patient Response:

Patients experienced slowness or inability to complete questionnaires or response links. Responses were delayed but appeared in the inbox after the system recovered.

AccuMail:

Sending and receiving messages between healthcare providers was delayed or temporarily inaccessible. Messages appeared in the Accurx Inbox once the system recovered.

Outbound Messaging:

Some scheduled messages were delayed but were automatically retried after the incident. Appointment reminders were sent either on time or shortly afterwards. Save-to-record actions were also retried once the system recovered.

What caused it?

Our investigation did not identify a single triggering event; instead, a combination of the following factors contributed to the incident:

  • A spike in system activity after some clients briefly lost their connection and then all tried to reconnect at the same time.

  • Automated retry behaviour in our system: when the database began to slow down, our systems automatically re-sent requests in quick succession, which unintentionally worsened the problem (a simplified model of this amplification is sketched after this list).

  • Shared infrastructure load: our databases share computing resources, and we suspect competition for those resources contributed to the issue.
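To illustrate how retries can amplify load, here is a deliberately simplified model in Python. It is a sketch, not Accurx's actual system: the client numbers, the fixed per-tick capacity, and the immediate-retry behaviour are all assumptions chosen to show the failure mode.

    # Simplified model of retry amplification (illustrative only).
    # Assumption: a database serves up to `capacity` requests per tick,
    # and every request that isn't served is immediately retried on top
    # of the normal demand arriving in the next tick.

    def simulate(clients: int, capacity: int, ticks: int) -> None:
        pending = clients  # requests waiting to be served this tick
        for tick in range(1, ticks + 1):
            served = min(pending, capacity)
            failed = pending - served
            # Naive behaviour: failed requests retry immediately, while
            # fresh client demand keeps arriving on top of them.
            pending = failed + clients
            print(f"tick {tick}: served={served} failed={failed} "
                  f"offered next tick={pending}")

    if __name__ == "__main__":
        # 100 requests per tick against a database that can serve 60:
        # the backlog, and with it the offered load, grows every tick.
        simulate(clients=100, capacity=60, ticks=5)

Because failed requests are re-offered on top of new demand, load keeps growing until capacity is added or retries are slowed down, which is exactly what the fixes below address.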

What we did to fix it

To resolve the issue, we:

  • Increased the processing capacity of the affected database.

  • Tuned how the system handles retries so they back off rather than overwhelming the database (see the sketch after this list).

  • Temporarily scaled down background tasks that were using too many resources.

  • Moved another system sharing the same database resources to a separate environment to reduce contention.
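As a rough sketch of what retry tuning can look like in practice, the Python below implements exponential backoff with full jitter, a standard technique for spreading retries out over time. The function and parameter names are illustrative assumptions, not Accurx's actual code.

    import random
    import time

    def call_with_backoff(operation, max_attempts=5, base_delay=0.5, max_delay=30.0):
        """Run `operation`, retrying failures with jittered exponential backoff."""
        for attempt in range(max_attempts):
            try:
                return operation()
            except Exception:
                if attempt == max_attempts - 1:
                    raise  # out of attempts: surface the error
                # Exponential backoff caps the wait; full jitter spreads
                # clients out so they don't retry in lock-step.
                delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
                time.sleep(delay)

    if __name__ == "__main__":
        calls = {"n": 0}

        def flaky():
            # Hypothetical stand-in for a slow database call: fails twice,
            # then succeeds.
            calls["n"] += 1
            if calls["n"] < 3:
                raise TimeoutError("simulated slow database")
            return "ok"

        print(call_with_backoff(flaky))  # succeeds on the third attempt

The key design point is the randomised delay: without jitter, clients that failed together would all retry together, recreating the original spike.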

We mitigated the issue in 2 hours 43 minutes, and normal services resumed at 11:58 am.

What’s next?

We’re investing in modernising our infrastructure to prevent issues like this in the future. We're improving how the system handles slow responses, and we're restructuring our databases for better performance and scalability to reduce the risk of this kind of overload recurring. We’re also making our platform more modular, with clear boundaries between systems, so that a problem in one part doesn’t impact everything else. These improvements are a priority, and we’re committed to delivering the reliability you expect from Accurx.


If you have any remaining questions or concerns, please reach out to our support team.

We sincerely apologise for the disruption and thank you for your patience as we work to make our systems even more resilient.

The Accurx Team
