[Online Testing] Online Testing Rosters not saving
Incident Report for Illuminate Education
Postmortem

During the morning of Monday, October 14, one of the auxiliary services responsible for processing background tasks was flooded with requests, creating a backlog that prevented some tasks from being performed in the usual, near-instantaneous manner. One of the higher-priority tasks affected was the saving of Online Testing rosters. Because of their importance, these tasks are prioritized above other requests. However, the flood of requests produced a backlog so large that even high-priority tasks such as saving Online Testing rosters were delayed.

The engineering team took the steps necessary to clear the backlog and moved the lower-priority tasks off the queue, allowing the auxiliary service to resume processing requests. The team is also adjusting the priority settings for tasks so that the auxiliary service can always process higher-priority tasks, regardless of the number of lower-priority tasks in the queue.
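
As an illustration only, the priority adjustment described above can be sketched as a queue with separate lanes, where the worker always drains the high-priority lane first. This report does not describe the actual queueing system, so the class name, method names, and task labels below are hypothetical, and the sketch should not be read as the implementation used in production.

```python
from collections import deque


class TwoLaneQueue:
    """Hypothetical two-lane task queue (not the production system).

    High-priority tasks are kept in their own lane, so the number of
    queued low-priority tasks never affects how soon they are picked up.
    """

    def __init__(self):
        self.high = deque()  # e.g. saving Online Testing rosters
        self.low = deque()   # all other background tasks

    def submit(self, task, high_priority=False):
        """Add a task to the lane that matches its priority."""
        lane = self.high if high_priority else self.low
        lane.append(task)

    def next_task(self):
        """Return the next task to run, or None if both lanes are empty."""
        if self.high:
            return self.high.popleft()
        if self.low:
            return self.low.popleft()
        return None


def demo():
    """Queue a large low-priority backlog, then one high-priority task."""
    q = TwoLaneQueue()
    for i in range(10_000):
        q.submit(f"background-{i}")
    q.submit("save-roster", high_priority=True)
    return q.next_task()
```

In this sketch, `demo()` returns the roster task even though 10,000 lower-priority tasks were submitted before it. Strict priority of this kind can starve the low-priority lane under sustained high-priority load, which is one reason a real system might also reserve some worker capacity for lower-priority tasks.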

The surge in requests that generated the backlog was an anomalous event. The root cause of the flood of requests has been identified, and a separate fix has been implemented to prevent a recurrence.

Posted Oct 17, 2019 - 08:56 PDT

Resolved
This issue has been resolved. Performance has returned to normal.
Posted Oct 14, 2019 - 11:47 PDT
Monitoring
The team is observing a steady trend toward stable application performance. Engineers have taken additional steps to further expedite the processing of Online Testing rosters.
Posted Oct 14, 2019 - 09:26 PDT
Identified
This morning, at 5:38am PDT, we received reports of Online Testing rosters not saving. Our engineers have identified and resolved the source of the problem, which caused user jobs to be delayed. Other areas of the system were also found to be impacted. We expect the backlog of pending jobs to be completed by 8:30am PDT.
Posted Oct 14, 2019 - 06:59 PDT
This incident affected: Illuminate Application (Online Testing).