Denticon Performance Degraded
Incident Report for Planet DDS
Postmortem

Slowness in Denticon was first reported around 6:00 AM PDT on Tuesday, July 14, 2020.

During the initial investigation, we began eliminating potential causes. One of these was an update released the night before to fix problems with the new word processor. Although it was unlikely that this update was causing the issue, we removed it as a precaution.

After further investigation, we narrowed the source of the slowness down to our load balancer, which is manufactured by Kemp. The load balancer directs our web traffic to the 20 Denticon web servers, depending on which server has the lightest load. Its job is to give each user the best possible experience by keeping all of our servers working at peak performance.
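To illustrate the "lightest load" idea, here is a minimal sketch of one common way such a decision can be made: sending each new request to the server with the fewest active connections. It is only an illustration. The `Server` class, the function names, the server names, and the use of connection count as the load measure are all assumptions for this example and do not describe how the Kemp load balancer actually works.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical stand-in for one web server behind the load balancer."""
    name: str
    active_connections: int = 0


def pick_least_loaded(servers):
    """Return the server with the fewest active connections.

    Ties go to the server that appears first in the list.
    """
    if not servers:
        raise ValueError("no servers available")
    return min(servers, key=lambda s: s.active_connections)


def route_request(servers):
    """Assign one new request to the least-loaded server and record it."""
    server = pick_least_loaded(servers)
    server.active_connections += 1
    return server


# Assumed pool of 20 servers, mirroring the 20 web servers mentioned above.
pool = [Server(f"web{i:02d}") for i in range(1, 21)]

# Route 40 requests; the load spreads evenly across the pool.
for _ in range(40):
    route_request(pool)
```

With this approach, 40 requests across 20 idle servers end up as two connections per server, which is the kind of even distribution the load balancer is meant to maintain.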

We installed a firmware update for our load balancer on Saturday, July 11, 2020. Everything appeared to be functioning correctly on Sunday and Monday; the slowness began this morning.

We determined that caching and compression in the load balancer were not functioning properly. Working with Kemp engineers, we continued to troubleshoot and look for potential issues. After a few troubleshooting actions, we found that turning caching off dramatically improved performance. This is counterintuitive, since caching normally improves performance, but because stabilizing the system was critical, we decided to test this configuration. With continued monitoring, we observed that the system stabilized even as more and more users came online, and it remained stable over an extended period. We decided to leave caching off temporarily for the remainder of the day.

After more discussions with the Kemp engineers, our plan of action is to roll back the Kemp firmware to the stable version that was in place before Saturday's update. We plan to start and complete this rollback tonight, Tuesday, July 14, after 9:00 PM PDT. Once the rollback is complete, we will test and monitor the system to make sure that it is performing at its expected high level. Late tonight and early tomorrow morning, we will continue to monitor the system and will be ready to take any necessary action if we see any issues.

In parallel, we will continue to work with the Kemp engineers to review logs and test the new firmware so that we can understand the core issue. We will upgrade the firmware again only when we are fully confident that the root cause has been identified and that Kemp has addressed it with a fix.

Posted Jul 14, 2020 - 21:14 PDT

Resolved
This incident has been resolved. We will update the postmortem soon.
Posted Jul 14, 2020 - 10:24 PDT
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Jul 14, 2020 - 10:23 PDT
Identified
The issue has been identified and a fix is being implemented.
Posted Jul 14, 2020 - 10:23 PDT
Update
We are continuing to investigate this issue.
Posted Jul 14, 2020 - 06:38 PDT
Investigating
We are experiencing unusually heavy load, which is causing some performance degradation in Denticon. We are currently investigating the root cause. We hope to have this resolved as quickly as possible.
Posted Jul 14, 2020 - 06:37 PDT
This incident affected: Denticon Group (Denticon).