Update - We continue to work on upgrading additional Kubernetes services and will provide our next update around 11am CST.
May 17, 2023 - 06:23 UTC
Monitoring - Please note that this message was composed by a member of our non-technical team (Amanda) and may therefore be technically inaccurate for the time being.

These messages are not reviewed by the technical team until after the incident is resolved, as that is their first priority in the meantime.

A fix was found and we're monitoring the situation.

We upgraded a number of services relating to NGINX Ingress and NGINX. This allowed us to start serving traffic again.

We are still working on upgrading a handful of other services, but we believe that we will remain online throughout the upgrades from this point forward.

Assuming that we do not experience any further downtime, we will provide our next update once we finish the upgrades, with a postmortem to follow.

We're grateful to our team and our clients for their patience while we work diligently through the night to ensure the issue is resolved and complete the remaining upgrades. We apologize for any inconvenience this may have caused.

May 17, 2023 - 04:30 UTC
Identified - Please note that this message was composed by a member of our non-technical team (Amanda) and may therefore be technically inaccurate for the time being.

These messages are not reviewed by the technical team until after the incident is resolved, as that is their first priority in the meantime.

We're quite certain at this point that the issue lies with Kubernetes.

Tonight at 7:12pm CST, Google updated their status page saying that the Legacy Image API was returning a high number of errors and that the US region we use, Iowa, was affected. No workaround was provided.

Google continued to provide updates throughout the incident, which you can read here:
https://status.cloud.google.com/incidents/LTRFobVHV4eSL5vfgasv#RP1d9aZLNFZEJmTBk8e1

As of 8:10pm CST, Google still did not know the root cause of the issue, and as of 9pm CST, Google still had no ETA for mitigation.

At 9:47pm, Google updated their status page to say that the issue had been resolved. However, when reviewing our Kubernetes dashboard, we found a note stating that the nodes in our node pool were being updated. It continued:

"For node version upgrades, it typically takes 4-5 minutes to upgrade a single node or longer (e.g., due to pod disruption budget or grace period). For updates to node metadata like labels, taints and tags, it takes less than a minute per node and it does not recreate the nodes or cause any disruption to running workloads."

The estimated time remaining for the update as of 11:10pm CST was 79 minutes.

Further, we've identified that the specific part of our Kubernetes configuration that is not operating properly is NGINX.

We'll continue to troubleshoot and test until the issue is resolved.

We're grateful to our team and our clients for their patience while we work diligently through the night to resolve the issue. We apologize for any inconvenience this may be causing.

I'll continue to share relevant information as we uncover it.

May 17, 2023 - 04:23 UTC
Investigating - Please note that this message was composed by a member of our non-technical team (Amanda) and may therefore be technically inaccurate for the time being.

These messages are not reviewed by the technical team until after the incident is resolved, as that is their first priority in the meantime.

At approximately 9:23pm CST, the Edlink website, Dashboard and API stopped responding. We were not performing updates at the time and have no reason to believe this is an Edlink-created bug.

The engineers on call are currently investigating the issue, but our earliest lead is that Google is performing unannounced maintenance on our Kubernetes clusters.

I'll continue to share relevant information as we uncover it.

May 17, 2023 - 02:23 UTC
Edlink Core Systems: Operational (99.91% uptime over the past 90 days)
Edlink API: Operational (99.91% uptime over the past 90 days)
Edlink Dashboard: Operational (99.91% uptime over the past 90 days)
Data Sources: Operational (100.0% uptime over the past 90 days)
Canvas: Operational (100.0% uptime over the past 90 days)
Schoology: Operational (100.0% uptime over the past 90 days)
Blackboard: Operational (100.0% uptime over the past 90 days)
Brightspace: Operational (100.0% uptime over the past 90 days)
Classlink: Operational (100.0% uptime over the past 90 days)
Clever: Operational (100.0% uptime over the past 90 days)
Google Classroom: Operational (100.0% uptime over the past 90 days)
Microsoft Teams: Operational (100.0% uptime over the past 90 days)
Google Cloud Platform Google Kubernetes Engine Operational
Google Cloud Platform Google Cloud Storage Operational
Google Cloud Platform Google Cloud SQL Operational
Google Cloud Platform Cloud Key Management Service Operational
GitHub Actions Operational
GitHub Copilot Operational
GitHub Pull Requests Operational
GitHub Pages Operational
GitHub Issues Operational
GitHub Git Operations Operational
ClassLink Partner Portal Operational
ClassLink Roster Server Operational
Clever Apps Dashboard Operational
Clever Data API Operational
Clever Events API Operational
Clever Single Sign-On Operational
Past Incidents
May 31, 2023

No incidents reported today.

May 30, 2023

No incidents reported.

May 29, 2023

No incidents reported.

May 28, 2023

No incidents reported.

May 27, 2023

No incidents reported.

May 26, 2023

No incidents reported.

May 25, 2023

No incidents reported.

May 24, 2023

No incidents reported.

May 23, 2023

No incidents reported.

May 22, 2023

No incidents reported.

May 21, 2023

No incidents reported.

May 20, 2023

No incidents reported.

May 19, 2023

No incidents reported.

May 18, 2023

No incidents reported.

May 17, 2023

Unresolved incident: Kubernetes Upgrade.