Date of Incident: May 4, 2026
Duration: 12:07 PM – 1:04 PM CT (57 minutes)
Impact: Total interruption of user login capabilities.
Status: Resolved
Between 12:07 PM and 1:04 PM CT today, Edlink experienced a severe API degradation that prevented nearly all users from logging into client products via SSO. The incident was caused by database connection starvation, triggered by a service deployment that used an older pinned version of the PostgreSQL (PG) library. The issue was mitigated by spinning down the problematic service to free up connections, followed by a full revert of the change. Normal operations have been fully restored.
The incident was traced back to a deployment at 12:07 PM CT today, in which part of our service was upgraded.
Background on Dependency Version Pinning:
To provide context on why an older version was in use: we recently implemented a strict policy for managing our software dependencies. This proactive security measure was enacted following an industry-wide supply-chain attack in March 2026. Edlink was not directly affected by that incident, but we determined that the attack vector posed a risk we were not comfortable with, and we moved to proactively update our systems.
To safeguard Edlink's platform and ensure our systems remain secure, we instituted a policy to "pin" (lock) all of our software dependencies to explicitly verified, known-safe versions. This prevents our systems from automatically downloading new, potentially unverified updates; automatic updates were precisely the attack vector exploited in the industry-wide attack a few weeks earlier.
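For illustration only (this is not Edlink's actual manifest, and the package versions are hypothetical), the sketch below contrasts a semver range, which lets the package manager automatically install newer releases, with an exact pin, which resolves only to the verified version:

```typescript
// Hypothetical dependency entries in a Node-style manifest.
// Version numbers are illustrative, not Edlink's actual pins.

// Before: a caret range. The package manager may automatically install any
// newer minor or patch release, including one that has never been reviewed.
// Automatic installs of unreviewed releases are the attack vector described above.
const rangePinned = {
  dependencies: {
    pg: "^8.7.0", // resolves to 8.7.0 or any newer 8.x release
  },
};

// After: an exact pin. Installs resolve to exactly this verified version
// and nothing newer until the pin is deliberately updated.
const exactPinned = {
  dependencies: {
    pg: "8.7.3", // resolves to 8.7.3 and nothing else
  },
};
```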
While this policy is crucial for preventing malicious, unverified updates from automatically infiltrating our systems, in this specific instance it had an unintended side effect: it forced our deployment to use an older version of the PG library than we had intended.
This older version of the library managed Postgres connections inefficiently and leaked connections rather than releasing them. Upon deployment, the service rapidly consumed the available database connection pool. Because connections were not being properly released or managed, other critical infrastructure (notably the service handling user logins) was left without available database connections, leading to the system-wide login failures.
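To make the failure mode concrete, here is a minimal sketch of a pool-exhaustion bug, assuming the node-postgres (`pg`) package; the function names are hypothetical and this is not Edlink's actual code:

```typescript
import { Pool } from "pg";

// A pool holds a bounded number of database connections. Once every
// connection is checked out and never returned, all other consumers of
// the database (including the login service) starve.
const pool = new Pool({ max: 10 });

// Leaky pattern: the client is checked out but never released. If this
// runs on every request, the pool is exhausted within seconds of deploy.
async function leakyQuery(sql: string): Promise<void> {
  const client = await pool.connect();
  await client.query(sql);
  // client.release() is never called, so the connection is leaked;
  // it also leaks if client.query() throws.
}

// Correct pattern: release the client in a finally block so the
// connection returns to the pool on both success and failure.
async function safeQuery(sql: string): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query(sql);
  } finally {
    client.release();
  }
}
```

For one-off statements, `pool.query()` acquires and releases a client internally, which sidesteps this class of leak entirely.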
We stopped the immediate bleeding by intentionally spinning down the problematic service, which allowed the authentication services to recover. The underlying code change was then fully reverted in version control and redeployed, ensuring the service could be brought back online safely without causing a secondary outage.
To help us identify and resolve similar issues more quickly in the future, we have integrated additional monitoring focused on failed logins. This will allow us to detect authentication issues immediately and mitigate them before they cause a prolonged system-wide impact.
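As a simplified sketch of what such monitoring can look like (the thresholds, names, and alerting mechanism below are hypothetical, not our production configuration):

```typescript
// Hypothetical failed-login alert based on a sliding one-minute window.

const WINDOW_MS = 60_000;    // evaluate failures over a one-minute window
const ALERT_THRESHOLD = 0.5; // alert if more than 50% of attempts fail
const MIN_ATTEMPTS = 20;     // ignore windows with too little traffic

interface LoginEvent {
  timestamp: number;
  success: boolean;
}

const recentLogins: LoginEvent[] = [];

// Called by the authentication service after every login attempt.
function recordLogin(success: boolean): void {
  const now = Date.now();
  recentLogins.push({ timestamp: now, success });

  // Drop events that have aged out of the window.
  while (recentLogins.length > 0 && recentLogins[0].timestamp < now - WINDOW_MS) {
    recentLogins.shift();
  }

  const failures = recentLogins.filter((e) => !e.success).length;
  if (recentLogins.length >= MIN_ATTEMPTS && failures / recentLogins.length > ALERT_THRESHOLD) {
    // In production this would page the on-call engineer rather than log.
    console.error(`ALERT: ${failures}/${recentLogins.length} logins failed in the last minute`);
  }
}
```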
Additionally, we have reviewed all other dependency packages to identify any that may have been inadvertently downgraded during this "pinning" process.
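An audit of this kind can be as simple as diffing pinned versions against a known-good baseline. The sketch below is illustrative, assuming the `semver` npm package; the package lists are placeholders, not our actual dependency set:

```typescript
// Hypothetical downgrade audit: flags any pinned package that is older
// than the version we intended to run.
import * as semver from "semver"; // assumes the "semver" npm package

// Versions we intended to run, e.g. taken from the last known-good lockfile.
const intended: Record<string, string> = {
  pg: "8.11.0",
};

// Versions actually pinned after the policy change.
const pinned: Record<string, string> = {
  pg: "8.7.3",
};

for (const [name, intendedVersion] of Object.entries(intended)) {
  const pinnedVersion = pinned[name];
  if (pinnedVersion && semver.lt(pinnedVersion, intendedVersion)) {
    console.warn(
      `${name}: pinned ${pinnedVersion} is older than intended ${intendedVersion}`
    );
  }
}
```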