Partial indexing failures
Incident Report for Vectara Inc
Resolved
One of the indexing servers ran out of memory (OOM), so indexing requests routed to that server failed. The OOM left the server in a limbo state that the liveness prober did not detect as a failure, so the prober did not restart the server in time. This resulted in a prolonged partial outage.

Start: June 25, 2024, 7:40 am UTC
End: June 25, 2024, 9:28 am UTC

The issue has been fixed, and a long-term solution for improving the liveness prober is in progress.
Posted Jun 25, 2024 - 07:00 UTC