All the 9s: Keeping Vault Resilient and Reliable
Description
Learn how site reliability and platform engineering came together at Indeed to scale Vault into a resilient, reliable platform that delivers millions of secrets every day to globally distributed workloads. The talk covers how Indeed prioritizes observability, sets SLOs and SLAs, builds runbooks, and enables platform engineers to handle common incidents, and how iterative reviews drive operational excellence. It describes Indeed's infrastructure, which spans thousands of microservices across six cloud regions and uses Vault Agent within Kubernetes for secret delivery. It also discusses strategies for preventing unintentional denial-of-service attacks, performance testing, immutable infrastructure, automated unsealing with AWS KMS, and advanced monitoring and alerting built on dashboards and runbooks.
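As a rough illustration of what "all the 9s" means when setting an SLO, the sketch below converts an availability target into the downtime it permits over a rolling window. The function name `downtime_budget_seconds`, the 30-day window, and the listed targets are assumptions chosen for illustration; they are not Indeed's actual SLOs. For example, 99.9% over 30 days allows about 43.2 minutes of downtime.

```python
def downtime_budget_seconds(availability_percent: float, window_days: float = 30.0) -> float:
    """Return the seconds of downtime allowed in the window for an availability target.

    availability_percent is the SLO target, e.g. 99.9 for "three nines".
    window_days is the length of the evaluation window (30 days is an assumption).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    window_seconds = window_days * 24 * 60 * 60
    # The error budget is the fraction of the window that may be unavailable.
    return window_seconds * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Illustrative targets only; each extra nine cuts the budget by a factor of ten.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_budget_seconds(target)
        print(f"{target}% over 30 days -> {budget / 60:.2f} minutes of downtime")
```

Each additional nine shrinks the budget tenfold, which is one reason automated recovery steps such as auto-unseal and well-rehearsed runbooks matter more as the target rises.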