All the 9s: Keeping Vault Resilient and Reliable

October 25, 2023 29 min Free

Description

Learn how site reliability and platform engineering came together at Indeed to scale Vault into a resilient, reliable platform that delivers millions of secrets every day to globally distributed workloads. The talk explains how Indeed prioritizes observability, sets SLOs and SLAs, builds runbooks, and enables platform engineers to handle common incidents. It is organized around three themes: building a resilient Vault, monitoring and alerting, and achieving operational excellence through iterative reviews. The presentation describes Indeed's infrastructure, which includes thousands of microservices across six cloud regions, and its use of Vault Agent within Kubernetes for secret delivery. It also discusses strategies for preventing unintentional denial-of-service attacks, performance testing, immutable infrastructure, automated unsealing with AWS KMS, and advanced monitoring techniques using dashboards and runbooks.
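To make the "9s" in the title concrete, the arithmetic behind an availability SLO can be sketched as below. This is a minimal illustration, not Indeed's method: the function name `downtime_budget_minutes`, the 30-day window, and the example targets are all assumptions chosen for the example.

```python
from datetime import timedelta


def downtime_budget_minutes(slo: float, window: timedelta) -> float:
    """Return the downtime (in minutes) an availability SLO allows over a window.

    `slo` is a fraction such as 0.999 for "three nines".
    Hypothetical helper for illustration only.
    """
    if not 0.0 < slo <= 1.0:
        raise ValueError("slo must be a fraction in (0, 1]")
    window_minutes = window.total_seconds() / 60
    # The error budget is the share of the window that may be unavailable.
    return (1.0 - slo) * window_minutes


if __name__ == "__main__":
    window = timedelta(days=30)  # assumed reporting window
    for target in (0.99, 0.999, 0.9999):  # assumed example targets
        budget = downtime_budget_minutes(target, window)
        print(f"{target:.2%} over 30 days allows {budget:.2f} minutes of downtime")
```

Each additional nine cuts the budget by a factor of ten: over 30 days, 99.9% allows roughly 43 minutes of downtime, while 99.99% allows a little over 4 minutes.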