Live site engineering
-
Every service needs an effective runbook for managing outages. A runbook should include checklists, be tested regularly, and be integrated into new projects from the start. When an outage occurs, ask for help early, keep stakeholders informed, identify what changed recently, and roll back if necessary. Stay calm and follow the established deployment procedures. Test backup plans regularly.
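
The "identify changes, and roll back if necessary" step can be sketched in code. The following is a minimal illustration, not a prescribed tool: the `Change` record, the `rollback_candidates` helper, the two-hour lookback window, and the sample data are all assumptions made up for this example. It lists the changes deployed shortly before an outage began, newest first, since those are usually the first things to review.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    """A deployed change (hypothetical record; fields are illustrative)."""
    service: str
    version: str
    deployed_at: datetime


def rollback_candidates(changes, outage_start, lookback=timedelta(hours=2)):
    """Return changes deployed within `lookback` before the outage, newest first."""
    window_start = outage_start - lookback
    recent = [c for c in changes if window_start <= c.deployed_at <= outage_start]
    return sorted(recent, key=lambda c: c.deployed_at, reverse=True)


# Example data, invented for illustration only.
outage = datetime(2024, 1, 10, 14, 0)
history = [
    Change("api", "v41", datetime(2024, 1, 9, 9, 0)),
    Change("api", "v42", datetime(2024, 1, 10, 13, 30)),
    Change("web", "v7", datetime(2024, 1, 10, 12, 45)),
]

candidates = rollback_candidates(history, outage)
for c in candidates:
    print(f"{c.service} {c.version} deployed {c.deployed_at:%H:%M}")
```

In a real setup the change history would come from a deployment log or change-management system rather than a hard-coded list, and the lookback window would be tuned to how quickly problems typically surface for the service in question.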
