
Meta Incident Management
Tasks
UX, UI, Research, Product strategy, Process design
Team
1 Designer, 4 Engineers
Time
1 Year, 8 months
I led design and product initiatives, building tools and introducing systems that helped software engineers record incidents and collaborate efficiently to mitigate system failures quickly.
I created designs and processes that decreased the average time to report an incident from 8 minutes to 3 minutes while improving the quality of captured data. This meant incident alerts were sent out sooner, leading to better mitigation.
Context
A SEV is a ticket created when a product incident occurs. It documents the issue, alerts relevant teams, and helps coordinate mitigation. After resolution, a review identifies the root cause, extracts learnings, and assigns follow-up tasks to prevent recurrence.
The SEV review process faced significant inefficiencies: 72% of SEVs remained unreviewed after 30 days, leading to missed learnings and recurring incidents.
This issue stemmed from a fragmented, unintuitive system that had been developed without design input. Users had to navigate multiple tools to complete a review, creating unnecessary complexity. As the company grew, these inefficiencies scaled with it, making it increasingly difficult for multiple large teams to review SEVs effectively.
Given the company's reliance on SEVs, addressing these challenges became a critical priority.
Goals
The objective was to improve incident reviews and reduce recurrence, with a target of decreasing the percentage of SEVs unreviewed after 30 days from 72% to 30%.
The plan was to develop a seamless end-to-end review tool that lets users easily find SEVs needing review, assign them to the right meeting, and schedule regular review sessions efficiently.
This would be achieved by:
Streamlining workflows into a single tool, SEV Review Series
Automating processes and pre-populating data based on Series configurations
Integrating with existing tools (email, calendar, team chat) for seamless communication
Our strategy
Standardise data
Improve and standardise data capture to reduce SEV creation time and to enable automation.
Automate processes
Use data models to automate the admin tasks involved in preparing for review meetings across multiple teams.
Proactive reporting
Foster a proactive culture of reporting incidents early without increasing the workload associated with SEVs.