Meta Incident Management

Tasks
UX, UI, Research, Product strategy, Process design

Team
1 Designer, 4 Engineers

Time
1 year, 8 months

I led design and product initiatives to build tools and introduce systems that helped software engineers record incidents and collaborate efficiently to mitigate system failures quickly.

I created designs and processes that decreased the average time to report an incident from 8 minutes to 3 minutes while improving the quality of captured data. As a result, incident alerts went out sooner, leading to better mitigation.

Context

A SEV is a ticket created when a product incident occurs. It documents the issue, alerts relevant teams, and helps coordinate mitigation. After resolution, a review identifies the root cause, extracts learnings, and assigns follow-up tasks to prevent recurrence.

The SEV review process was highly inefficient: 72% of SEVs remained unreviewed after 30 days, which led to missed learnings and recurring incidents.

This stemmed from a fragmented, unintuitive system that had been developed without design input. Users had to navigate multiple tools to complete a single review, which added unnecessary complexity. As the company grew, these inefficiencies scaled with it, making it increasingly difficult for multiple large teams to review SEVs effectively.

Given the company's reliance on SEVs, addressing these challenges became a critical priority.

Goals

The objective was to improve incident reviews and reduce recurrence, with a target of decreasing the percentage of SEVs unreviewed after 30 days from 72% to 30%.

To get there, we set out to build a seamless, end-to-end review tool that lets users easily find SEVs needing review, assign them to the right meeting, and schedule regular review sessions efficiently.

This would be achieved by:

  • Streamlining workflows into a single tool, SEV Review Series

  • Automating processes and pre-populating data based on Series configurations

  • Integrating with existing tools (email, calendar, team chat) for seamless communication

Our strategy

Standardise data

Improve and standardise data capture to reduce SEV creation time and to enable automation.

Automate processes

Use data models to automate the admin tasks that multiple teams perform to prepare for review meetings.

Proactive reporting

Foster a proactive culture of reporting incidents early without increasing the workload associated with SEVs.