Job Description
Responsibilities
- Help keep one of the fastest-growing companies in the world up and available in a 24/7/365 environment
- Provide on-call coverage as an Incident Manager (shift-based, including some weekend work)
- Work as part of a high-performing team and actively contribute to a culture of support, collaboration, knowledge-sharing and learning
- Take end-to-end responsibility for incidents through to resolution: respond rapidly, trigger escalation procedures, gather the relevant stakeholders, communicate high-quality updates regularly and on time, and keep thorough records throughout
- Monitor system dashboards for health, uptime and availability, and work closely with the Client Engagement Team to identify issues early
- Work with stakeholders to identify the root cause of incidents, assist in postmortem activities and agree on follow-up actions
- Routinely review incident response playbooks, maintain escalation flow schedules and participate in tabletop exercises
- Work closely with the sister function, Technical Project Management, and other stakeholders to hand off items needing remediation and identify long-term improvement strategies
- Identify and drive forward key process improvement opportunities
Requirements
- Proven experience in problem-solving and root cause analysis using the 5 Whys technique or similar approaches
- Recent experience as an Incident Responder, Incident Manager or similar role
- Highly responsive and extremely organized, with the ability to direct the flow of work in a highly available technical environment that operates 24/7/365
- Good interpersonal skills, with the ability to liaise with personnel at all levels and adapt your style accordingly
- Experience leading bridge calls with a large number of technical participants
- Strong understanding of the software development lifecycle, including the importance of testing and rollback planning
- Strong attention to detail
- A fast learner with an inherent ability to understand complex technology solutions and communicate the impact of incidents in both IT and business terms
- Previous experience using Atlassian Tools (Confluence/Jira) would be an advantage
To apply for this job please visit jobs.lever.co.