Site Reliability Engineer

Assignment description

We are looking for a Senior Site Reliability Engineer.

As a Senior Site Reliability Engineer, you will join a development team passionate about delivering the best-operated digital experience. You will play a crucial role in defining and implementing the strategies, tools, and culture that will enable us to achieve operational excellence in Azure-based workloads.

Key responsibilities:

Lead the design, implementation, and evolution of SRE practices tailored for Azure workloads.
Collaborate with the development teams to ensure reliable, scalable, and efficient systems, while embedding SRE principles into the development lifecycle.
Together with your team, own the operational health and performance of workloads in Azure.
Define and implement strategies for monitoring, incident management, and post-incident reviews.
Automate operational tasks and processes using software or scripting in languages such as C#, Python, or Node.js (or any other suitable language).
Mentor and guide the team on best practices regarding SRE, including reliability, observability, and cloud operations.
Participate in and lead incident response.
Establish and maintain a desired-state operational model in collaboration with stakeholders and the platform team, aligned with goals and outcomes.

Qualifications:

You have substantial and relevant work experience in the information technology field.
You have proven experience with Site Reliability Engineering, having been part of a reliability team that provides modern, state-of-the-art operations together with development teams.
You have deep knowledge of the Azure cloud, its components, and how to monitor, operate, and troubleshoot workloads such as Azure Functions, Azure Container Apps, Azure App Service, Azure SQL, and Cosmos DB, among others.
You have a good understanding of how databases work, including SQL and NoSQL. You’ve dealt with backup and restore operations when things have gone wrong.
You are proficient enough in software development to automate repetitive manual tasks using C#, Node.js, Python, or any other suitable language.
You have practical experience with Azure DevOps, and you know your Git branching and code review processes.
You have a good understanding of networking, especially virtual networks in Azure.
You have practical knowledge of incident and problem management, including experience leading incident command.

Meritorious:

Azure certification(s)
Understanding of distributed systems, and how microservices operate
Kubernetes (AKS and OpenShift). We still have some on-premises workloads as well, but this won’t be your focus.
Familiarity with ITSM tools like ServiceNow

About you:

You’re a team player who has worked in a large organization and has coordinated positive change in previous roles.
You have strong communication skills in both Swedish and English, enabling you to build trust and alignment with stakeholders.
You are curious and visionary: you can define what good looks like and how an SRE team should operate, together with the rest of the development department.
You can define a desired state and take part in reaching it.
You are a problem-solver who thrives on challenges and can navigate complex systems with a calm and methodical approach.

Details

Reference: 106096

Location: Stockholm

Extent: 100%

Start date: 2025-04-28

End date: 2025-12-31

Consultant broker

Christoffer Svensson

christoffer.svensson@upgraded.se
070 856 86 90

This position is no longer open for applications.