Assignment Description

We are seeking a Senior DevOps Engineer to join our client’s AI platform team at IGNITE. The ideal candidate will have extensive experience with Kubernetes and Kubeflow, having designed and deployed large-scale production infrastructure and platforms for scientific use cases (expertise from other industries will also be valued). The role applies these skills to some of the most exciting machine learning challenges in drug discovery.

The successful candidate will become a vital part of a new, collaborative team of multidisciplinary engineers. Together, they will develop tools that advance healthcare standards and ultimately improve the lives of millions of patients worldwide. The team’s data science environments will support significant AI initiatives such as clinical trial data analysis, knowledge graphs, patient safety systems, deep learning-based drug discovery, and software as a medical device for various therapy areas. The candidate will also play a crucial role in providing frameworks that let data scientists in a growing data science community develop scalable, safe, and robust machine learning and predictive models.

As a proficient software developer with a penchant for building complex systems, you will pioneer the use of technology, machine learning, and data to enhance client productivity. You will design, build, deploy, and evolve the next generation of data engines and tools at scale, bridging the gap between science and engineering with deep expertise in both domains.

This role offers the opportunity to work with cutting-edge technologies surrounding Machine Learning Platforms, and to test, develop, and implement new ideas and technologies.

Key Accountabilities:

  • Collaborate closely with data science teams to design, deploy, and manage Kubernetes platforms for Machine Learning.
  • Provide the necessary infrastructure and platforms to support the deployment and monitoring of ML solutions in production, optimizing for performance and scalability.
  • Deploy systems, applications, and tooling for data science on AWS cloud environments.
  • Collaborate with BTG data scientists to understand their challenges and assist in productionizing ML pipelines, models, and algorithms for innovative science.
  • Take responsibility for all aspects of software engineering, from design to implementation, QA, and maintenance with support from ML experts.

Requirements:

  • 8+ years’ experience or equivalent in architecting and managing large Kubernetes clusters.
  • Experience in managing a service mesh, such as Istio.
  • Familiarity with Kubernetes ML platforms and toolkits (e.g., Kubeflow).
  • Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) certification.
  • Experience with scheduling strategies on clusters with different node types.
  • Modern DevOps mindset, utilizing best-of-breed DevOps toolchains such as Docker, Git, and Jenkins.
  • Proficiency in infrastructure as code technologies such as Ansible, Terraform, and CloudFormation.
  • Experience in managing and automating real-world platforms/applications on AWS.
  • Strong software coding skills, particularly in Python, although exceptional ability in any language will be recognized.
  • Familiarity with system monitoring tools such as Grafana, Prometheus, and Thanos.
  • Experience with continuous integration and with building continuous delivery pipelines (e.g., Helm, Argo CD).

Other Desirable Skills:

  • Experience with open-source and cloud-native Machine Learning Platforms and Toolkits.
  • Demonstrable knowledge of building MLOps environments to a production standard.
  • Understanding of Kubernetes internal networking and its impact on multi-node GPU ML training performance.
  • Experience with declarative management of Kubernetes objects using tools like kustomize.
  • Experience with multi-cloud environments (AWS/Azure/GCP).
  • Familiarity with data storage technologies, including RDBMS and NoSQL.
  • Experience in mentoring, coaching, and supporting less experienced colleagues and clients.
  • Familiarity with SAFe agile principles and practices.

Details

Reference: 39391

Location: Göteborg

Extent: 100%

Start date: 2024-03-18

End date: 2025-03-18

Consultant broker

It is no longer possible to apply for this position.