Senior Site Reliability Engineer

The Site Reliability Engineer is a pivotal role in our SaaS strategy. You will work closely with the engineering team to ensure unrivaled observability, availability, and performance of our clients' SaaS products. As a Site Reliability Engineer (SRE), you'll be the driving force behind our user-facing services and production systems. We're seeking individuals who combine pragmatic operational skills with software craftsmanship, applying engineering principles and operational discipline to elevate our operating environments and codebase. At the core of your responsibilities, you'll specialize in systems such as operating systems, storage subsystems, observability, and networking while implementing best practices for availability, reliability, and scalability. But that's just the beginning of your journey with us!

Type: Full-time, Remote

Job ID: JR-123238

Technologies:
Azure
AWS
Terraform
GitHub
Kubernetes
Datadog
Prometheus
JIRA
Betterstack
Locations:
Ukraine
Poland
Moldova
Czech Republic
Montenegro
Albania
Latvia
Lithuania
Georgia

Job (Project) Description

Customertimes is a global digital engineering, product development, and technology consulting company. Headquartered in New York, we have a team of 1300+ experts and offices in 12 countries.  

Requirements:

  • Proficiency in Terraform syntax and GitHub Actions configuration, including pipelines and job management using GitOps;  
  • Working knowledge of SaaS architecture concepts and designs;
  • Understanding of Kubernetes, including CLI usage and service re-provisioning;
  • Ability to provision and set up metrics along with managing alerts and silences;  
  • Ability to identify Service Level Indicators (SLIs) that align the team with availability and latency objectives;
  • Experience with Linux operating system configuration, package management, and troubleshooting;  
  • Working experience with cloud environments such as Azure or AWS, including provisioning infrastructure in them;
  • Good cultural fit: clear communication, empathy, curiosity, continuous learning, and a supportive, blameless attitude.

Responsibilities:

  • Design, build, and maintain the product cloud infrastructure that enables seamless scaling to support hundreds of thousands of concurrent users;
  • Develop advanced monitoring systems that proactively alert on symptoms, ensuring rapid response to potential issues;
  • Leverage tools like Terraform, GitHub Actions, and Kubernetes to efficiently manage our AWS or Azure infrastructure;
  • Continuously enhance operational processes, making deployments, upgrades, and other tasks as boring and automated as possible;
  • Collaborate with product engineers on a daily basis and influence product architecture designs;
  • Be part of an on-call (PagerDuty) rotation to respond swiftly to incidents affecting availability, offering support to product engineers during customer incidents;
  • Act as a reliability champion for stable counterpart assignments, ensuring a robust and resilient infrastructure;
  • Propose innovative ideas and solutions within the SRE organization and engineering;  
  • Plan, design, and execute solutions to achieve goals agreed upon by the team;
  • Lead by example with a positive and inclusive attitude, and foster constructive discussions between SRE and engineering;
  • Proactively identify opportunities to enhance system availability and performance by applying insights gained from monitoring and observation;  
  • Share your learnings with the wider community;  
  • Be the first responder during emergencies and on-call duties, promptly addressing symptoms and conducting root cause analysis to implement corrective actions and prevent recurring issues.

What We Offer:

  • Financial stability and competitive compensation;
  • Transparent professional and career development plans;
  • Career development: horizontal, professional, managerial;
  • Health, life, and accident insurance (with the opportunity to insure relatives at the corporate rate);
  • E-education, certification coverage, and access to the goFluent language learning platform;
  • Remote work or work from the office;  
  • Flexible work schedule;  
  • Referral bonus.
