Senior Site Reliability Engineer

We are looking for a Senior Site Reliability Engineer (SRE) to join our team and help design, build, and maintain scalable, reliable infrastructure and operational processes. This role involves managing infrastructure as code, implementing monitoring and alerting systems, and supporting our production environment to ensure high availability and performance.

Type:

Remote

Job ID:

JR-108091

Technologies:
Linux
Python
Terraform
Kubernetes
Docker
Prometheus
Grafana
CI/CD
ELK stack
Kafka
Locations:
Spain
Portugal


Job (Project) Description

Customertimes is a global digital engineering, product development, and technology consulting company. Headquartered in New York, we have a team of 1300+ experts and offices in 12 countries.  

Requirements:

Strong experience with Linux systems administration (advanced user level);

Hands-on experience with Infrastructure as Code (Terraform);

Experience with CI/CD practices and tools;

Proficiency with Docker for containerization and Kubernetes for container orchestration;

Experience with monitoring and alerting systems (Prometheus, Grafana, Elastic Stack);

Familiarity with operational practices such as runbooks and on-duty/on-call support.

Nice to Have:

Basic knowledge of Apache Spark (understanding how it works);

Familiarity with Kafka for event streaming;

Junior- to mid-level Python skills for scripting and automation.

Responsibilities:

Design, build, and maintain infrastructure using Infrastructure as Code (IaC) tools such as Terraform;

Implement and manage CI/CD pipelines for application and infrastructure deployment;

Manage containerized workloads with Docker and Kubernetes (K8s);

Monitor, troubleshoot, and optimize Linux systems (CPU, processes, I/O, logs);

Set up and maintain logging, monitoring, and alerting systems using Prometheus, Grafana, Elastic Stack, or similar tools;

Maintain and improve runbooks for operational support and on-call duties;

Collaborate with development and operations teams to ensure system reliability, scalability, and security.

What We Offer:

  • Competitive salary;
  • 100% remote opportunity;
  • Opportunities for professional growth and advancement;
  • A cooperative and innovative work environment;
  • 20 days of paid vacation, 15 days of paid sick leave with a doctor’s note, and 5 days of paid sick leave without a doctor’s note;
  • Medical insurance coverage for employees, with optional family insurance at a corporate rate;
  • Support for participation in professional development opportunities (webinars, conferences, trainings, etc.);
  • Regular team-building activities and bi-annual company-wide events
