
Staff Site Reliability Engineer

Job Description

We are seeking an experienced Site Reliability Engineer with solid software engineering skills and practical knowledge of operating modern cloud‑native infrastructure. In this position, you will contribute to building and scaling our AI platform by ensuring that our AWS and Kubernetes systems and our GitOps workflows are reliable, observable, and automated. The ideal candidate will have advanced technical expertise, excellent communication abilities, and a talent for collaborating well with engineering teams.

📍 Location: This is a fully remote position based in Colombia.

You will be reporting to:

Amir Toole

Contact:

Maira Russo - Senior Talent Acquisition Partner

Key Responsibilities

Maintain reliable, high‑performing AWS production systems.

Manage EKS cluster configuration, scaling, and workload stability.

Set up and support Istio service mesh for traffic control and security.

Oversee GitOps workflows to ensure secure, consistent infrastructure changes.

Create automation tools and platform enhancements.

Design, implement, and manage monitoring, logging, and tracing solutions across a diverse range of applications—including AI workloads, microservices, and data pipelines—to ensure visibility, reliability, and rapid issue resolution.

Respond to incidents, analyze root causes, and recommend lasting solutions.

Work with developers and platform teams to enhance deployments and system operations.

Support nx‑based monorepos for scalable, effective developer workflows.

Technical Skills

Deep understanding of AWS services commonly used in production (EKS, EC2, IAM, networking, load balancing, etc.).

Professional experience with Kubernetes (EKS), including workload autoscaling, networking, RBAC, and cluster operations.

Hands‑on experience with service meshes, specifically Istio.

Expertise with GitHub, GitHub Actions, and modern CI/CD workflows.

Experience working with monorepos, especially nx.

Understanding of GitOps practices (we use Flux CD).

Strong grasp of Linux systems, networking, containers, and Docker.

Familiarity with infrastructure‑as‑code: CDK, Terraform.

Knowledge of SLOs, error budgets, incident management, and production readiness best practices.

Strong English communication and collaboration skills.

Soft Skills

Excellent communication and cross‑team collaboration.

Strong analytical thinking and problem‑solving abilities.

Bias toward ownership, clarity, and operational excellence.

Perks & Benefits

Indefinite-term employment contract (contrato a término indefinido) with all legal benefits

Prepaid medical plan

Life insurance and funeral assistance

Internet allowance

Home office stipend

Competitive compensation — above the market average

100% remote work environment and an excellent work-life balance

Opportunity to work for a growing global SaaS leader

A culture that promotes independence, innovation, trust, and accountability

Open space to be creative, innovative, and strategize for the future

Mentorship by a highly experienced professional

Training budget: we want you to grow

5 personal time off days per year

Sick leave top-up to 100% of salary, paid by the employer from day 3 to day 90

Recognition award: additional paid time off in recognition of the corresponding year of service

Upgraded vacation allowance starting at 5 years of service
