Site Reliability Engineer | Mid-Senior | NordVPN
Job Description
Are you excited by the challenge of managing large-scale systems, automating infrastructure, and ensuring seamless service reliability? We’re seeking a Site Reliability Engineer (SRE) to play a key role in shaping the future of our global infrastructure.
Overseeing a global infrastructure of about 10,000 on-prem servers, you'll tackle unique technical challenges, engineer scalable systems, and have a direct impact on the reliability and performance of our products.
Main Responsibilities
Deliver projects on time: Plan, delegate, execute, and oversee key projects.
Collaborate: Work closely with stakeholders and other teams. Mentor colleagues and lead knowledge transfer.
Ensure quality and reduce technical debt: Deliver solutions with solid design, and address blockers, toil, and debt to keep systems healthy.
Drive engineering excellence: Aim for quality and choose the right solution for the problems we face.
Protect solution quality: Ensure designs are implemented with proper quality and minimal tech debt.
Data-backed decisions: Help teams and stakeholders navigate data and act on insights.
Design and maintain highly available, scalable infrastructure with monitoring, alerting, and anomaly detection.
Automate everything: Create and optimize automation to streamline deployments, improve speed, and cut manual work.
Solve complex issues: Troubleshoot, debug, and resolve critical issues in complex systems.
Use AI: Integrate AI into workflows and processes to speed up delivery and reduce toil.
Core Requirements
Observability: Experience with monitoring tools and frameworks that ensure system observability (OpenSearch, VictoriaMetrics, Prometheus, Thanos, Mimir, OpenTelemetry, Nagios).
Databases and storage systems: Experience operating highly available SQL and NoSQL databases and object stores at scale (MySQL, Percona, PostgreSQL, Cassandra, ClickHouse, Timescale, Druid, MinIO).
Data visualization: Ability to build meaningful dashboards that surface the right insights (Grafana, OpenSearch Dashboards).
Alerting and anomaly detection: Ability to build anomaly detection and alerting pipelines.
Programming: Proficiency in one or more programming languages for automation scripts and integrations (Python, Go, Rust, C).
Linux: Strong knowledge of Linux systems, especially Debian-based distributions.
Workflow: Ability to use workflow automation frameworks (Airflow, Prefect, n8n).
Configuration management: Ability to design and develop configuration management codebases and deployment pipelines (SaltStack, Ansible, Rundeck).
Networking: Strong understanding of networking protocols and concepts (Overlay, VPN, Proxy, DNS, HTTP, SSL, TCP, UDP).
Security: Ability to design secure systems, plus working knowledge of security concepts and tools (Vault, PKI, mTLS).
Salary Range
Gross salary: 23,300 to 34,000 PLN/month